Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Tenghuan He
Hi Ted, I changed def customable(partitioner: Partitioner): RDD[(K, V)] = self.withScope { to def customable(partitioner: Partitioner): MyRDD[K, V] = self.withScope { after rebuilding the whole spark project (since it takes a long time, I didn't do as you suggested at first), and it also works. Thanks On
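
[A minimal sketch of the same fix done outside the Spark tree, assuming the MyRDD class from the original post further down this thread; an implicit class that returns MyRDD[K, V] rather than RDD[(K, V)], which is the point of the change. withScope is Spark-internal and omitted here.]

import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD

object CustomableSyntax {
  // returning the concrete MyRDD type keeps the custom methods visible to callers
  implicit class CustomableOps[K, V](self: RDD[(K, V)]) {
    def customable(partitioner: Partitioner): MyRDD[K, V] =
      new MyRDD[K, V](self, partitioner)
  }
}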

Twitter receiver not running in spark 1.6.0

2016-03-27 Thread lokeshkumar
Hi forum, For some reason if I include a twitter receiver and start the streaming context, I get the below exception, not sure why. Can someone let me know if anyone has already encountered this issue, or am I doing something wrong? java.lang.ArithmeticException: / by zero at

Re: StateSpec raises error "missing arguments for method"

2016-03-27 Thread Cyril Scetbon
hmm, I finally found that the following syntax works better:

scala> val mappingFunction = (key: String, value: Option[Int], state: State[Int]) => {
     |   Option(key)
     | }
mappingFunction: (String, Option[Int], org.apache.spark.streaming.State[Int]) => Option[String] = <function3>

scala> val spec =
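
[For reference, a minimal sketch of how that function value plugs into mapWithState, assuming a DStream[(String, Int)] named pairs and a checkpointed StreamingContext. Note that with a def (a method rather than a function value) you would need StateSpec.function(mappingFunction _) to force eta-expansion, which appears to be what the original "missing arguments for method" error was about.]

import org.apache.spark.streaming._

val mappingFunction = (key: String, value: Option[Int], state: State[Int]) => {
  val sum = value.getOrElse(0) + state.getOption.getOrElse(0)
  state.update(sum)            // keep a running count per key
  Option(key)
}

val spec = StateSpec.function(mappingFunction)
val stateStream = pairs.mapWithState(spec)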

StateSpec raises error "missing arguments for method"

2016-03-27 Thread Cyril Scetbon
Hi, I'm testing a sample code and I get this error. Here is the sample code I use:

scala> import org.apache.spark.streaming._
import org.apache.spark.streaming._

scala> def mappingFunction(key: String, value: Option[Int], state: State[Int]): Option[String] = {
     |   Option(key)
     | }

Re: Re: IntelliJ idea not work well with spark

2016-03-27 Thread Tenghuan He
Hi Wenchao, I used the steps described on this page and it works great, you can have a try :) http://danielnee.com/2015/01/setting-up-intellij-for-spark/ On Mon, Mar 28, 2016 at 9:38 AM, 吴文超 wrote: > for the simplest word count, > val wordCounts = textFile.flatMap(line =>

RE: Does SparkSql has official jdbc/odbc driver?

2016-03-27 Thread Raymond Honderdors
For now they are free. Sent from my Samsung Galaxy smartphone. Original message From: Sage Meng Date: 3/28/2016 04:14 (GMT+02:00) To: Raymond Honderdors Cc: mich.talebza...@gmail.com, user@spark.apache.org Subject: Re:

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Tenghuan He
Thanks very much Ted. I added MyRDD.scala to the spark source code and rebuilt the whole spark project; using myrdd.asInstanceOf[MyRDD] doesn't work. It seems that MyRDD is not exposed to the spark-shell. Finally I wrote a separate spark application and added the MyRDD.scala to the project, then the

--packages configuration equivalent item name?

2016-03-27 Thread Russell Jurney
I run PySpark with CSV support like so: IPYTHON=1 pyspark --packages com.databricks:spark-csv_2.10:1.4.0 I don't want to type this --packages argument each time. Is there a config item for --packages? I can't find one in the reference at http://spark.apache.org/docs/latest/configuration.html If
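
[One hedged pointer: later Spark releases document a spark.jars.packages key corresponding to --packages; whether the 1.x line honors it from spark-defaults.conf is an assumption worth verifying against your version.]

# conf/spark-defaults.conf (hypothetical entry)
spark.jars.packages  com.databricks:spark-csv_2.10:1.4.0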

Fwd: NoSuchElementException in ChiSqSelector fit method (version 1.6.0)

2016-03-27 Thread Joshua Cason
Hi All, I'm running into an error that's not making a lot of sense to me, and I couldn't find sufficient info on the web to answer it myself. BTW, you can also reply on Stack Overflow: http://stackoverflow.com/questions/36254005/nosuchelementexception-in-chisqselector-fit-method-version-1-6-0

Re: Re: IntelliJ idea not work well with spark

2016-03-27 Thread 吴文超
for the simplest word count, val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b) In the IDEA IDE, it shows reduceByKey cannot be resolved. Maybe it is a problem of version incompatibility. What do you use to construct the project?

Re: Does SparkSql has official jdbc/odbc driver?

2016-03-27 Thread Sage Meng
Hi Raymond Honderdors, are the odbc/jdbc drivers for spark sql from databricks free, or can the drivers from databricks only be used with databricks's spark-sql release? 2016-03-25 17:48 GMT+08:00 Raymond Honderdors : > Recommended drivers for spark / thrift are the

Re: Potential conflict with org.iq80.snappy in Spark 1.6.0 environment?

2016-03-27 Thread Vasu Parameswaran
It turned out that there is a conflict for this within the Spark environment. This error got resolved after I shaded out org.iq80.snappy via the maven shade plugin. Just posting so others may benefit. On Sun, Mar 20, 2016 at 11:40 PM, Akhil Das wrote: > Looks
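
[A sketch of the relocation described above, assuming the standard maven-shade-plugin configuration; the shadedPattern name is arbitrary.]

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- rename org.iq80.snappy inside the fat jar to avoid the clash -->
          <relocation>
            <pattern>org.iq80.snappy</pattern>
            <shadedPattern>shaded.org.iq80.snappy</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>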

Re: IntelliJ idea not work well with spark

2016-03-27 Thread Eugene Morozov
Could you please share your code, so that I can try it? -- Be well! Jean Morozov On Sun, Mar 27, 2016 at 5:20 PM, 吴文超 wrote: > I am a newbie to spark, when I use IntelliJ idea to write some scala code, > i found it reports error when using spark's implicit

This works to filter transactions older than certain months

2016-03-27 Thread Mich Talebzadeh
Hi, A while back I was looking for functional programming to filter out transactions older than n months etc. This turned out to be pretty easy. I get today's date as follows: var today = sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'yyyy-MM-dd') ").collect.apply(0).getString(0) CSV
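
[A sketch of the rest of the approach, assuming a DataFrame df with a paymentdate column holding 'yyyy-MM-dd' strings and Spark >= 1.5 for ADD_MONTHS; both names are assumptions, not the original code.]

import org.apache.spark.sql.functions.col

var today = sqlContext.sql(
  "SELECT FROM_unixtime(unix_timestamp(), 'yyyy-MM-dd')"
).collect.apply(0).getString(0)

// cut-off date n months back, computed the same way
val n = 6
val cutoff = sqlContext.sql(
  s"SELECT CAST(ADD_MONTHS('$today', -$n) AS STRING)"
).collect.apply(0).getString(0)

// ISO dates compare correctly as strings, so a plain >= filter works
val recent = df.filter(col("paymentdate") >= cutoff)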

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Ted Yu
My interpretation is that the variable myrdd is of type RDD to the REPL, though it was an instance of MyRDD. Using asInstanceOf in spark-shell should allow you to call your custom method. Here is the declaration of RDD: abstract class RDD[T: ClassTag]( You can extend RDD and include your custom logic in

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Alexander Krasnukhin
Well, passing state between custom methods is trickier. But why don't you merge both methods into one? Then there is no need to pass state. -- Alexander aka Six-Hat-Thinker > On 27 Mar 2016, at 19:24, Tenghuan He wrote: > > Hi Alexander, > Thanks for your reply > > In the

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Tenghuan He
Hi Alexander, Thanks for your reply. In the custom rdd, there are some fields I have defined so that both the custom method and the compute method can see and operate on them; can a method in an implicit class do that? On Mon, Mar 28, 2016 at 1:09 AM, Alexander Krasnukhin

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Tenghuan He
Thanks Ted, but I have a doubt: as the code above (line 4) in the spark-shell shows, myrdd is already a MyRDD; does that not make sense?

1 scala> val part = new org.apache.spark.HashPartitioner(10)
2 scala> val baseRDD = sc.parallelize(1 to 10).map(x => (x,

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Alexander Krasnukhin
Extending breaks chaining and is not nice. I think it is much better to write an implicit class with extra methods. This way you add new methods without touching the hierarchy at all, i.e.

object RddFunctions {
  implicit class RddFunctionsImplicit[T](rdd: RDD[T]) {
    /***
     * Cache RDD and name it in
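
[A complete minimal version of that sketch, with the truncated method body filled in as a guess (naming and caching an RDD in one call); this shows the pattern, not the poster's exact code.]

import org.apache.spark.rdd.RDD

object RddFunctions {
  implicit class RddFunctionsImplicit[T](rdd: RDD[T]) {
    /*** Cache the RDD and name it in the web UI in one call. */
    def cacheAs(name: String): RDD[T] = {
      rdd.setName(name)
      rdd.cache()
    }
  }
}

// usage: import RddFunctions._ and then someRdd.cacheAs("base-data")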

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Ted Yu
bq. def customable(partitioner: Partitioner): RDD[(K, V)] = self.withScope { In the above, you declare the return type as RDD, while you actually intended to declare MyRDD as the return type. Or, you can cast myrdd to MyRDD in spark-shell. BTW I don't think it is good practice to add a custom method to

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Tenghuan He
Hi Ted, The code is run in spark-shell:

scala> val part = new org.apache.spark.HashPartitioner(10)
scala> val baseRDD = sc.parallelize(1 to 10).map(x => (x, "hello")).partitionBy(part).cache()
scala> val myrdd = baseRDD.customable(part) // here customable is a method added to the

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Ted Yu
Can you show the full stack trace (or top 10 lines) and the snippet using your MyRDD? Thanks On Sun, Mar 27, 2016 at 9:22 AM, Tenghuan He wrote: > ​Hi everyone, > > I am creating a custom RDD which extends RDD and add a custom method, > however the custom method

Custom RDD in spark, cannot find custom method

2016-03-27 Thread Tenghuan He
Hi everyone, I am creating a custom RDD which extends RDD and adds a custom method; however, the custom method cannot be found. The custom RDD looks like the following:

class MyRDD[K, V](
    var base: RDD[(K, V)],
    part: Partitioner
  ) extends RDD[(K, V)](base.context, Nil) {
  def
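
[A version of that class padded out just enough to compile, with compute and getPartitions delegating to the base RDD as a guess; a real implementation would also declare a OneToOneDependency on base instead of Nil.]

import org.apache.spark.{Partition, Partitioner, TaskContext}
import org.apache.spark.rdd.RDD

class MyRDD[K, V](
    var base: RDD[(K, V)],
    part: Partitioner
  ) extends RDD[(K, V)](base.context, Nil) {

  // the custom method the thread is about
  def customMethod(): Unit = {
    // custom logic can see fields such as base and part
  }

  override def compute(split: Partition, context: TaskContext): Iterator[(K, V)] =
    base.iterator(split, context)

  override protected def getPartitions: Array[Partition] = base.partitions

  override val partitioner = Some(part)
}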

IntelliJ idea not work well with spark

2016-03-27 Thread 吴文超
I am a newbie to spark; when I use IntelliJ IDEA to write some scala code, I found it reports an error when using spark's implicit conversions, e.g. when using the RDD as a pair RDD to get the reduceByKey function. However, the project can run normally in the cluster. As somebody says it needs import
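
[A self-contained sketch of the word count with the import that pre-1.3 Spark needed for the pair-RDD implicits; on 1.3+ the implicits resolve automatically, so an unresolved reduceByKey in the IDE usually points to a project-setup or dependency-version mismatch instead. The input path is an assumption.]

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // needed only on Spark versions before 1.3

object WordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("wordcount"))
    val textFile = sc.textFile("input.txt")
    val wordCounts = textFile
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey((a, b) => a + b)
    wordCounts.collect().foreach(println)
    sc.stop()
  }
}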

Re: Databricks fails to read the csv file with blank line at the file header

2016-03-27 Thread Mich Talebzadeh
Pretty simple; as usual it is a combination of ETL and ELT. Basically csv files are loaded into a staging directory on the host and compressed before pushing into hdfs:
1. ETL --> Get rid of the header blank line on the csv files
2. ETL --> Compress the csv files
3. ETL --> Put the compressed CSV

Re: whether a certain piece can be assigned to a specified node by some code in my program.

2016-03-27 Thread Ted Yu
Please take a look at the MyRDD class in: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala There is scaladoc for the class. See how getPreferredLocations() is implemented. Cheers On Sun, Mar 27, 2016 at 2:01 AM, chenyong wrote: > Thank you Ted for
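
[A sketch in the spirit of that test class, assuming the hostnames are known up front; Spark's scheduler treats these as preferred, not guaranteed, locations.]

import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

class PinnedRDD(sc: SparkContext, numParts: Int, hosts: Map[Int, Seq[String]])
  extends RDD[Int](sc, Nil) {

  override def compute(split: Partition, context: TaskContext): Iterator[Int] =
    Iterator(split.index)

  override protected def getPartitions: Array[Partition] =
    (0 until numParts).map { i =>
      new Partition { override def index: Int = i }
    }.toArray

  // map a partition index to the node(s) it should preferably run on
  override protected def getPreferredLocations(split: Partition): Seq[String] =
    hosts.getOrElse(split.index, Nil)
}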

Non-classification neural networks

2016-03-27 Thread Jim Carroll
Hello all, We were using the old "Artificial Neural Network": https://github.com/apache/spark/pull/1290 This code appears to have been incorporated in 1.5.2, but it's only exposed publicly via the MultilayerPerceptronClassifier. Is there a way to use the old feedforward/backprop

Sessionization with DF Window functions

2016-03-27 Thread Giorgi Jvaridze
Hi, I have a DF of user events from a log file and I'm trying to construct sessions for the same user. A session is a list of events from the same user, where the time difference between two consecutive events (when sorted by time) within the session isn't greater than 30 minutes. Currently I'm using DF's Lag
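
[A sketch of the lag-based construction, assuming a DataFrame events with userId and eventTime (timestamp) columns — both assumptions — and a HiveContext, which Spark 1.6 requires for window functions. A session break is flagged when the gap exceeds 30 minutes, and a running sum of the flags becomes the session id.]

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val w = Window.partitionBy("userId").orderBy("eventTime")

val sessions = events
  .withColumn("prevTime", lag(col("eventTime"), 1).over(w))      // previous event of same user
  .withColumn("newSession",
    when(col("eventTime").cast("long") - col("prevTime").cast("long") > 30 * 60, 1)
      .otherwise(0))                                             // 1 where a new session starts
  .withColumn("sessionId", sum(col("newSession")).over(w))       // cumulative sum = session id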

Re: SparkSQL exception on spark.sql.codegen

2016-03-27 Thread song ma
Hi Eric and Michael: I run into this problem with Spark 1.4.1 too. The error stack is: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$ at

RE: whether a certain piece can be assigned to a specified node by some code in my program.

2016-03-27 Thread chenyong
Thank you Ted for your reply. Your example code is a little bit difficult for me to understand. Could you please give me an explanation of it? From: cy...@hotmail.com To: user@spark.apache.org Subject: whether a certain piece can be assigned to a specified node by some code in my

Re: sliding Top N window

2016-03-27 Thread Lars Albertsson
[+spark list again] (I did not want to send "commercial spam" to the list :-)) The reduce function for CMSs (count-min sketches) is element-wise addition, and the reverse reduce function is element-wise subtraction. The heavy hitters list does not have monoid addition, but you can cheat. I suggest creating a heavy
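
[In Spark Streaming terms, the invertible-reduce idea looks like the sketch below, with a plain per-key count standing in for the CMS cells; items is assumed to be a DStream of keys, and the inverse-reduce form requires checkpointing to be enabled.]

import org.apache.spark.streaming.Seconds

val windowedCounts = items.map(x => (x, 1L)).reduceByKeyAndWindow(
  (a: Long, b: Long) => a + b,   // reduce: element-wise addition
  (a: Long, b: Long) => a - b,   // inverse reduce: element-wise subtraction
  Seconds(30 * 60),              // window length
  Seconds(60)                    // slide interval
)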

Re: Testing spark with AWS spot instances

2016-03-27 Thread Alexander Pivovarov
I use spot instances for a 100-slave cluster (r3.2xlarge in us-west-1). Jobs I run usually take about 15 hours; the cluster is stable and fast. 1-2 computers might be terminated but it's a very rare event and Spark can handle it. On Fri, Mar 25, 2016 at 6:28 PM, Sven Krasser wrote: