RE: Exiting driver main() method...

2015-05-02 Thread Mohammed Guller
No, you don’t need to do anything special. Perhaps your application is getting stuck somewhere? If you can share your code, someone may be able to help. Mohammed From: James Carman [mailto:ja...@carmanconsulting.com] Sent: Friday, May 1, 2015 5:53 AM To: user@spark.apache.org Subject: Exiting

Re: real time Query engine Spark-SQL on Hbase

2015-05-02 Thread Siddharth Ubale
Hi, Thanks for the reply. Hbase cli takes less than 500 ms for the same query. I am running a simple query, i.e. Select * from Customers where c_id='123123'. Why would the same query which takes 500 ms at the Hbase cli end up taking around 8 secs via Spark-Sql? I am unable to understand this.

com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index:

2015-05-02 Thread shahab
Hi, I am using spark-1.2.0 and I used Kryo serialization but I get the following exception. java.io.IOException: com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 3448, Size: 1 I do appreciate it if anyone could tell me how I can resolve this? best, /Shahab

not getting any mail

2015-05-02 Thread Jeetendra Gangele
Hi All I am not getting any mail from this community?

Remoting warning when submitting to cluster

2015-05-02 Thread javidelgadillo
Hello all!! We've been prototyping some spark applications to read messages from Kafka topics. The application is quite simple: we use KafkaUtils.createStream to receive a stream of CSV messages from a Kafka topic. We parse the CSV and count the number of messages we get in each RDD. At a
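
A minimal sketch of the prototype described above, assuming placeholder ZooKeeper, group and topic names (the receiver-based API from spark-streaming-kafka):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(new SparkConf().setAppName("csv-count"), Seconds(10))
    // createStream yields (key, message) pairs; "zk:2181", "csv-group", "csv-topic" are placeholders
    val stream = KafkaUtils.createStream(ssc, "zk:2181", "csv-group", Map("csv-topic" -> 1))
    stream.map(_._2.split(","))   // parse each CSV message
          .count()                // number of messages in each batch RDD
          .print()
    ssc.start()
    ssc.awaitTermination()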

Re: Spark - Hive Metastore MySQL driver

2015-05-02 Thread Ted Yu
Can you try the patch from: [SPARK-6913][SQL] Fixed java.sql.SQLException: No suitable driver found Cheers On Sat, Mar 28, 2015 at 12:41 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: This is from my Hive installation -sh-4.1$ ls /apache/hive/lib | grep derby derby-10.10.1.1.jar

Re: How to add a column to a spark RDD with many columns?

2015-05-02 Thread dsgriffin
val newRdd = myRdd.map(row => row ++ Array((row(1).toLong * row(199).toLong).toString)) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-add-a-column-to-a-spark-RDD-with-many-columns-tp22729p22735.html Sent from the Apache Spark User List mailing list

Re: Exiting driver main() method...

2015-05-02 Thread Akhil Das
It used to exit without any problem for me. You can basically check in the driver UI (that runs on 4040) and see what exactly it's doing. Thanks Best Regards On Fri, May 1, 2015 at 6:22 PM, James Carman ja...@carmanconsulting.com wrote: In all the examples, it seems that the spark application

empty jdbc RDD in spark

2015-05-02 Thread Hafiz Mujadid
Hi all! I am trying to read a HANA database using Spark's JdbcRDD. Here is my code: def readFromHana() { val conf = new SparkConf() conf.setAppName("test").setMaster("local") val sc = new SparkContext(conf) val rdd = new JdbcRDD(sc, () => {
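
For reference, a hedged sketch of a complete JdbcRDD setup (the HANA driver class, connection URL and query are placeholders; the query must contain exactly two '?' placeholders that JdbcRDD fills with per-partition bounds):

    import java.sql.DriverManager
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.JdbcRDD

    val sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
    val rdd = new JdbcRDD(sc,
      () => {
        Class.forName("com.sap.db.jdbc.Driver")           // assumed HANA driver class
        DriverManager.getConnection("jdbc:sap://host:30015", "user", "pass") // placeholder URL
      },
      "SELECT * FROM MEMBERS WHERE ID >= ? AND ID <= ?",  // the two ?s receive the partition bounds
      1, 1000, 2,                                         // lowerBound, upperBound, numPartitions
      r => r.getString(1))                                // map each ResultSet row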

Re: Drop a column from the DataFrame.

2015-05-02 Thread dsgriffin
Just use select() to create a new DataFrame with only the columns you want. Sort of the opposite operation -- but you can select all of the columns minus the one you don't want. You could even use a filter to remove just the one column on the fly:
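
A sketch of that idea on Spark 1.3, assuming a DataFrame df and a hypothetical column name "colToDrop":

    import org.apache.spark.sql.functions.col

    // keep every column except the one to drop
    val pruned = df.select(df.columns.filter(_ != "colToDrop").map(col): _*)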

Re: spark.logConf with log4j.rootCategory=WARN

2015-05-02 Thread Akhil Das
It could be. Thanks Best Regards On Fri, May 1, 2015 at 9:11 PM, roy rp...@njit.edu wrote: Hi, I have recently enabled log4j.rootCategory=WARN, console in the Spark configuration, but after that spark.logConf=true has become ineffective. So just want to confirm if this is because

Re: Spark Streaming Kafka Avro NPE on deserialization of payload

2015-05-02 Thread Akhil Das
There was a similar discussion over here http://mail-archives.us.apache.org/mod_mbox/spark-user/201411.mbox/%3ccakz4c0s_cuo90q2jxudvx9wc4fwu033kx3-fjujytxxhr7p...@mail.gmail.com%3E Thanks Best Regards On Fri, May 1, 2015 at 7:12 PM, Todd Nist tsind...@gmail.com wrote: *Resending as I do not

Re: Spark worker error on standalone cluster

2015-05-02 Thread Michael Ryabtsev
Thanks Akhil, I am trying to investigate this path. The Spark version is the same, but maybe there is a difference in Hadoop. On Sat, May 2, 2015 at 6:25 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Just make sure you are using the same version of Spark in your cluster and the project's build

spark filestream problem

2015-05-02 Thread Evo Eftimov
it seems that on Spark Streaming 1.2 the filestream API may have a bug - it doesn't detect new files when moving or renaming them on HDFS, only when copying them, but that leads to a well-known problem with .tmp files, which get removed and make the Spark Streaming filestream throw an exception

sparkR equivalent to SparkContext.newAPIHadoopRDD?

2015-05-02 Thread David Holiday
Hi gang, I'm giving sparkR a test drive and am bummed to discover that the SparkContext API in sparkR is only a subset of what's available in stock spark. Specifically, I need to be able to pull data from accumulo into sparkR. I can do it with stock spark but can't figure out how to make the

Problem in Standalone Mode

2015-05-02 Thread drarse
When I run my program with spark-submit everything is OK. But when I try to run it in standalone mode I obtain the next exceptions: ((This is with val df = sqlContext.jsonFile("./datos.json") )) java.io.EOFException [error] at

Re: Help with publishing to Kafka from Spark Streaming?

2015-05-02 Thread Saisai Shao
Here is the pull request, you may refer to this: https://github.com/apache/spark/pull/2994 Thanks Jerry 2015-05-01 14:38 GMT+08:00 Pavan Sudheendra pavan0...@gmail.com: Link to the question: http://stackoverflow.com/questions/29974017/spark-kafka-producer-not-serializable-exception
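
Independent of that pull request, a common workaround for the linked serialization issue is to create the producer inside foreachPartition, so it is constructed on the executor rather than shipped with the closure - a hedged sketch with a placeholder broker and topic, assuming a DStream[String] named stream:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val props = new Properties()
        props.put("bootstrap.servers", "broker:9092") // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props) // built per partition, never serialized
        records.foreach(r => producer.send(new ProducerRecord[String, String]("out-topic", r)))
        producer.close()
      }
    }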

Re: how to pass configuration properties from driver to executor?

2015-05-02 Thread Akhil Das
In fact, sparkConf.set("spark.whateverPropertyYouWant", "Value") gets shipped to the executors. Thanks Best Regards On Fri, May 1, 2015 at 2:55 PM, Michael Ryabtsev mich...@totango.com wrote: Hi, We've had a similar problem, but with a log4j properties file. The only working way we've found was
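
A small sketch of that, with a made-up property key - anything prefixed with "spark." set on the SparkConf is shipped with the job and, as far as I know, readable on executors via SparkEnv:

    import org.apache.spark.{SparkConf, SparkContext, SparkEnv}

    val conf = new SparkConf().setAppName("conf-demo")
      .set("spark.myapp.greeting", "hello")                     // hypothetical custom key
    val sc = new SparkContext(conf)
    val out = sc.parallelize(1 to 3).map { i =>
      SparkEnv.get.conf.get("spark.myapp.greeting") + " " + i   // read on the executor
    }.collect()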

[PSA] Use Stack Overflow!

2015-05-02 Thread Nick Chammas
This mailing list sees a lot of traffic every day. With such a volume of mail, you may find it hard to find discussions you are interested in, and if you are the one starting discussions you may sometimes feel your mail is going into a black hole. We can't change the nature of this mailing list

Generating version agnostic jar path value for --jars clause

2015-05-02 Thread nitinkak001
I have a list of Cloudera jars which I need to provide in the --jars clause, mainly for the HiveContext functionality I am using. However, many of these jars have the version number as part of their names. This leads to an issue: the names might change when I do a Cloudera upgrade. Just a note here,
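
One way to sidestep the version suffixes is to resolve the jar paths at launch time instead of hard-coding them - a sketch assuming a hypothetical Cloudera lib directory:

    import java.io.File

    // build a comma-separated value for --jars from whatever versions are installed
    val jars = new File("/opt/cloudera/parcels/CDH/lib/hive/lib") // assumed path
      .listFiles()
      .filter(_.getName.endsWith(".jar"))
      .map(_.getAbsolutePath)
      .mkString(",")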

Re: Spark worker error on standalone cluster

2015-05-02 Thread Akhil Das
Just make sure you are using the same version of Spark in your cluster and in the project's build file. Thanks Best Regards On Fri, May 1, 2015 at 2:43 PM, Michael Ryabtsev (Totango) mich...@totango.com wrote: Hi everyone, I have a spark application that works fine on a standalone Spark

Re: Enabling Event Log

2015-05-02 Thread Jeetendra Gangele
is it working now? On 1 May 2015 at 13:43, James King jakwebin...@gmail.com wrote: Oops! well spotted. Many thanks Shixiong. On Fri, May 1, 2015 at 1:25 AM, Shixiong Zhu zsxw...@gmail.com wrote: spark.history.fs.logDirectory is for the history server. For Spark applications, they should
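
For reference, the application-side settings the thread distinguishes from the history server's spark.history.fs.logDirectory - a sketch with an assumed HDFS path:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "hdfs:///spark-events") // assumed path; must exist and be writable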

to split an RDD to multiple ones?

2015-05-02 Thread Yifan LI
Hi, I have an RDD srdd containing (unordered) data like this: s1_0, s3_0, s2_1, s2_2, s3_1, s1_3, s1_2, … What I want is (it will be much better if they could be in ascending order): srdd_s1: s1_0, s1_1, s1_2, …, s1_n srdd_s2: s2_0, s2_1, s2_2, …, s2_n srdd_s3: s3_0, s3_1, s3_2, …, s3_n … …

Re: DataFrame filter referencing error

2015-05-02 Thread Francesco Bigarella
First of all, thank you for your replies. I was previously doing this via a normal JDBC connection and it worked without problems. Then I liked the idea that Spark SQL could take care of opening/closing the connection. I tried also with single quotes, since that was my first guess, but it didn't work.

Re: empty jdbc RDD in spark

2015-05-02 Thread Ted Yu
bq. "SELECT * FROM MEMBERS LIMIT ? OFFSET ?" Have you tried dropping the LIMIT and OFFSET clauses from the above query? Cheers On Fri, May 1, 2015 at 1:56 PM, Hafiz Mujadid hafizmujadi...@gmail.com wrote: Hi all! I am trying to read a HANA database using Spark's JdbcRDD. Here is my code: def
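
A likely reason the suggestion helps: JdbcRDD binds its per-partition lower and upper bounds into the two '?' placeholders, so with LIMIT ? OFFSET ? the bounds land in the wrong slots (e.g. LIMIT 0 for a first partition starting at 0) and can select nothing. The conventional form keys the placeholders on a numeric column (column name assumed):

    "SELECT * FROM MEMBERS WHERE ID >= ? AND ID <= ?"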

Re: Number of input partitions in SparkContext.sequenceFile

2015-05-02 Thread Archit Thakur
Hi, How did you check the number of splits in your file? Did you run your MR job or calculate it? The formula for split size is max(minSize, min(maxSize, blockSize)). Can you check if it satisfies your case? Thanks and Regards, Archit Thakur. On Saturday, April 25, 2015, Wenlei Xie wenlei@gmail.com wrote:
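
For example, with the FileInputFormat defaults (minSize = 1, maxSize = Long.MAX_VALUE) the formula reduces to the HDFS block size, so a 1 GB file with 128 MB blocks yields 8 splits.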

Re: How to add a column to a spark RDD with many columns?

2015-05-02 Thread Carter
Thanks for your reply! It is what I am after. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-add-a-column-to-a-spark-RDD-with-many-columns-tp22729p22740.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: not getting any mail

2015-05-02 Thread Ted Yu
Looks like there were delays across Apache project mailing lists. Emails are coming through now. On May 2, 2015, at 9:14 AM, Jeetendra Gangele gangele...@gmail.com wrote: Hi All I am not getting any mail from this community?

RE: spark filestream problem

2015-05-02 Thread Evo Eftimov
I have figured it out in the meantime - simply, when moving a file on HDFS it preserves its timestamp, and the Spark filestream adapter cares not about filenames but about timestamps - hence NEW files with OLD timestamps will NOT be processed - yuk. The hack you can use is to
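
The message is truncated, but one way (not necessarily the author's hack) to make a moved file look new to the filestream adapter is to bump its modification time after the move - a hedged sketch using the Hadoop FileSystem API with a placeholder path:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(new Configuration())
    val dst = new Path("/stream/in/part-00000")      // hypothetical file just moved into the watched dir
    fs.setTimes(dst, System.currentTimeMillis(), -1) // set mtime to now; -1 leaves atime unchanged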

Submit Kill Spark Application program programmatically from another application

2015-05-02 Thread Yijie Shen
Hi, I am wondering if it is possible to submit, monitor and kill Spark applications from another service. I have written a service like this: parse user commands, translate them into understandable arguments to an already prepared Spark-SQL application, submit the application along with arguments to
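
This use case maps onto the launcher library added by SPARK-4924 (org.apache.spark.launcher, available from Spark 1.4) - a hedged sketch with placeholder jar, class, master and arguments:

    import org.apache.spark.launcher.SparkLauncher

    val process = new SparkLauncher()
      .setAppResource("/path/to/sql-runner.jar") // hypothetical prepared Spark-SQL app
      .setMainClass("com.example.SqlRunner")     // hypothetical
      .setMaster("spark://master:7077")          // placeholder
      .addAppArgs("SELECT * FROM customers")     // the translated user command
      .launch()                                  // returns a java.lang.Process
    // monitor via process.waitFor(); kill via process.destroy()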

Re: Can I group elements in RDD into different groups and let each group share some elements?

2015-05-02 Thread Olivier Girardot
Did you look at the cogroup transformation or the cartesian transformation? Regards, Olivier. On Sat, May 2, 2015 at 22:01, Franz Chien franzj...@gmail.com wrote: Hi all, Can I group elements in RDD into different groups and let each group share elements? For example, I have 10,000
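
As an alternative to cartesian, a join along the group mapping also lets one element land in several groups - a tiny sketch with made-up data:

    // mapping: (groupId, elementId); elements: (elementId, value)
    val mapping  = sc.parallelize(Seq((1, "e1"), (1, "e42"), (2, "e1"), (2, "e554")))
    val elements = sc.parallelize(Seq(("e1", 10), ("e42", 7), ("e554", 3)))
    val groups = mapping.map(_.swap)        // key by elementId
      .join(elements)                       // (elementId, (groupId, value))
      .map { case (_, (g, v)) => (g, v) }
      .groupByKey()                         // each group now holds its (possibly shared) elements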

Re: real time Query engine Spark-SQL on Hbase

2015-05-02 Thread Ted Yu
In the upcoming 1.4.0 release, SPARK-3468 should give you a better clue. Cheers On Fri, May 1, 2015 at 12:30 PM, Siddharth Ubale siddharth.ub...@syncoms.com wrote: Hi, Thanks for the reply. Hbase cli takes less than 500 ms for the same query. I am running a simple query, i.e. Select *

Re: ClassNotFoundException for Kryo serialization

2015-05-02 Thread Akshat Aranya
Now I am running up against some other problem while trying to schedule tasks: 15/05/01 22:32:03 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) java.lang.IllegalStateException: unread block data at

Re: to split an RDD to multiple ones?

2015-05-02 Thread Olivier Girardot
I guess: val srdd_s1 = srdd.filter(_.startsWith("s1_")).sortBy(identity) val srdd_s2 = srdd.filter(_.startsWith("s2_")).sortBy(identity) val srdd_s3 = srdd.filter(_.startsWith("s3_")).sortBy(identity) Regards, Olivier. On Sat, May 2, 2015 at 22:53, Yifan LI iamyifa...@gmail.com wrote: Hi, I have an RDD *srdd*
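
A runnable version of that idea with toy data (sortBy needs a key function, hence identity):

    val srdd = sc.parallelize(Seq("s1_0", "s3_0", "s2_1", "s2_2", "s3_1", "s1_3", "s1_2"))
    val srdd_s1 = srdd.filter(_.startsWith("s1_")).sortBy(identity) // s1_0, s1_2, s1_3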

Re: Drop a column from the DataFrame.

2015-05-02 Thread Olivier Girardot
Sounds like a patch for a drop method... On Sat, May 2, 2015 at 21:03, dsgriffin dsgrif...@gmail.com wrote: Just use select() to create a new DataFrame with only the columns you want. Sort of the opposite operation -- but you can select all of the columns minus the one you

Re: com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index:

2015-05-02 Thread Olivier Girardot
Can you post your code? Otherwise there's not much we can do. Regards, Olivier. On Sat, May 2, 2015 at 21:15, shahab shahab.mok...@gmail.com wrote: Hi, I am using spark-1.2.0 and I used Kryo serialization but I get the following exception. java.io.IOException:

Re: Drop a column from the DataFrame.

2015-05-02 Thread Ted Yu
This is coming in 1.4.0 https://issues.apache.org/jira/browse/SPARK-7280 On May 2, 2015, at 2:27 PM, Olivier Girardot ssab...@gmail.com wrote: Sounds like a patch for a drop method... On Sat, May 2, 2015 at 21:03, dsgriffin dsgrif...@gmail.com wrote: Just use select() to create a new

Re: to split an RDD to multiple ones?

2015-05-02 Thread Yifan LI
Thanks, Olivier and Franz. :) Best, Yifan LI On 02 May 2015, at 23:23, Olivier Girardot ssab...@gmail.com wrote: I guess: val srdd_s1 = srdd.filter(_.startsWith("s1_")).sortBy(identity) val srdd_s2 = srdd.filter(_.startsWith("s2_")).sortBy(identity) val srdd_s3 =

Re: Spark - Timeout Issues - OutOfMemoryError

2015-05-02 Thread Akhil Das
You could try repartitioning your listings RDD. Also, doing a collectAsMap would basically bring all your data to the driver; in that case you might want to set the storage level as memory-and-disk, though I'm not sure that will be of any help on the driver. Thanks Best Regards On Thu, Apr 30, 2015 at 11:10
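
A sketch of those two suggestions, assuming listings is the RDD from the thread and a placeholder partition count:

    import org.apache.spark.storage.StorageLevel

    val repartitioned = listings.repartition(200)  // placeholder count; spread the load more evenly
      .persist(StorageLevel.MEMORY_AND_DISK)       // spill to disk instead of failing with OOM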

Can I group elements in RDD into different groups and let each group share some elements?‏

2015-05-02 Thread Franz Chien
Hi all, Can I group elements in RDD into different groups and let each group share elements? For example, I have 10,000 elements in an RDD, from e1 to e10000, and I want to group and aggregate them by another mapping with size of 2000, ex: ( (e1,e42), (e1,e554), (e3, e554)…… (2000th group)) My first

Re: directory loader in windows

2015-05-02 Thread ayan guha
Thanks for the answer. I am now trying to set HADOOP_HOME but the issue still persists. Also, I can see only windows-utils.exe in my HADOOP_HOME, but no WINUTILS.EXE. I do not have Hadoop installed on my system, as I am not using HDFS, but I am using Spark 1.3.1 prebuilt with Hadoop 2.6. AM I
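
For reference, the usual workaround on Windows without a full Hadoop install is to point hadoop.home.dir at a directory whose bin\ folder contains winutils.exe - a sketch with an assumed path:

    // before creating the SparkContext; expects C:\hadoop\bin\winutils.exe to exist
    System.setProperty("hadoop.home.dir", "C:\\hadoop")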