RE: Exiting driver main() method...

2015-05-02 Thread Mohammed Guller
No, you don’t need to do anything special. Perhaps your application is getting stuck somewhere? If you can share your code, someone may be able to help. Mohammed From: James Carman [mailto:ja...@carmanconsulting.com] Sent: Friday, May 1, 2015 5:53 AM To: user@spark.apache.org Subject: Exiting

Re: real time Query engine Spark-SQL on Hbase

2015-05-02 Thread Siddharth Ubale
Hi, Thanks for the reply. Hbase cli takes less than 500 ms for the same query. I am running a simple query, i.e. Select * from Customers where c_id='123123'. Why would the same query which takes 500 ms at the Hbase cli end up taking around 8 secs via Spark-Sql? I am unable to understand this.

com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index:

2015-05-02 Thread shahab
Hi, I am using spark-1.2.0 and I used Kryo serialization but I get the following exception. java.io.IOException: com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 3448, Size: 1 I do appreciate it if anyone could tell me how I can resolve this? best, /Shahab

not getting any mail

2015-05-02 Thread Jeetendra Gangele
Hi All I am not getting any mail from this community?

Remoting warning when submitting to cluster

2015-05-02 Thread javidelgadillo
Hello all!! We've been prototyping some spark applications to read messages from Kafka topics. The application is quite simple: we use KafkaUtils.createStream to receive a stream of CSV messages from a Kafka topic. We parse the CSV and count the number of messages we get in each RDD. At a
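
A minimal sketch of the prototype described above, assuming placeholder ZooKeeper, group and topic names (the receiver-based API from spark-streaming-kafka):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(new SparkConf().setAppName("csv-count"), Seconds(10))
    // createStream yields (key, message) pairs; "zk:2181", "csv-group", "csv-topic" are placeholders
    val stream = KafkaUtils.createStream(ssc, "zk:2181", "csv-group", Map("csv-topic" -> 1))
    stream.map(_._2.split(","))   // parse each CSV message
          .count()                // number of messages in each batch RDD
          .print()
    ssc.start()
    ssc.awaitTermination()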

Re: Spark - Hive Metastore MySQL driver

2015-05-02 Thread Ted Yu
Can you try the patch from: [SPARK-6913][SQL] Fixed java.sql.SQLException: No suitable driver found Cheers On Sat, Mar 28, 2015 at 12:41 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: This is from my Hive installation -sh-4.1$ ls /apache/hive/lib | grep derby derby-10.10.1.1.jar

Re: How to add a column to a spark RDD with many columns?

2015-05-02 Thread dsgriffin
val newRdd = myRdd.map(row => row ++ Array((row(1).toLong * row(199).toLong).toString)) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-add-a-column-to-a-spark-RDD-with-many-columns-tp22729p22735.html Sent from the Apache Spark User List mailing list

Re: Exiting driver main() method...

2015-05-02 Thread Akhil Das
It used to exit without any problem for me. You can basically check in the driver UI (that runs on 4040) and see what exactly it's doing. Thanks Best Regards On Fri, May 1, 2015 at 6:22 PM, James Carman ja...@carmanconsulting.com wrote: In all the examples, it seems that the spark application

empty jdbc RDD in spark

2015-05-02 Thread Hafiz Mujadid
Hi all! I am trying to read a HANA database using Spark's JdbcRDD. Here is my code: def readFromHana() { val conf = new SparkConf() conf.setAppName("test").setMaster("local") val sc = new SparkContext(conf) val rdd = new JdbcRDD(sc, () => {
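
For reference, a hedged sketch of a complete JdbcRDD setup (the HANA driver class, connection URL and query are placeholders; the query must contain exactly two '?' placeholders that JdbcRDD fills with per-partition bounds):

    import java.sql.DriverManager
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.JdbcRDD

    val sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
    val rdd = new JdbcRDD(sc,
      () => {
        Class.forName("com.sap.db.jdbc.Driver")           // assumed HANA driver class
        DriverManager.getConnection("jdbc:sap://host:30015", "user", "pass") // placeholder URL
      },
      "SELECT * FROM MEMBERS WHERE ID >= ? AND ID <= ?",  // the two ?s receive the partition bounds
      1, 1000, 2,                                         // lowerBound, upperBound, numPartitions
      r => r.getString(1))                                // map each ResultSet row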

Re: Drop a column from the DataFrame.

2015-05-02 Thread dsgriffin
Just use select() to create a new DataFrame with only the columns you want. Sort of the opposite operation -- but you can select all of the columns minus the one you don't want. You could even use a filter to remove just the one column on the fly:
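
A sketch of that idea on Spark 1.3, assuming a DataFrame df and a hypothetical column name "colToDrop":

    import org.apache.spark.sql.functions.col

    // keep every column except the one to drop
    val pruned = df.select(df.columns.filter(_ != "colToDrop").map(col): _*)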

Re: spark.logConf with log4j.rootCategory=WARN

2015-05-02 Thread Akhil Das
It could be. Thanks Best Regards On Fri, May 1, 2015 at 9:11 PM, roy rp...@njit.edu wrote: Hi, I have recently enabled log4j.rootCategory=WARN, console in the Spark configuration, but after that spark.logConf=true has become ineffective. So just want to confirm if this is because

Re: Spark Streaming Kafka Avro NPE on deserialization of payload

2015-05-02 Thread Akhil Das
There was a similar discussion over here http://mail-archives.us.apache.org/mod_mbox/spark-user/201411.mbox/%3ccakz4c0s_cuo90q2jxudvx9wc4fwu033kx3-fjujytxxhr7p...@mail.gmail.com%3E Thanks Best Regards On Fri, May 1, 2015 at 7:12 PM, Todd Nist tsind...@gmail.com wrote: *Resending as I do not

Re: Spark worker error on standalone cluster

2015-05-02 Thread Michael Ryabtsev
Thanks Akhil, I am trying to investigate this path. The Spark version is the same, but maybe there is a difference in Hadoop. On Sat, May 2, 2015 at 6:25 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Just make sure you are using the same version of Spark in your cluster and the project's build

spark filestream problem

2015-05-02 Thread Evo Eftimov
it seems that on Spark Streaming 1.2 the filestream API may have a bug - it doesn't detect new files when moving or renaming them on HDFS, only when copying them, but that leads to a well-known problem with .tmp files, which get removed and make the Spark Streaming filestream throw an exception

sparkR equivalent to SparkContext.newAPIHadoopRDD?

2015-05-02 Thread David Holiday
Hi gang, I'm giving sparkR a test drive and am bummed to discover that the SparkContext API in sparkR is only a subset of what's available in stock spark. Specifically, I need to be able to pull data from accumulo into sparkR. I can do it with stock spark but can't figure out how to make the

Problem in Standalone Mode

2015-05-02 Thread drarse
When I run my program with spark-submit everything is OK. But when I try to run it in standalone mode I obtain the next exceptions: ((This is with val df = sqlContext.jsonFile("./datos.json") )) java.io.EOFException [error] at

Re: Help with publishing to Kafka from Spark Streaming?

2015-05-02 Thread Saisai Shao
Here is the pull request, you may refer to this: https://github.com/apache/spark/pull/2994 Thanks Jerry 2015-05-01 14:38 GMT+08:00 Pavan Sudheendra pavan0...@gmail.com: Link to the question: http://stackoverflow.com/questions/29974017/spark-kafka-producer-not-serializable-exception
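
Independent of that pull request, a common workaround for the linked serialization issue is to create the producer inside foreachPartition, so it is constructed on the executor rather than shipped with the closure - a hedged sketch with a placeholder broker and topic, assuming a DStream[String] named stream:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val props = new Properties()
        props.put("bootstrap.servers", "broker:9092") // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props) // built per partition, never serialized
        records.foreach(r => producer.send(new ProducerRecord[String, String]("out-topic", r)))
        producer.close()
      }
    }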

Re: how to pass configuration properties from driver to executor?

2015-05-02 Thread Akhil Das
In fact, sparkConf.set("spark.whateverPropertyYouWant", "Value") gets shipped to the executors. Thanks Best Regards On Fri, May 1, 2015 at 2:55 PM, Michael Ryabtsev mich...@totango.com wrote: Hi, We've had a similar problem, but with a log4j properties file. The only working way we've found was
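
A small sketch of that, with a made-up property key - anything prefixed with "spark." set on the SparkConf is shipped with the job and, as far as I know, readable on executors via SparkEnv:

    import org.apache.spark.{SparkConf, SparkContext, SparkEnv}

    val conf = new SparkConf().setAppName("conf-demo")
      .set("spark.myapp.greeting", "hello")                     // hypothetical custom key
    val sc = new SparkContext(conf)
    val out = sc.parallelize(1 to 3).map { i =>
      SparkEnv.get.conf.get("spark.myapp.greeting") + " " + i   // read on the executor
    }.collect()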

[PSA] Use Stack Overflow!

2015-05-02 Thread Nick Chammas
This mailing list sees a lot of traffic every day. With such a volume of mail, you may find it hard to find discussions you are interested in, and if you are the one starting discussions you may sometimes feel your mail is going into a black hole. We can't change the nature of this mailing list

Generating version agnostic jar path value for --jars clause

2015-05-02 Thread nitinkak001
I have a list of Cloudera jars which I need to provide in the --jars clause, mainly for the HiveContext functionality I am using. However, many of these jars have the version number as part of their names. This leads to an issue: the names might change when I do a Cloudera upgrade. Just a note here,
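
One way to sidestep the version suffixes is to resolve the jar paths at launch time instead of hard-coding them - a sketch assuming a hypothetical Cloudera lib directory:

    import java.io.File

    // build a comma-separated value for --jars from whatever versions are installed
    val jars = new File("/opt/cloudera/parcels/CDH/lib/hive/lib") // assumed path
      .listFiles()
      .filter(_.getName.endsWith(".jar"))
      .map(_.getAbsolutePath)
      .mkString(",")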

Re: Spark worker error on standalone cluster

2015-05-02 Thread Akhil Das
Just make sure you are using the same version of Spark in your cluster and in the project's build file. Thanks Best Regards On Fri, May 1, 2015 at 2:43 PM, Michael Ryabtsev (Totango) mich...@totango.com wrote: Hi everyone, I have a spark application that works fine on a standalone Spark

Re: Enabling Event Log

2015-05-02 Thread Jeetendra Gangele
is it working now? On 1 May 2015 at 13:43, James King jakwebin...@gmail.com wrote: Oops! well spotted. Many thanks Shixiong. On Fri, May 1, 2015 at 1:25 AM, Shixiong Zhu zsxw...@gmail.com wrote: spark.history.fs.logDirectory is for the history server. For Spark applications, they should
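
For reference, the application-side settings the thread distinguishes from the history server's spark.history.fs.logDirectory - a sketch with an assumed HDFS path:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "hdfs:///spark-events") // assumed path; must exist and be writable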

to split an RDD to multiple ones?

2015-05-02 Thread Yifan LI
Hi, I have an RDD srdd containing (unordered) data like this: s1_0, s3_0, s2_1, s2_2, s3_1, s1_3, s1_2, … What I want is (it will be much better if they could be in ascending order): srdd_s1: s1_0, s1_1, s1_2, …, s1_n srdd_s2: s2_0, s2_1, s2_2, …, s2_n srdd_s3: s3_0, s3_1, s3_2, …, s3_n … …

Re: DataFrame filter referencing error

2015-05-02 Thread Francesco Bigarella
First of all, thank you for your replies. I was previously doing this via a normal JDBC connection and it worked without problems. Then I liked the idea that Spark SQL could take care of opening/closing the connection. I tried also with single quotes, since that was my first guess, but it didn't work.

Re: empty jdbc RDD in spark

2015-05-02 Thread Ted Yu
bq. "SELECT * FROM MEMBERS LIMIT ? OFFSET ?" Have you tried dropping the LIMIT and OFFSET clauses from the above query? Cheers On Fri, May 1, 2015 at 1:56 PM, Hafiz Mujadid hafizmujadi...@gmail.com wrote: Hi all! I am trying to read a HANA database using Spark's JdbcRDD. Here is my code: def
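
A likely reason the suggestion helps: JdbcRDD binds its per-partition lower and upper bounds into the two '?' placeholders, so with LIMIT ? OFFSET ? the bounds land in the wrong slots (e.g. LIMIT 0 for a first partition starting at 0) and can select nothing. The conventional form keys the placeholders on a numeric column (column name assumed):

    "SELECT * FROM MEMBERS WHERE ID >= ? AND ID <= ?"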

Re: Number of input partitions in SparkContext.sequenceFile

2015-05-02 Thread Archit Thakur
Hi, How did you check the number of splits in your file? Did you run your MR job or calculate it? The formula for split size is max(minSize, min(maxSize, blockSize)). Can you check if it satisfies your case? Thanks and Regards, Archit Thakur. On Saturday, April 25, 2015, Wenlei Xie wenlei@gmail.com wrote:
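
For example, with the FileInputFormat defaults (minSize = 1, maxSize = Long.MAX_VALUE) the formula reduces to the HDFS block size, so a 1 GB file with 128 MB blocks yields 8 splits.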

Re: How to add a column to a spark RDD with many columns?

2015-05-02 Thread Carter
Thanks for your reply! It is what I am after. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-add-a-column-to-a-spark-RDD-with-many-columns-tp22729p22740.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: not getting any mail

2015-05-02 Thread Ted Yu
Looks like there were delays across Apache project mailing lists. Emails are coming through now. On May 2, 2015, at 9:14 AM, Jeetendra Gangele gangele...@gmail.com wrote: Hi All I am not getting any mail from this community?

RE: spark filestream problem

2015-05-02 Thread Evo Eftimov
I have figured it out in the meantime - simply, when moving a file on HDFS it preserves its timestamp, and the Spark filestream adapter cares not about filenames but about timestamps - hence NEW files with OLD timestamps will NOT be processed - yuk. The hack you can use is to
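
The message is truncated, but one way (not necessarily the author's hack) to make a moved file look new to the filestream adapter is to bump its modification time after the move - a hedged sketch using the Hadoop FileSystem API with a placeholder path:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(new Configuration())
    val dst = new Path("/stream/in/part-00000")      // hypothetical file just moved into the watched dir
    fs.setTimes(dst, System.currentTimeMillis(), -1) // set mtime to now; -1 leaves atime unchanged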

Submit Kill Spark Application program programmatically from another application

2015-05-02 Thread Yijie Shen
Hi, I am wondering if it is possible to submit, monitor and kill Spark applications from another service. I have written a service like this: parse user commands, translate them into understandable arguments to an already prepared Spark-SQL application, submit the application along with arguments to
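
This use case maps onto the launcher library added by SPARK-4924 (org.apache.spark.launcher, available from Spark 1.4) - a hedged sketch with placeholder jar, class, master and arguments:

    import org.apache.spark.launcher.SparkLauncher

    val process = new SparkLauncher()
      .setAppResource("/path/to/sql-runner.jar") // hypothetical prepared Spark-SQL app
      .setMainClass("com.example.SqlRunner")     // hypothetical
      .setMaster("spark://master:7077")          // placeholder
      .addAppArgs("SELECT * FROM customers")     // the translated user command
      .launch()                                  // returns a java.lang.Process
    // monitor via process.waitFor(); kill via process.destroy()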

Re: Can I group elements in RDD into different groups and let each group share some elements?

2015-05-02 Thread Olivier Girardot
Did you look at the cogroup transformation or the cartesian transformation? Regards, Olivier. On Sat, May 2, 2015 at 22:01, Franz Chien franzj...@gmail.com wrote: Hi all, Can I group elements in RDD into different groups and let each group share elements? For example, I have 10,000
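
As an alternative to cartesian, a join along the group mapping also lets one element land in several groups - a tiny sketch with made-up data:

    // mapping: (groupId, elementId); elements: (elementId, value)
    val mapping  = sc.parallelize(Seq((1, "e1"), (1, "e42"), (2, "e1"), (2, "e554")))
    val elements = sc.parallelize(Seq(("e1", 10), ("e42", 7), ("e554", 3)))
    val groups = mapping.map(_.swap)        // key by elementId
      .join(elements)                       // (elementId, (groupId, value))
      .map { case (_, (g, v)) => (g, v) }
      .groupByKey()                         // each group now holds its (possibly shared) elements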

Re: real time Query engine Spark-SQL on Hbase

2015-05-02 Thread Ted Yu
In the upcoming 1.4.0 release, SPARK-3468 should give you a better clue. Cheers On Fri, May 1, 2015 at 12:30 PM, Siddharth Ubale siddharth.ub...@syncoms.com wrote: Hi, Thanks for the reply. Hbase cli takes less than 500 ms for the same query. I am running a simple query, i.e. Select *

Re: ClassNotFoundException for Kryo serialization

2015-05-02 Thread Akshat Aranya
Now I am running up against some other problem while trying to schedule tasks: 15/05/01 22:32:03 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) java.lang.IllegalStateException: unread block data at

Re: to split an RDD to multiple ones?

2015-05-02 Thread Olivier Girardot
I guess: val srdd_s1 = srdd.filter(_.startsWith("s1_")).sortBy(identity) val srdd_s2 = srdd.filter(_.startsWith("s2_")).sortBy(identity) val srdd_s3 = srdd.filter(_.startsWith("s3_")).sortBy(identity) Regards, Olivier. On Sat, May 2, 2015 at 22:53, Yifan LI iamyifa...@gmail.com wrote: Hi, I have an RDD *srdd*
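
A runnable version of that idea with toy data (sortBy needs a key function, hence identity):

    val srdd = sc.parallelize(Seq("s1_0", "s3_0", "s2_1", "s2_2", "s3_1", "s1_3", "s1_2"))
    val srdd_s1 = srdd.filter(_.startsWith("s1_")).sortBy(identity) // s1_0, s1_2, s1_3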

Re: Drop a column from the DataFrame.

2015-05-02 Thread Olivier Girardot
Sounds like a patch for a drop method... On Sat, May 2, 2015 at 21:03, dsgriffin dsgrif...@gmail.com wrote: Just use select() to create a new DataFrame with only the columns you want. Sort of the opposite operation -- but you can select all of the columns minus the one you

Re: com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index:

2015-05-02 Thread Olivier Girardot
Can you post your code? Otherwise there's not much we can do. Regards, Olivier. On Sat, May 2, 2015 at 21:15, shahab shahab.mok...@gmail.com wrote: Hi, I am using spark-1.2.0 and I used Kryo serialization but I get the following exception. java.io.IOException:

Re: Drop a column from the DataFrame.

2015-05-02 Thread Ted Yu
This is coming in 1.4.0 https://issues.apache.org/jira/browse/SPARK-7280 On May 2, 2015, at 2:27 PM, Olivier Girardot ssab...@gmail.com wrote: Sounds like a patch for a drop method... On Sat, May 2, 2015 at 21:03, dsgriffin dsgrif...@gmail.com wrote: Just use select() to create a new

Re: to split an RDD to multiple ones?

2015-05-02 Thread Yifan LI
Thanks, Olivier and Franz. :) Best, Yifan LI On 02 May 2015, at 23:23, Olivier Girardot ssab...@gmail.com wrote: I guess: val srdd_s1 = srdd.filter(_.startsWith("s1_")).sortBy(identity) val srdd_s2 = srdd.filter(_.startsWith("s2_")).sortBy(identity) val srdd_s3 =

Re: Spark - Timeout Issues - OutOfMemoryError

2015-05-02 Thread Akhil Das
You could try repartitioning your listings RDD. Also, doing a collectAsMap would basically bring all your data to the driver; in that case you might want to set the storage level as memory-and-disk, though I'm not sure that will be of any help on the driver. Thanks Best Regards On Thu, Apr 30, 2015 at 11:10
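
A sketch of those two suggestions, assuming listings is the RDD from the thread and a placeholder partition count:

    import org.apache.spark.storage.StorageLevel

    val repartitioned = listings.repartition(200)  // placeholder count; spread the load more evenly
      .persist(StorageLevel.MEMORY_AND_DISK)       // spill to disk instead of failing with OOM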

Can I group elements in RDD into different groups and let each group share some elements?‏

2015-05-02 Thread Franz Chien
Hi all, Can I group elements in RDD into different groups and let each group share elements? For example, I have 10,000 elements in an RDD, from e1 to e10000, and I want to group and aggregate them by another mapping with size of 2000, ex: ( (e1,e42), (e1,e554), (e3, e554)…… (2000th group)) My first

Re: directory loader in windows

2015-05-02 Thread ayan guha
Thanks for the answer. I am now trying to set HADOOP_HOME but the issue still persists. Also, I can see only windows-utils.exe in my HADOOP_HOME, but no WINUTILS.EXE. I do not have Hadoop installed on my system, as I am not using HDFS, but I am using Spark 1.3.1 prebuilt with Hadoop 2.6. AM I
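
For reference, the usual workaround on Windows without a full Hadoop install is to point hadoop.home.dir at a directory whose bin\ folder contains winutils.exe - a sketch with an assumed path:

    // before creating the SparkContext; expects C:\hadoop\bin\winutils.exe to exist
    System.setProperty("hadoop.home.dir", "C:\\hadoop")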