Re: Some Serious Issue with Spark Streaming - Blocks Getting Removed and Jobs have Failed..

2015-01-01 Thread zgm
I am also seeing this error in a YARN Spark Streaming (1.2.0) application. Tim Smith wrote: Similar issue (Spark 1.0.0). Streaming app runs for a few seconds before these errors start to pop up all over the driver logs: 14/09/12 17:30:23 WARN TaskSetManager: Loss was due to java.lang.Exception

Re: Some Serious Issue with Spark Streaming - Blocks Getting Removed and Jobs have Failed..

2015-01-01 Thread Sean Owen
I believe the message merely means that a block has been removed from memory because either it is not needed or because it is also persisted on disk and memory is low. It does not mean data is lost. What is the end problem you observe? This does not match the problem you link to in the mailing

Re: NoSuchMethodError: com.typesafe.config.Config.getDuration with akka-http/akka-stream

2015-01-01 Thread Akhil Das
It's a Typesafe config jar conflict; you will need to put the jar with the getDuration method in the first position of your classpath. Thanks Best Regards On Wed, Dec 31, 2014 at 4:38 PM, Christophe Billiard christophe.billi...@gmail.com wrote: Hi all, I am currently trying to combine datastax's
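
A hedged sketch of how that classpath ordering is usually done with spark-submit; the jar path and version are assumptions (use whichever Typesafe config jar actually contains getDuration), and --driver-class-path / spark.executor.extraClassPath prepend entries to the respective classpaths:

  spark-submit \
    --driver-class-path /path/to/config-1.2.1.jar \
    --conf spark.executor.extraClassPath=/path/to/config-1.2.1.jar \
    --class com.example.MyApp \
    myapp.jar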

Re: Trying to make spark-jobserver work with yarn

2015-01-01 Thread Akhil Das
Hi Fernando, Here's a simple log parser/analyser written in Scala that you can run without spark-shell/submit: https://github.com/sigmoidanalytics/Test Basically, to run a Spark job without spark-submit or the shell you need a build file
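
As a rough illustration (not taken from the linked repo), a minimal sbt build plus a main class that creates its own SparkContext might look like this; the artifact versions, master URL and file path are assumptions for Spark 1.2:

  // build.sbt
  name := "log-parser"
  scalaVersion := "2.10.4"
  libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0"

  // src/main/scala/LogParser.scala -- runnable with plain `sbt run`, no spark-submit
  import org.apache.spark.{SparkConf, SparkContext}

  object LogParser {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("LogParser").setMaster("local[*]")
      val sc = new SparkContext(conf)
      val errorLines = sc.textFile("access.log").filter(_.contains("ERROR")).count()
      println("ERROR lines: " + errorLines)
      sc.stop()
    }
  }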

Re: Reading nested JSON data with Spark SQL

2015-01-01 Thread Pankaj Narang
Hi, I am having a similar problem and tried your solution with the Spark 1.2 build with Hadoop. I am saving objects to Parquet files where some fields are of type Array. When I fetch them as below I get java.lang.ClassCastException: [B cannot be cast to java.lang.CharSequence def

Re: Spark app performance

2015-01-01 Thread Akhil Das
It would be great if you could share the piece of code running inside your mapPartitions; I'm assuming you are creating/handling a lot of complex objects, which would slow down the performance. Here's a link to the performance tuning guide if you haven't seen it: http://spark.apache.org/docs/latest/tuning.html
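
For what it's worth, a common pattern is to build any expensive object once per partition and reuse it across records; this is only a sketch with a hypothetical parser class, not the code from the thread:

  import org.apache.spark.rdd.RDD

  // Hypothetical costly-to-construct helper, standing in for whatever gets built per record.
  class ExpensiveParser { def parse(line: String): Array[String] = line.split(",") }

  def parseAll(lines: RDD[String]): RDD[Array[String]] =
    lines.mapPartitions { iter =>
      val parser = new ExpensiveParser()    // constructed once per partition...
      iter.map(line => parser.parse(line))  // ...and reused for every record in it
    }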

Re: spark stream + cassandra (execution on event)

2015-01-01 Thread Akhil Das
One approach would be to create an event-based streaming pipeline: your Spark Streaming job listens on a socket (or whatever) for the event to happen, and once it happens, it hits your Cassandra and does the work. Thanks Best Regards On Wed, Dec 31, 2014 at 3:14 PM, Oleg Ruchovets
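
A minimal sketch of that idea, assuming a plain socket source; the host, port and batch interval are made up, and the Cassandra access is left as a placeholder (e.g. via the DataStax spark-cassandra-connector):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val conf = new SparkConf().setAppName("EventDrivenPipeline")
  val ssc = new StreamingContext(conf, Seconds(5))

  // Listen for event messages on a socket; host/port are assumptions.
  val events = ssc.socketTextStream("localhost", 9999)

  events.foreachRDD { rdd =>
    if (rdd.take(1).nonEmpty) {
      // An event arrived in this batch: query/update Cassandra here.
    }
  }

  ssc.start()
  ssc.awaitTermination()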

Re: FlatMapValues

2015-01-01 Thread Hitesh Khamesra
How about this: apply flatMap per line, and in that function parse each line and return all the columns as per your need. On Wed, Dec 31, 2014 at 10:16 AM, Sanjay Subramanian sanjaysubraman...@yahoo.com.invalid wrote: hey guys Some of u may care :-) but this is just to give u a background
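
A generic sketch of that suggestion (the column layout is invented, not Sanjay's actual data, and sc is an existing SparkContext): flatMap over the lines and emit one (key, value) pair per column of interest.

  // Each input line is assumed to look like "id,colA,colB,colC".
  val pairs = sc.textFile("input.txt").flatMap { line =>
    val fields = line.split(",")
    val id = fields.head
    fields.tail.map(col => (id, col))   // one (id, column) pair per remaining column
  }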

Re: spark ignoring all memory settings and defaulting to 512MB?

2015-01-01 Thread Kevin Burton
OK... we need to get these centralized somewhere, as the documentation for spark-env.sh sends people far, far off in the wrong direction. Maybe remove all the directives in that script in favor of a link to a page that is more live and can be updated? Kevin On Thu, Jan 1, 2015 at 12:43 AM,

Re: spark ignoring all memory settings and defaulting to 512MB?

2015-01-01 Thread Sean Owen
You don't in general configure Spark with environment variables. They exist but largely for backwards compatibility. Use arguments like --executor-memory on spark-submit, which are explained in the docs and the help message. It is possible to directly set the system properties with -D too if you
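
For example (the values below are placeholders, not recommendations), memory would typically be set on the spark-submit command line rather than in spark-env.sh:

  spark-submit \
    --class com.example.MyApp \
    --master yarn-cluster \
    --driver-memory 4g \
    --executor-memory 8g \
    --num-executors 10 \
    myapp.jar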

Re: Reading nested JSON data with Spark SQL

2015-01-01 Thread Pankaj Narang
Also, it looks like when I store Strings in Parquet and try to fetch them using Spark code I get a ClassCastException. Below is how my array of strings is saved; each character's ASCII value is present in an array of ints: res25: Array[Seq[String]] = Array(ArrayBuffer(Array(104, 116, 116, 112,

Only One Kafka receiver is running in spark irrespective of multiple DStreams

2015-01-01 Thread Tapas Swain
Hi All, I am consuming an 8-partition Kafka topic through multiple DStreams and processing them in Spark. But irrespective of the multiple InputDStreams, the Spark master UI is showing only one receiver. The following is the consumer part of the Spark code: int numStreams = 8;
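
The usual pattern with the Spark 1.2 Kafka receiver API is to create several streams and union them so each gets its own receiver; this is a hedged Scala sketch (the original code is Java), assuming an existing StreamingContext ssc and made-up topic/ZooKeeper settings:

  import org.apache.spark.streaming.kafka.KafkaUtils

  val numStreams = 8
  // One receiver per stream; group id, ZK quorum and topic name are assumptions.
  val kafkaStreams = (1 to numStreams).map { _ =>
    KafkaUtils.createStream(ssc, "zk-host:2181", "my-consumer-group", Map("my-topic" -> 1))
  }
  // Union them so downstream processing sees a single DStream.
  val unified = ssc.union(kafkaStreams)

Note that each receiver occupies a core, so the application needs more cores than receivers for processing to make progress.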

Re: JdbcRDD and ClassTag issue

2015-01-01 Thread Sujee
Hi, I encountered the same issue and solved it. Please check my blog post: http://www.sparkexpert.com/2015/01/02/load-database-data-into-spark-using-jdbcrdd-in-java/ Thank you

Re: Reading nested JSON data with Spark SQL

2015-01-01 Thread Pankaj Narang
Oops: sqlContext.setConf("spark.sql.parquet.binaryAsString", "true") solved the issue. Important for everyone.
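
For anyone hitting the same [B-to-CharSequence cast, a minimal sketch of applying that fix before reading the Parquet file (the file path is a placeholder, and sc is an existing SparkContext):

  import org.apache.spark.sql.SQLContext

  val sqlContext = new SQLContext(sc)
  // Spark SQL 1.2 writes strings to Parquet as binary; read them back as strings.
  sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")
  val data = sqlContext.parquetFile("/path/to/data.parquet")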

Re: JdbcRDD

2015-01-01 Thread Sujee
Hi, I wrote a blog post about this: http://www.sparkexpert.com/2015/01/02/load-database-data-into-spark-using-jdbcrdd-in-java/
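
Not taken from the blog post, but for reference a minimal Scala JdbcRDD sketch; the JDBC URL, credentials, query and bounds are assumptions, and the query must contain the two '?' placeholders that JdbcRDD fills with the partition bounds:

  import java.sql.DriverManager
  import org.apache.spark.rdd.JdbcRDD

  val rows = new JdbcRDD(
    sc,
    () => DriverManager.getConnection("jdbc:mysql://localhost/test", "user", "pass"),
    "SELECT id, name FROM people WHERE id >= ? AND id <= ?",
    1, 1000, 4,                                    // lowerBound, upperBound, numPartitions
    rs => (rs.getInt("id"), rs.getString("name"))  // map each ResultSet row to a tuple
  )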

sparkContext.textFile does not honour the minPartitions argument

2015-01-01 Thread Aniket Bhatnagar
I am trying to read a file into a single partition, but it seems like sparkContext.textFile ignores the passed minPartitions value. I know I can repartition the RDD, but I was curious to know whether this is expected or a bug that needs to be investigated further.

Re: Trying to make spark-jobserver work with yarn

2015-01-01 Thread Fernando O.
Thanks Akhil, that will help a lot! It turned out that spark-jobserver does not work in development mode, but if you deploy a server it works (it looks like the dependencies when running jobserver from sbt are not right). On Thu, Jan 1, 2015 at 5:22 AM, Akhil Das ak...@sigmoidanalytics.com

Re: FlatMapValues

2015-01-01 Thread Sanjay Subramanian
Thanks, let me try that out. From: Hitesh Khamesra hiteshk...@gmail.com To: Sanjay Subramanian sanjaysubraman...@yahoo.com Cc: Kapil Malik kma...@adobe.com; Sean Owen so...@cloudera.com; user@spark.apache.org user@spark.apache.org Sent: Thursday, January 1, 2015 9:46 AM Subject: Re:

Re: Spark app performance

2015-01-01 Thread Raghavendra Pandey
I have seen that link. I am using an RDD of byte arrays and Kryo serialization. Inside mapPartitions, when I measure time, it is never more than 1 ms, whereas the total time taken by the application is about 30 min. The codebase has a lot of dependencies. I am trying to come up with a simple version where I can reproduce

Re: Compile error from Spark 1.2.0

2015-01-01 Thread zigen
Thank you for opening the ticket. On 2015/01/02 15:45, Akhil Das ak...@sigmoidanalytics.com wrote: Yep, Opened SPARK-5054 Thanks Best Regards On Tue, Dec 30, 2014 at 5:52 AM, Michael Armbrust mich...@databricks.com wrote: Yeah, this looks like a regression in the API due to the addition of

Re: Compile error from Spark 1.2.0

2015-01-01 Thread Akhil Das
Yep, Opened SPARK-5054 https://issues.apache.org/jira/browse/SPARK-5054 Thanks Best Regards On Tue, Dec 30, 2014 at 5:52 AM, Michael Armbrust mich...@databricks.com wrote: Yeah, this looks like a regression in the API due to the addition of arbitrary decimal support. Can you open a JIRA?

Re: DAG info

2015-01-01 Thread Josh Rosen
This log message is normal; in this case, this message is saying that the final stage needed to compute your job does not have any dependencies / parent stages and that there are no parent stages that need to be computed. On Thu, Jan 1, 2015 at 11:02 PM, shahid sha...@trialx.com wrote: hi guys

DAG info

2015-01-01 Thread shahid
Hi guys, I have just started using Spark, and I am getting this as an INFO: 15/01/02 11:54:17 INFO DAGScheduler: Parents of final stage: List() 15/01/02 11:54:17 INFO DAGScheduler: Missing parents: List() 15/01/02 11:54:17 INFO DAGScheduler: Submitting Stage 6 (PythonRDD[12] at RDD at

Re: sparkContext.textFile does not honour the minPartitions argument

2015-01-01 Thread Rishi Yadav
Hi Aniket, the optional number-of-partitions value is there to increase the number of partitions, not to reduce it from the default value. On Thu, Jan 1, 2015 at 10:43 AM, Aniket Bhatnagar aniket.bhatna...@gmail.com wrote: I am trying to read a file into a single partition but it seems like sparkContext.textFile
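
In other words, minPartitions is only a lower-bound hint for the Hadoop split computation; to actually end up with one partition you would coalesce (or repartition) after reading, roughly like this (the path is a placeholder):

  // minPartitions can only ask for *more* splits; collapse to a single partition explicitly.
  val single = sc.textFile("hdfs:///some/file.txt").coalesce(1)
  println(single.partitions.length)  // 1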

Re: DecisionTree Algorithm used in Spark MLLib

2015-01-01 Thread Manish Amde
Hi Anoop, the Spark decision tree implementation supports regression and multi-class classification, continuous and categorical features, and pruning; it does not support missing features at present. You can probably think of it as distributed CART, though personally I always find the acronyms
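
For reference, a minimal MLlib (Spark 1.2) decision-tree classification sketch along those lines; the data path, number of classes and tuning values are assumptions:

  import org.apache.spark.mllib.tree.DecisionTree
  import org.apache.spark.mllib.util.MLUtils

  // LIBSVM-format training data; the path is a placeholder.
  val data = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt")

  val numClasses = 2
  val categoricalFeaturesInfo = Map[Int, Int]()  // empty map: treat all features as continuous
  val model = DecisionTree.trainClassifier(
    data, numClasses, categoricalFeaturesInfo,
    impurity = "gini", maxDepth = 5, maxBins = 32)

  val prediction = model.predict(data.first().features)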