Re: error in running StructuredStreaming-Kafka integration code (Spark 2.x & Kafka 10)

2017-07-10 Thread David Newberger
Karan, it looks like the Kafka version is incorrect. You mention Kafka 0.10; however, the classpath references Kafka 0.9. Thanks, David On July 10, 2017 at 1:44:06 PM, karan alang (karan.al...@gmail.com) wrote: Hi All, I'm running Spark Streaming - Kafka integration using Spark 2.x & Kafka 0.10.
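
A minimal build.sbt line for the Spark 2.x Structured Streaming Kafka source, as a sketch (the Spark version shown is an assumption; match it to your cluster, and make sure no Kafka 0.9 jars remain on the classpath):

    // build.sbt fragment; the version "2.1.0" is illustrative
    libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.1.0"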

RE: HBase-Spark Module

2016-07-29 Thread David Newberger
Hi Ben, This seems more like a question for community.cloudera.com. However, I believe it would be in hbase, not spark. https://repository.cloudera.com/artifactory/webapp/#/artifacts/browse/tree/General/cloudera-release-repo/org/apache/hbase/hbase-spark David Newberger -Original Message

RE: difference between dataframe and dataframe.write

2016-06-16 Thread David Newberger
DataFrame is a collection of data organized into named columns. DataFrame.write is an interface for saving the contents of a DataFrame to external storage. Hope this helps. David Newberger From: pseudo oduesp [mailto:pseudo20...@gmail.com] Sent: Thursday, June 16, 2016 9:43 AM
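
A minimal sketch of the distinction, assuming a Spark 1.6-style sqlContext (paths and formats are illustrative):

    val df = sqlContext.read.json("hdfs:///data/in")  // DataFrame: data organized into named columns
    df.write                                          // DataFrameWriter: saves the DataFrame's contents
      .mode("overwrite")
      .parquet("hdfs:///data/out")                    // to external storage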

RE: streaming example has error

2016-06-16 Thread David Newberger
Try adding wordCounts.print() before ssc.start() David Newberger From: Lee Ho Yeung [mailto:jobmatt...@gmail.com] Sent: Wednesday, June 15, 2016 9:16 PM To: David Newberger Cc: user@spark.apache.org Subject: Re: streaming example has error got another error StreamingContext: Error starting
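
A runnable sketch of the fix, assuming a spark-shell SparkContext named sc: an output operation such as print() must be registered before ssc.start(), otherwise the StreamingContext has nothing to compute and fails to start.

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(2))
    val lines = ssc.socketTextStream("localhost", 9999)  // fed by `nc -lk 9999`
    val wordCounts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

    wordCounts.print()       // register the output operation first
    ssc.start()              // then start the computation
    ssc.awaitTermination()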

RE: Limit pyspark.daemon threads

2016-06-15 Thread David Newberger
“…the maximum amount of CPU cores to request for the application from across the cluster (not from each machine). If not set, the default will be spark.deploy.defaultCores on Spark's standalone cluster manager, or infinite (all available cores) on Mesos.” David Newberger From: agateaaa [mailto:ag
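
The property being quoted is spark.cores.max; a sketch of setting it (the value is illustrative):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("capped-cores")
      .set("spark.cores.max", "8")  // total cores across the cluster, not per machine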

RE: Handle empty kafka in Spark Streaming

2016-06-15 Thread David Newberger
Newberger -Original Message- From: Yogesh Vyas [mailto:informy...@gmail.com] Sent: Wednesday, June 15, 2016 8:30 AM To: David Newberger Subject: Re: Handle empty kafka in Spark Streaming I am looking for something which checks the JavaPairReceiverInputDStream before further going for any

RE: Handle empty kafka in Spark Streaming

2016-06-15 Thread David Newberger
If you're asking how to handle no messages in a batch window, then I would add an isEmpty check like: dStream.foreachRDD(rdd => { if (!rdd.isEmpty()) { ... } }) Or something like that. David Newberger -Original Message- From: Yogesh Vyas [mailto:informy...@gmail.com] Sent: Wednes
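
A runnable sketch of that guard, using a queue-backed test stream in place of the Kafka DStream (the queueStream input is illustrative; substitute your Kafka DStream):

    import scala.collection.mutable
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setMaster("local[2]").setAppName("empty-batch-guard")
    val ssc = new StreamingContext(conf, Seconds(2))

    val queue = mutable.Queue(ssc.sparkContext.parallelize(Seq("a", "b")))
    val dStream = ssc.queueStream(queue)

    dStream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {           // skip batch windows with no messages
        println(s"batch size: ${rdd.count()}")
      }
    }
    ssc.start()
    ssc.awaitTermination()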

RE: streaming example has error

2016-06-15 Thread David Newberger
Have you tried setting “spark.driver.allowMultipleContexts = true”? David Newberger From: Lee Ho Yeung [mailto:jobmatt...@gmail.com] Sent: Tuesday, June 14, 2016 8:34 PM To: user@spark.apache.org Subject: streaming example has error when simulating streaming with nc -lk, got the error below
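
A sketch of setting that property, done before any context is created (reusing the shell's existing sc is usually the better fix):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("streaming-example")
      .set("spark.driver.allowMultipleContexts", "true")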

RE: Creating a Hive table through Spark and potential locking issue (a bug)

2016-06-08 Thread David Newberger
Could you be looking at two jobs trying to use the same file, with one getting to it before the other and finally removing it? David Newberger From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] Sent: Wednesday, June 8, 2016 1:33 PM To: user; user @spark Subject: Creating a Hive table through

RE: Twitter streaming error : No lease on /user/hduser/checkpoint/temp (inode 806125): File does not exist.

2016-06-03 Thread David Newberger
Hi Mich, My gut says you are correct that each application should have its own checkpoint directory. Though honestly, I'm still a bit fuzzy on checkpointing, as I haven't worked with it much yet. Cheers, David Newberger From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] Sent: Friday, June

RE: Twitter streaming error : No lease on /user/hduser/checkpoint/temp (inode 806125): File does not exist.

2016-06-03 Thread David Newberger
I was going to ask if you had 2 jobs running. If the checkpointing for both is set up to look at the same location, I could see an error like this happening. Do both Spark jobs have a reference to a checkpointing dir? David Newberger From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] Sent
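
A sketch of the suggestion: give each streaming application its own checkpoint directory, e.g. keyed by app name (the path is illustrative):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("twitter-stream-jobA")
    val ssc = new StreamingContext(conf, Seconds(10))
    // one checkpoint dir per application, derived from the app name
    ssc.checkpoint(s"hdfs:///user/hduser/checkpoint/${conf.get("spark.app.name")}")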

RE: Spark Streaming - long garbage collection time

2016-06-03 Thread David Newberger
Have you tried UseG1GC in place of UseConcMarkSweepGC? This article really helped me with GC a few weeks ago: https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html David Newberger -Original Message- From: Marco1982 [mailto:marco.plata
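
A sketch of the switch (the same value can also be passed via spark-submit --conf):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.extraJavaOptions", "-XX:+UseG1GC")  // replaces -XX:+UseConcMarkSweepGC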

RE: About a problem running a spark job in a cdh-5.7.0 vmware image.

2016-06-03 Thread David Newberger
“Once a SparkConf object is passed to Spark, it is cloned and can no longer be modified by the user. Spark does not support modifying the configuration at runtime.” David Newberger From: Alonso Isidoro Roman [mailto:alons...@gmail.com] Sent: Friday, June 3, 2016 10:37 AM To: David Newberger Cc: user@spark.apache.org Subject: Re: Abo
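
A sketch of the rule being quoted: configuration must be set on the SparkConf before the SparkContext is created; the conf is cloned at creation, so later set() calls do not affect the running context.

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("example").setMaster("local[2]")
    conf.set("spark.ui.port", "4041")  // fine: no context exists yet
    val sc = new SparkContext(conf)    // conf is cloned here
    conf.set("spark.ui.port", "4042")  // too late: ignored by the running context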

RE: [REPOST] Severe Spark Streaming performance degradation after upgrading to 1.6.1

2016-06-03 Thread David Newberger
What does your processing time look like? Is it consistently within that 20-second micro-batch window? David Newberger From: Adrian Tanase [mailto:atan...@adobe.com] Sent: Friday, June 3, 2016 8:14 AM To: user@spark.apache.org Cc: Cosmin Ciobanu Subject: [REPOST] Severe Spark Streaming performance

RE: About a problem running a spark job in a cdh-5.7.0 vmware image.

2016-06-03 Thread David Newberger
Alonso, The CDH VM uses YARN and the default deploy mode is client. I’ve been able to use the CDH VM for many learning scenarios. http://www.cloudera.com/documentation/enterprise/latest.html http://www.cloudera.com/documentation/enterprise/latest/topics/spark.html David Newberger From

RE: About a problem when mapping a file located within a HDFS vmware cdh-5.7 image

2016-05-31 Thread David Newberger
Have you tried it without either of the setMaster lines? Also, CDH 5.7 uses Spark 1.6.0 with some patches, so I would recommend using the Cloudera repo for the Spark artifacts in your build.sbt. I'd also check the other entries in the build.sbt to see if there are CDH-specific versions. David Newberger From
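
A hypothetical build.sbt fragment along those lines (the artifact versions are assumptions; verify them against the Cloudera repository):

    resolvers += "cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"      % "1.6.0-cdh5.7.0" % "provided",
      "org.apache.spark" %% "spark-streaming" % "1.6.0-cdh5.7.0" % "provided"
    )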

RE: About a problem when mapping a file located within a HDFS vmware cdh-5.7 image

2016-05-31 Thread David Newberger
Is https://github.com/alonsoir/awesome-recommendation-engine/blob/master/build.sbt the build.sbt you are using? David Newberger QA Analyst WAND - The Future of Restaurant Technology (W) www.wandcorp.com (E) david.newber...@wandcorp.com

RE: Can not set spark dynamic resource allocation

2016-05-20 Thread David Newberger
Hi All, The error you are seeing looks really similar to SPARK-13514 to me, though I could be wrong: https://issues.apache.org/jira/browse/SPARK-13514 Can you check yarn.nodemanager.local-dirs in your YARN configuration for "file://"? Cheers! David Newberger -Original Message

RE: Spark replacing Hadoop

2016-04-14 Thread David Newberger
Can we assume your question is “Will Spark replace Hadoop MapReduce?” or do you literally mean replacing the whole of Hadoop? David From: Ashok Kumar [mailto:ashok34...@yahoo.com.INVALID] Sent: Thursday, April 14, 2016 2:13 PM To: User Subject: Spark replacing Hadoop Hi, I hear that some

RE: DStream how many RDD's are created by batch

2016-04-12 Thread David Newberger
Hi Natu, I believe you are correct: one RDD would be created for each file. Cheers, David From: Natu Lauchande [mailto:nlaucha...@gmail.com] Sent: Tuesday, April 12, 2016 1:48 PM To: David Newberger Cc: user@spark.apache.org Subject: Re: DStream how many RDD's are created by batch Hi David

RE: DStream how many RDD's are created by batch

2016-04-12 Thread David Newberger
Hi, Time is usually the criterion, if I'm understanding your question. An RDD is created for each batch interval: if your interval is 500ms, then an RDD is created every 500ms; if it's 2 seconds, then an RDD is created every 2 seconds. Cheers, David From: Natu Lauchande
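
A minimal sketch: the batch duration passed to the StreamingContext is what sets this cadence (values are illustrative):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Milliseconds, StreamingContext}

    val conf = new SparkConf().setAppName("batch-interval").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Milliseconds(500))  // one RDD per DStream every 500 ms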

Using Experimental Spark Features

2015-12-30 Thread David Newberger
this approach yet, and if so, what has your experience been with using it? If it helps, we'd be looking to implement it using Scala. Secondly, in general, what has people's experience been with using experimental features in Spark? Cheers, David Newberger

RE: fishing for help!

2015-12-21 Thread David Newberger
Hi Eran, Based on the limited information, the first things that come to my mind are processor, RAM, and disk speed. David Newberger QA Analyst WAND - The Future of Restaurant Technology (W) www.wandcorp.com (E) david.newber...@wandcorp.com