Re: Spark Streaming not reading missed data

2018-01-16 Thread vijay.bvp
You are creating a new streaming context each time: val streamingContext = new StreamingContext(sparkSession.sparkContext, Seconds(config.getInt(Constants.Properties.BatchInterval))). If you want fault tolerance, i.e. to resume from where it stopped between Spark job restarts, the correct way is to restore
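A minimal sketch of the checkpoint-restore pattern this reply alludes to. The app name, batch interval, and checkpoint path are illustrative; in practice the checkpoint directory should be on a fault-tolerant store such as HDFS.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointedStream {
  // Illustrative checkpoint location; use HDFS/S3 in a real deployment
  val checkpointDir = "/tmp/spark-checkpoint"

  // Factory invoked only when no checkpoint exists yet
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("checkpointed-stream")
    val ssc = new StreamingContext(conf, Seconds(30))
    ssc.checkpoint(checkpointDir)
    // ... define Kafka input streams and transformations here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On restart this restores the context (and Kafka offsets) from the
    // checkpoint, instead of constructing a fresh context each run
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

The key difference from the quoted code is that the StreamingContext constructor only runs inside the factory, so a restart after a checkpoint exists reuses the saved state rather than starting from the latest offsets.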

unsubscribe

2018-01-16 Thread Anu B Nair

"Got wrong record after seeking to offset" issue

2018-01-16 Thread Justin Miller
Greetings all, I’ve recently started hitting the following error in Spark Streaming with Kafka. Adjusting maxRetries and spark.streaming.kafka.consumer.poll.ms, even up to five minutes, doesn’t seem to be helping. The problem only manifested in the last few days; restarting with a new consumer
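For reference, the two settings mentioned above are plain Spark configuration entries; a sketch of how they are set (the values shown are just the ones the message says were tried, not recommendations):

```scala
import org.apache.spark.SparkConf

// Configuration fragment only; both keys belong to the Kafka direct stream.
val conf = new SparkConf()
  .set("spark.streaming.kafka.consumer.poll.ms", "300000") // 5 minutes, as tried above
  .set("spark.streaming.kafka.maxRetries", "5")            // illustrative retry count
```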

spark streaming kafka not displaying data in local eclipse

2018-01-16 Thread vr spark
Hi, I have a simple Java program to read data from Kafka using Spark Streaming. When I run it from Eclipse on my Mac, it connects to the ZooKeeper and bootstrap nodes, but it is not displaying any data. It does not give any error; it just shows 18/01/16 20:49:15 INFO Executor: Finished task

Re: Spark Streaming not reading missed data

2018-01-16 Thread KhajaAsmath Mohammed
Sometimes I get these messages in the logs, but the job still runs. Do you have a solution for how to fix this? I have added the code in my earlier email. Exception in thread "pool-22-thread-9" java.lang.NullPointerException at org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler.run(

Re: Spark Streaming not reading missed data

2018-01-16 Thread Jörn Franke
It could be a missing persist before the checkpoint > On 16. Jan 2018, at 22:04, KhajaAsmath Mohammed > wrote: > > Hi, > > Spark streaming job from kafka is not picking the messages and is always > taking the latest offsets when streaming job is stopped for 2 hours.
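A sketch of the "persist before checkpoint" pattern this reply suggests. The DStream name and element type are hypothetical; the point is only the ordering of the two calls.

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.dstream.DStream

// `parsed` stands for a DStream produced earlier in the job. Persisting it
// before checkpointing means the checkpoint writer reads materialized data
// instead of recomputing the lineage (which may no longer be available).
def checkpointWithPersist(parsed: DStream[(String, String)]): Unit = {
  parsed.persist(StorageLevel.MEMORY_AND_DISK)
  parsed.checkpoint(Seconds(60)) // checkpoint interval is illustrative
}
```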

Spark Streaming not reading missed data

2018-01-16 Thread KhajaAsmath Mohammed
Hi, the Spark streaming job reading from Kafka is not picking up the missed messages and always takes the latest offsets when the streaming job has been stopped for 2 hours. It is not picking up the offsets that need to be processed from the checkpoint directory. Any suggestions on how to process the old messages too?

Null pointer exception in checkpoint directory

2018-01-16 Thread KhajaAsmath Mohammed
Hi, I keep getting a null pointer exception in the Spark streaming job with checkpointing. Any suggestions to resolve this? Exception in thread "pool-22-thread-9" java.lang.NullPointerException at org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler.run(Checkpoint.scala:233) at

Mail List Daily Archive

2018-01-16 Thread dmp
Hello, upon joining this mailing list I did not, it seems, receive a link to configure options. Is there a way to set up a daily archive of correspondence, as with Mailman? - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

unsubscribe

2018-01-16 Thread karthik
unsubscribe

Re: [Spark ML] Positive-Only Training Classification in Scala

2018-01-16 Thread Matt Hicks
If I try to use LogisticRegression with only positive training data, it always gives me positive results: Positive Only private def positiveOnly(): Unit = { val training = spark.createDataFrame(Seq( (1.0, Vectors.dense(0.0, 1.1, 0.1)), (1.0, Vectors.dense(0.0, 1.0,

Re: [Spark ML] Positive-Only Training Classification in Scala

2018-01-16 Thread Matt Hicks
Hi Hari, I'm not sure I understand. I apologize, I'm still pretty new to Spark and Spark ML. Can you point me to some example code or documentation that would more fully represent this? Thanks. On Tue, Jan 16, 2018 2:54 AM, hosur narahari hnr1...@gmail.com wrote: You can make use of

Re: Run jobs in parallel in standalone mode

2018-01-16 Thread Eyal Zituny
Hi, as far as I know this is not typical behavior for Spark; it might be related to the implementation of the Kinetica Spark connector. You can try writing the DF to a CSV instead, using df.write.csv(""), and see how the Spark job behaves. Eyal On Tue, Jan 16, 2018 at 2:19 PM, Onur EKİNCİ

unsubscribe

2018-01-16 Thread Jose Pedro de Santana Neto
unsubscribe

unsubscribe

2018-01-16 Thread Muhammad Yaseen Aftab
unsubscribe

RE: Run jobs in parallel in standalone mode

2018-01-16 Thread Onur EKİNCİ
Correction: we found out that Spark extracts data from the MSSQL database column by column. Spark divides the data by column, then executes 10 jobs to pull the data from the MSSQL database. Is there a way we can run those jobs in parallel, or increase/decrease the number of jobs? According to what

RE: Run jobs in parallel in standalone mode

2018-01-16 Thread Onur EKİNCİ
Hi Eyal, thank you for your help. The following commands worked in terms of running multiple executors simultaneously. However, Spark repeats the same 10 jobs consecutively. It had been doing that before as well. The jobs are extracting data from MSSQL. Why would it run the same job 10 times?

Re: Run jobs in parallel in standalone mode

2018-01-16 Thread Richard Qiao
Two points to consider: 1. Check the SQL Server/Simba max connection number. 2. Allocate 3-5 cores to each executor and allocate more executors. Sent from my iPhone > On Jan 16, 2018, at 04:01, Onur EKİNCİ wrote: > > Sorry it is not oracle. It is Mssql. > > Do you have any

Re: Inner join with the table itself

2018-01-16 Thread Michael Shtelma
Hi Jacek, thank you for the workaround. It really works this way: pos.as("p1").join(pos.as("p2")).filter($"p1.POSITION_ID0" === $"p2.POSITION_ID") I have checked that this way I get the same execution plan as for the join with the renamed columns. Best, Michael On Mon, Jan 15, 2018 at

Re: Run jobs in parallel in standalone mode

2018-01-16 Thread Eyal Zituny
Hi, I'm not familiar with the Kinetica Spark driver, but it seems that your job has a single task, which might indicate that you have a single partition in the DF. I would suggest trying to create your DF with more partitions; this can be done by adding the following options when reading the
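A sketch of the partitioned-JDBC-read options this reply is pointing at. It assumes the source table has a numeric, roughly uniformly distributed column (here called "id"); the URL, table name, bounds, and partition count are all placeholders to adapt.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-partitioned-read").getOrCreate()

// With partitionColumn/numPartitions set, Spark issues numPartitions range
// queries in parallel instead of a single-task full read.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:sqlserver://host:1433;databaseName=mydb") // placeholder
  .option("dbtable", "dbo.my_table")                             // placeholder
  .option("user", "...")
  .option("password", "...")
  .option("partitionColumn", "id")  // assumed numeric column to split on
  .option("lowerBound", "1")        // min of the split column
  .option("upperBound", "1000000")  // max of the split column
  .option("numPartitions", "8")     // 8 parallel JDBC reads -> 8 tasks
  .load()
```

All four partitioning options must be set together; without them the JDBC source produces a single partition, matching the single-task behavior described in the thread.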

RE: Run jobs in parallel in standalone mode

2018-01-16 Thread Onur EKİNCİ
Sorry, it is not Oracle. It is MSSQL. Do you have any thoughts on a solution? I really appreciate it. Onur EKİNCİ Bilgi Yönetimi Yöneticisi / Knowledge Management Manager m: +90 553 044 2341 d: +90 212 329 7000 İTÜ Ayazağa Kampüsü, Teknokent ARI4 Binası 34469 Maslak İstanbul - Google

Re: Run jobs in parallel in standalone mode

2018-01-16 Thread Richard Qiao
Curious that you are using "jdbc:sqlserve" to connect to Oracle; why? Also, a kind reminder to scrub your user ID and password. Sent from my iPhone > On Jan 16, 2018, at 03:00, Onur EKİNCİ wrote: > > Hi, > > We are trying to get data from an Oracle database into Kinetica database

Re: [Spark ML] Positive-Only Training Classification in Scala

2018-01-16 Thread hosur narahari
You can make use of the probability vector from Spark classification. When you run a Spark classification model for prediction, along with assigning a class Spark also gives a probability vector (the probability that the record belongs to each individual class). So just take the probability
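The idea above, thresholding the per-class probabilities rather than trusting the hard prediction, can be sketched in plain Scala once the model's probability column has been extracted to an array. The function name, class index, threshold, and probability values below are all illustrative.

```scala
// Given a per-class probability vector (e.g. from a classifier's
// `probability` column), call a record "positive" only when the positive
// class clears a chosen confidence threshold.
def confidentPositive(probabilities: Array[Double],
                      positiveIndex: Int,
                      threshold: Double): Boolean =
  probabilities(positiveIndex) >= threshold

val probs = Array(0.35, 0.65) // P(class 0), P(class 1) -- made-up values
val isPositive = confidentPositive(probs, positiveIndex = 1, threshold = 0.8)
// With a 0.8 threshold, 0.65 is not treated as a confident positive
```

This is one way to address the positive-only question earlier in the thread: even when the hard prediction is always the positive class, the probability vector still carries a confidence signal you can threshold.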