You are creating a new StreamingContext each time:

val streamingContext = new StreamingContext(sparkSession.sparkContext,
  Seconds(config.getInt(Constants.Properties.BatchInterval)))

If you want fault tolerance, i.e. to resume from where the job stopped between Spark job restarts, the correct way is to restore the context from the checkpoint directory.
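As a hedged illustration of that pattern, here is a minimal sketch using StreamingContext.getOrCreate; checkpointDir is a placeholder path, while sparkSession, config, and Constants.Properties.BatchInterval are taken from the snippet above:

import org.apache.spark.streaming.{Seconds, StreamingContext}

// Build a fresh context only when no checkpoint exists; otherwise Spark
// reconstructs the context (DStream graph, offsets, incomplete batches)
// from the checkpoint data.
def createContext(): StreamingContext = {
  val ssc = new StreamingContext(sparkSession.sparkContext,
    Seconds(config.getInt(Constants.Properties.BatchInterval)))
  ssc.checkpoint(checkpointDir) // e.g. an HDFS or S3 path
  // ... define the Kafka input stream and processing graph here ...
  ssc
}

val streamingContext = StreamingContext.getOrCreate(checkpointDir, createContext _)
streamingContext.start()
streamingContext.awaitTermination()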
Greetings all,
I’ve recently started hitting the following error in Spark Streaming with Kafka. Adjusting maxRetries and spark.streaming.kafka.consumer.poll.ms, even to five minutes, doesn’t seem to help. The problem only manifested in the last few days, restarting with a new consumer
Hi,
I have a simple Java program to read data from Kafka using Spark Streaming.
When I run it from Eclipse on my Mac, it connects to the ZooKeeper and bootstrap nodes, but it does not display any data and gives no error. It just shows:
18/01/16 20:49:15 INFO Executor: Finished task
Sometimes I get these messages in the logs, but the job still runs. Do you have a solution for how to fix this? I have added the code in my earlier email.
Exception in thread "pool-22-thread-9" java.lang.NullPointerException
at org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler.run(
It could be a missing persist before the checkpoint
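If that is indeed the cause, a rough sketch of what "persist before checkpoint" could look like on a DStream; stream, parseRecord, and the checkpoint interval are placeholders, not from the original code:

// Persist the transformed DStream before enabling checkpointing on it,
// as suggested above; the interval is commonly a multiple of the batch interval.
val parsed = stream.map(parseRecord _)
parsed.persist()
parsed.checkpoint(Seconds(50))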
> On 16. Jan 2018, at 22:04, KhajaAsmath Mohammed
> wrote:
>
> Hi,
>
> Spark streaming job from kafka is not picking the messages and is always
> taking the latest offsets when streaming job is stopped for 2 hours.
Hi,
The Spark streaming job from Kafka is not picking up the messages and always takes the latest offsets when the streaming job has been stopped for 2 hours. It is not picking up the offsets that need to be processed from the checkpoint directory. Any suggestions on how to process the old messages too?
Hi,
I keep getting a NullPointerException in the Spark streaming job with checkpointing. Any suggestions on how to resolve this?
Exception in thread "pool-22-thread-9" java.lang.NullPointerException
at org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler.run(Checkpoint.scala:233)
at
Hello,
Upon joining this mailing list, it seems I did not receive a link to configure options. Is there a way to set up a daily archive of correspondence, as with Mailman?
If I try to use LogisticRegression with only positive training data, it always gives me positive results:

// Positive only
private def positiveOnly(): Unit = {
  val training = spark.createDataFrame(Seq(
    (1.0, Vectors.dense(0.0, 1.1, 0.1)),
    (1.0, Vectors.dense(0.0, 1.0,
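For completeness, a self-contained sketch of the behavior being described; the extra rows, test vector, and column names are made up for illustration, and spark is assumed to be an existing SparkSession:

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors

// Every training label is 1.0, so the model can only learn the positive
// class and every prediction comes back as 1.0.
val training = spark.createDataFrame(Seq(
  (1.0, Vectors.dense(0.0, 1.1, 0.1)),
  (1.0, Vectors.dense(0.0, 1.0, -1.0)),
  (1.0, Vectors.dense(2.0, 1.3, 1.0))
)).toDF("label", "features")

val model = new LogisticRegression().fit(training)

val test = spark.createDataFrame(Seq(
  (0.0, Vectors.dense(-1.0, 1.5, 1.3))
)).toDF("label", "features")

model.transform(test).select("features", "probability", "prediction").show(false)

Including examples of both classes in the training set is what lets the model produce anything other than the positive label.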
Hi Hari, I'm not sure I understand. I apologize; I'm still pretty new to Spark and Spark ML. Can you point me to some example code or documentation that illustrates this more fully?
Thanks
On Tue, Jan 16, 2018 2:54 AM, hosur narahari hnr1...@gmail.com wrote:
You can make use of
Hi,
As far as I know, this is not typical behavior for Spark; it might be related to the implementation of the Kinetica Spark connector. You can try to write the DataFrame to a CSV instead, using
df.write.csv("")
and see how the Spark job behaves.
Eyal
On Tue, Jan 16, 2018 at 2:19 PM, Onur EKİNCİ
Correction. We found out that Spark extracts data from the MSSQL database column by column. Spark divides the data by column, then executes 10 jobs to pull the data from the MSSQL database.
Is there a way to run those jobs in parallel, or to increase/decrease the number of jobs? According to what
Hi Eyal,
Thank you for your help. The following commands worked in terms of running multiple executors simultaneously. However, Spark repeats the same 10 jobs consecutively; it had been doing that before as well. The jobs extract data from MSSQL. Why would it run the same job 10 times?
Two points to consider:
1. Check the SQL Server/Simba maximum connection number.
2. Allocate 3-5 cores for each executor and allocate more executors (see the sketch below).
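For illustration, one way to apply point 2 when building the session; the numbers are placeholders rather than a tuned recommendation, and whether spark.executor.instances is honored depends on the cluster manager:

import org.apache.spark.sql.SparkSession

// Size executors explicitly: a handful of cores each, and more of them.
val spark = SparkSession.builder()
  .appName("mssql-extract")
  .config("spark.executor.cores", "4")      // 3-5 cores per executor
  .config("spark.executor.instances", "10") // scale out with more executors
  .getOrCreate()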
Sent from my iPhone
> On Jan 16, 2018, at 04:01, Onur EKİNCİ wrote:
>
> Sorry it is not oracle. It is Mssql.
>
> Do you have any
Hi Jacek,
Thank you for the workaround.
It really does work this way:
pos.as("p1").join(pos.as("p2")).filter($"p1.POSITION_ID0"===$"p2.POSITION_ID")
I have checked that this way I get the same execution plan as for the join with renamed columns.
Best,
Michael
On Mon, Jan 15, 2018 at
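For reference, the same workaround written with the predicate as a join condition rather than a trailing filter; pos and the column names come from the snippet above, the rest is a sketch:

import org.apache.spark.sql.functions.col

// Alias each side of the self-join so the otherwise ambiguous column
// references can be qualified without renaming columns first.
val joined = pos.as("p1")
  .join(pos.as("p2"), col("p1.POSITION_ID0") === col("p2.POSITION_ID"))

joined.select(col("p1.POSITION_ID0"), col("p2.POSITION_ID")).show()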
Hi,
I'm not familiar with the Kinetica Spark driver, but it seems that your job has a single task, which might indicate that you have a single partition in the DataFrame. I would suggest trying to create your DataFrame with more partitions; this can be done by adding the following options when reading the
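The message is cut off here; the options it most likely refers to are the standard JDBC partitioning options. A hedged sketch, with the URL, table, and bounds as placeholder assumptions:

// Giving the JDBC reader a partition column splits the extraction into
// numPartitions parallel tasks instead of a single one.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:sqlserver://<host>:1433;databaseName=<db>")
  .option("dbtable", "<schema>.<table>")
  .option("user", "<user>")
  .option("password", "<password>")
  .option("partitionColumn", "<numeric_id_column>")
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "10")
  .load()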
Sorry, it is not Oracle; it is MSSQL.
Do you have any opinion on a solution? I would really appreciate it.
Onur EKİNCİ
Bilgi Yönetimi Yöneticisi
Knowledge Management Manager
m:+90 553 044 2341 d:+90 212 329 7000
İTÜ Ayazağa Kampüsü, Teknokent ARI4 Binası 34469 Maslak İstanbul
Curious that you are using "jdbc:sqlserve" to connect to Oracle, why?
Also, a kind reminder to scrub your user ID and password.
Sent from my iPhone
> On Jan 16, 2018, at 03:00, Onur EKİNCİ wrote:
>
> Hi,
>
> We are trying to get data from an Oracle database into Kinetica database
You can make use of the probability vector from Spark classification. When you run a Spark classification model for prediction, along with classifying each record into its class, Spark also gives a probability vector (the probability that the record belongs to each individual class). So just take the probability
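A minimal sketch of what is being described; the toy data and column names are assumptions, and spark is an existing SparkSession:

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors

val training = spark.createDataFrame(Seq(
  (1.0, Vectors.dense(0.0, 1.1, 0.1)),
  (0.0, Vectors.dense(2.0, 1.0, -1.0)),
  (1.0, Vectors.dense(0.0, 1.2, -0.5))
)).toDF("label", "features")

val model = new LogisticRegression().fit(training)

// transform() adds a "probability" column: a vector with one entry per
// class (index 0 = P(class 0.0), index 1 = P(class 1.0)).
model.transform(training)
  .select("features", "probability", "prediction")
  .show(false)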