Spark 2.0.0 Error Caused by: java.lang.IllegalArgumentException: requirement failed: Block broadcast_21_piece0 is already present in the MemoryStore

2016-10-11 Thread sandesh deshmane
I sometimes get this error when I run PySpark with Spark 2.0.0: App > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) App > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) App > at

Re: how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread sandesh deshmane
>> it can be replaced by one of the others. Exactly once, if needed, requires more effort in any system, including Spark, and the throughput is usually lower. A risk evaluation from a business point of view has to be done anyway... >> > On 22 Jun 2016, a

Re: how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread sandesh deshmane
> ers. Exactly once, if needed, requires more effort in any system, including Spark, and the throughput is usually lower. A risk evaluation from a business point of view has to be done anyway... > > On 22 Jun 2016, at 09:09, sandesh deshmane <sandesh.v...@gmail.com> wrote:

Re: how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread sandesh deshmane
processed, but at run time I need to do that lookup, and for us the number of messages is very high, so won't the lookup add to the processing time? Thanks, Sandesh Deshmane. On Wed, Jun 22, 2016 at 2:36 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote: > Yes this is more of a Kafka issue
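On the lookup cost raised above: a minimal sketch, assuming each message carries a unique ID (the thread does not say that it does) and that the IDs of already processed messages fit in an in-memory set. The names `drop_already_processed` and `processed_ids` are hypothetical, and this is plain Python rather than Spark code. A hash-set membership check takes constant time on average, so the added cost per message is small as long as the set fits in memory.

```python
from typing import Hashable, Iterable, Iterator, Set, Tuple

# Assumption: a message is a (message_id, payload) pair with a unique ID.
Message = Tuple[Hashable, str]


def drop_already_processed(
    messages: Iterable[Message], processed_ids: Set[Hashable]
) -> Iterator[Message]:
    """Yield only messages whose ID has not been seen, recording new IDs."""
    for message_id, payload in messages:
        if message_id in processed_ids:  # average O(1) hash lookup
            continue  # duplicate, e.g. replayed after a restart
        processed_ids.add(message_id)
        yield message_id, payload


if __name__ == "__main__":
    seen: Set[Hashable] = set()
    first_batch = [(1, "a"), (2, "b")]
    replayed_batch = [(2, "b"), (3, "c")]  # message 2 is delivered again
    print(list(drop_already_processed(first_batch, seen)))
    print(list(drop_already_processed(replayed_batch, seen)))
```

In a real streaming job the set of processed IDs would have to survive a restart, for example in an external store, which this sketch does not attempt.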

Re: how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread sandesh deshmane
lebzadeh > LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > http://talebzadehmich.wordpress.com > On 22 June

how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread sandesh deshmane
Hi, I am writing a Spark Streaming application which reads messages from Kafka. I am using checkpointing and write-ahead logs (WAL) to achieve fault tolerance. I have set a batch interval of 10 seconds for reading messages from Kafka. I read messages from Kafka and generate the count of messages as

Re: Error while using checkpointing . Spark streaming 1.5.2- DStream checkpointing has been enabled but the DStreams with their functions are not serialisable

2016-06-09 Thread sandesh deshmane
> y capturing unexpected things in the closure of the Function you have defined, because myFunction is defined outside. Try defining myFunction inside the Function and see if the problem persists. > > On Thu, Jun 9, 2016 at 3:57 AM, sandesh deshmane <sandesh.v...@gmail.com>

Error while using checkpointing . Spark streaming 1.5.2- DStream checkpointing has been enabled but the DStreams with their functions are not serialisable

2016-06-09 Thread sandesh deshmane
Hi, I am using Spark Streaming to stream data from Kafka 0.8, with checkpointing in HDFS. I am getting the error below: java.io.NotSerializableException: DStream checkpointing has been enabled but the DStreams with their functions are not serialisable field (class:

Re: Error while deploying spark 1.6.1 on EC2

2016-03-14 Thread sandesh deshmane
> I am trying to install Spark on EC2. > > I am getting the error below. I had issues like RPC timeout and fetch timeout with Spark 1.6.0, so as per the release notes I was trying to get a new cluster with 1.6.1. > > Can you help? It looks like the Spark 1.6.1 package is missing from S3. > > [timing] scala init: