Hi,
I am using Spark Streaming to stream data from Kafka 0.8.
I am using checkpointing in HDFS. I am getting an error like the one below:
java.io.NotSerializableException: DStream checkpointing has been enabled
but the DStreams with their functions are not serializable
field (class:
> You are likely capturing unexpected things in the closure of the
> Function you have defined, because myFunction is defined outside. Try
> defining myFunction inside the Function and see if the problem persists.
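> For example, a minimal sketch of the difference (the stream, its record
> type, and myFunction's logic are hypothetical here):
>
>   import org.apache.spark.streaming.dstream.DStream
>
>   // Problematic: myFunction is defined on an outer, non-serializable
>   // class, so the closure drags that whole class into the checkpoint.
>   //   val counts = stream.map(record => myFunction(record))
>
>   // Workaround: define the function locally, inside the closure.
>   val counts: DStream[Int] = stream.map { record =>
>     def myFunction(r: (String, String)): Int = r._2.length // placeholder logic
>     myFunction(record)
>   }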
>
> On Thu, Jun 9, 2016 at 3:57 AM, sandesh deshmane <sandesh.v...@gmail.com>
> wrote:
Hi,
I am writing a Spark Streaming application which reads messages from Kafka.
I am using checkpointing and write ahead logs (WAL) to achieve fault
tolerance.
I have created a batch size of 10 sec for reading messages from Kafka.
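Roughly, the setup looks like this (a minimal sketch, assuming the
receiver-based Kafka API; the checkpoint path, ZooKeeper address, group
and topic names are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

def createContext(): StreamingContext = {
  val conf = new SparkConf()
    .setAppName("KafkaMessageCounts")
    // enable the write ahead log for receiver-based fault tolerance
    .set("spark.streaming.receiver.writeAheadLog.enable", "true")
  val ssc = new StreamingContext(conf, Seconds(10)) // 10 sec batches
  ssc.checkpoint("hdfs:///checkpoints/kafka-counts") // placeholder path
  val stream = KafkaUtils.createStream(ssc,
    "zkhost:2181", "my-group", Map("my-topic" -> 1)) // placeholders
  stream.count().print() // count of messages per 10 sec batch
  ssc
}

// Recover from the checkpoint after a driver restart, or build a new context.
val ssc = StreamingContext.getOrCreate(
  "hdfs:///checkpoints/kafka-counts", createContext _)
ssc.start()
ssc.awaitTermination()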
I read messages from Kafka and generate the count of messages as
processed, but at run time I need to do that lookup, and for us the
number of messages is very high, so the lookup will add up in processing
time?
Thanks
Sandesh Deshmane
On Wed, Jun 22, 2016 at 2:36 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:
> Yes, this is more of a Kafka issue.
> ... it can be replaced by one of the others.
> Exactly once, if needed, requires more effort in any system, including
> Spark, and usually the throughput is lower. A risk evaluation from a
> business point of view has to be done anyway...
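> One common pattern for getting effectively-exactly-once on top of
> at-least-once delivery is an idempotent sink; a rough sketch (store and
> its upsert method are placeholders for whatever your output system
> provides):
>
>   stream.foreachRDD { (rdd, batchTime) =>
>     rdd.foreachPartition { records =>
>       records.foreach { case (key, value) =>
>         // A re-delivered batch rewrites the same row instead of
>         // double-counting, so retries are harmless.
>         store.upsert(s"$key@${batchTime.milliseconds}", value)
>       }
>     }
>   }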
>
> I am trying to install spark on EC2.
>
> I am getting the below error. I had issues like RPC timeout and fetch
> timeout with Spark 1.6.0, so as per the release notes I was trying to get
> a new cluster with 1.6.1.
>
> Can you help? It looks like the spark 1.6.1 package is missing from S3.
>
> [timing] scala init:
I am getting this error sometimes when I run pyspark with Spark 2.0.0:
App > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
App > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
App > at