[Spark 3.0.0] Job fails with NPE - worked in Spark 2.4.4

2020-07-23 Thread Neelesh Salian
e above error. Any advice on how I can go about debugging/solving this? -- Regards, Neelesh S. Salian

Re: checkpointing without streaming?

2017-05-18 Thread Neelesh Sambhajiche

Re: Spark books

2017-05-03 Thread Neelesh Salian
decide whether to buy the book learning spark, spark for machine learning etc. or wait for a new edition covering the new concepts like dataframe and datasets. Anyone got any suggestions? > -- > Best Regards, > Ayan Guha -- Regards, Neelesh S. Salian

Re: Steps to Run Spark Scala job from Oozie on EC2 Hadoop cluster

2016-03-07 Thread Neelesh Salian
ld somebody help me by providing the steps / redirect me to blog/documentation on how to run Spark job written in scala through Oozie. Would really appreciate the help. >> Thanks, >> Divya > -- > Thanks > Deepak > www.bigdatabig.com > www.keosha.net -- Neelesh Srinivas Salian, Customer Operations Engineer

Re: Spark Streaming: Is it possible to schedule multiple active batches?

2016-02-20 Thread Neelesh
spark.streaming.concurrentJobs may help. It is experimental, according to TD in an older thread here: http://stackoverflow.com/questions/23528006/how-jobs-are-assigned-to-executors-in-spark-streaming On Sat, Feb 20, 2016 at 11:24 AM, Jorge Rodriguez wrote: > Is it possible to
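For readers who want to try the setting mentioned in this reply, here is a minimal sketch of passing it on the command line. Only the property name `spark.streaming.concurrentJobs` comes from the thread; the value `2`, the helper `spark_submit_args`, and the application file name `my_streaming_app.py` are hypothetical placeholders. The property is described in the reply as experimental, so treat this as something to test rather than a recommended configuration.

```python
import shlex


def spark_submit_args(conf, app):
    """Build a spark-submit argument list from a dict of Spark properties.

    `conf` maps property names to values; `app` is the application file.
    Both are illustrative inputs, not taken from the thread.
    """
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# Assumption: allow two streaming batches to run concurrently.
conf = {"spark.streaming.concurrentJobs": 2}
command = spark_submit_args(conf, "my_streaming_app.py")

# Print a shell-safe version of the command for copy and paste.
print(shlex.join(command))
```

Running more than one batch at a time means batches can finish out of order, so this only suits jobs whose output does not depend on batch ordering.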

Re: TaskCompletionListener and Exceptions

2015-12-21 Thread Neelesh
I also created a JIRA for task failures https://issues.apache.org/jira/browse/SPARK-12452 On Mon, Dec 21, 2015 at 9:54 AM, Neelesh <neele...@gmail.com> wrote: > I am leaning towards something like that. Things get interesting when > multiple different transformations and regro

Re: TaskCompletionListener and Exceptions

2015-12-21 Thread Neelesh
else would have to speak to the possibility of getting task failures added to listener callbacks. > On Sat, Dec 19, 2015 at 5:44 PM, Neelesh <neele...@gmail.com> wrote: >> Hi, >> I'm trying to build automatic Kafka watermark handling in my stream

Re: Kafka - streaming from multiple topics

2015-12-21 Thread Neelesh
just need to fix it. If you're having the second problem, use different spark jobs for different topics. > On Sun, Dec 20, 2015 at 2:28 PM, Neelesh <neele...@gmail.com> wrote: >> @Chris, >> There is a 1-1 mapping b/w spark partitions & kafka partitions out

Re: Kafka - streaming from multiple topics

2015-12-20 Thread Neelesh
allel and still maintain the ordering guarantees offered by Kafka. > if this is true, then I'd suggest @neelesh create more partitions within the Kafka Topic to improve parallelism - same as any distributed, partitioned data processing engine including spark. > if thi

Re: Kafka - streaming from multiple topics

2015-12-19 Thread Neelesh
A related issue - When I put multiple topics in a single stream, the processing delay is as bad as the slowest task in the number of tasks created. Even though the topics are unrelated to each other, the RDD at time "t1" has to wait until the RDD at "t0" is fully executed, even if most cores are

TaskCompletionListener and Exceptions

2015-12-19 Thread Neelesh
! -neelesh

Re: Kafka & Spark Streaming

2015-09-25 Thread Neelesh
...@koeninger.org> wrote: > Yes, the partition IDs are the same. > As far as the failure / subclassing goes, you may want to keep an eye on https://issues.apache.org/jira/browse/SPARK-10320, not sure if the suggestions in there will end up going anywhere. > On Fri

Re: Kafka & Spark Streaming

2015-09-25 Thread Neelesh
ide the task execution code, in cases where the intermediate operations do not change partitions, shuffle etc. -neelesh On Fri, Sep 25, 2015 at 11:14 AM, Cody Koeninger <c...@koeninger.org> wrote: > > http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-dire

Re: Kafka & Spark Streaming

2015-09-25 Thread Neelesh
ry, or eventually kill the job. > > On Fri, Sep 25, 2015 at 1:55 PM, Neelesh <neele...@gmail.com> wrote: > >> Thanks Petr, Cody. This is a reasonable place to start for me. What I'm >> trying to achieve >> >> stream.foreachRDD {rdd=> >>rdd.foreachPa

Re: kafka direct streaming with checkpointing

2015-09-25 Thread Neelesh
As Cody says, to achieve true exactly-once, the bookkeeping has to happen in the sink data system, and that assumes it's a transactional store. Wherever possible, we try to make the application idempotent (upsert in HBase, ignore-on-duplicate for MySQL etc), but there are still cases (analytics,
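The idempotent-write idea in this message can be sketched with an ignore-on-duplicate insert. This is an illustration only: it uses an in-memory SQLite database in place of MySQL or HBase, and the table name `events`, its columns, and the helper `write_batch` are hypothetical. The assumption is that each record carries its Kafka topic, partition, and offset, which together form the primary key, so replaying a batch after a failure does not create duplicate rows.

```python
import sqlite3


def write_batch(conn, records):
    """Insert records, silently skipping any whose key already exists."""
    with conn:  # one transaction per batch; commits on success
        conn.executemany(
            "INSERT OR IGNORE INTO events "
            "(topic, kafka_partition, kafka_offset, payload) "
            "VALUES (?, ?, ?, ?)",
            records,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "topic TEXT, kafka_partition INTEGER, kafka_offset INTEGER, payload TEXT, "
    "PRIMARY KEY (topic, kafka_partition, kafka_offset))"
)

batch = [("clicks", 0, 41, "a"), ("clicks", 0, 42, "b")]
write_batch(conn, batch)
write_batch(conn, batch)  # simulate the same batch being replayed after a retry

row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(row_count)
```

This covers writes that are naturally keyed. It does not help with the non-idempotent cases the message goes on to mention, such as running counters, where the offsets would need to be stored in the same transaction as the results.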

Kafka & Spark Streaming

2015-09-25 Thread Neelesh
on taskCompletedEvent on the driver and even figure out that there was an error, there is no way of mapping this task back to the partition and retrieving offset range, topic & kafka partition # etc. Any pointers appreciated! Thanks! -neelesh

Re: Spark Performance on Yarn

2015-04-22 Thread Neelesh Salian
Does it still hit the memory limit for the container? An expensive transformation? On Wed, Apr 22, 2015 at 8:45 AM, Ted Yu yuzhih...@gmail.com wrote: In master branch, overhead is now 10%. That would be 500 MB FYI On Apr 22, 2015, at 8:26 AM, nsalian neeleshssal...@gmail.com wrote:

Re: Spark Streaming 1.3 Kafka Direct Streams

2015-04-06 Thread Neelesh
Somewhat agree on subclassing and its issues. It looks like the alternative in spark 1.3.0 is to create a custom build. Is there an enhancement filed for this? If not, I'll file one. Thanks! -neelesh On Wed, Apr 1, 2015 at 12:46 PM, Tathagata Das t...@databricks.com wrote: The challenge

Spark Streaming 1.3 Kafka Direct Streams

2015-04-01 Thread Neelesh
it run on workers? Any help appreciated thanks! -neelesh

Re: Spark Streaming 1.3 Kafka Direct Streams

2015-04-01 Thread Neelesh
. Thanks again! On Wed, Apr 1, 2015 at 10:01 AM, Cody Koeninger c...@koeninger.org wrote: https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md The kafka consumers run in the executors. On Wed, Apr 1, 2015 at 11:18 AM, Neelesh neele...@gmail.com wrote: With receivers

Re: Latest enhancement in Low Level Receiver based Kafka Consumer

2015-04-01 Thread Neelesh
streams work yet. Another thread - Kafka 0.8.2 supports non-ZK offset management, which I think is more scalable than bombarding ZK. I'm working on supporting the new offset management strategy for Kafka with kafka-spark-consumer. Thanks! -neelesh On Wed, Apr 1, 2015 at 9:49 AM, Dibyendu

Re: Spark Streaming 1.3 Kafka Direct Streams

2015-04-01 Thread Neelesh
is private so you can't subclass it without building your own spark. On Wed, Apr 1, 2015 at 1:09 PM, Neelesh neele...@gmail.com wrote: Thanks Cody, that was really helpful. I have a much better understanding now. One last question - Kafka topics are initialized once in the driver, is there an easy

Untangling dependency issues in spark streaming

2015-03-29 Thread Neelesh
Hi, My streaming app uses org.apache.httpcomponents:httpclient:4.3.6, but spark uses 4.2.6, and I believe that's what's causing the following error. I've tried setting spark.executor.userClassPathFirst and spark.driver.userClassPathFirst to true in the config, but that does not solve it either.

Re: Spark Streaming and message ordering

2015-02-20 Thread Neelesh
at 10:39 AM, Neelesh neele...@gmail.com wrote: Thanks for the detailed response Cody. Our use case is to do some external lookups (cached and all) for every event, match the event against the looked up data, decide whether to write an entry in mysql and write it in the order in which the events

Re: Spark Streaming and message ordering

2015-02-20 Thread Neelesh
of the technology you're using to read from kafka (spark, storm, whatever), kafka only gives you ordering as to a particular partition. So you're going to need to do some kind of downstream sorting if you really care about a global order. On Fri, Feb 20, 2015 at 1:43 AM, Neelesh neele
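The point made in this reply, that ordering holds only within a partition and a global order needs a downstream sort, can be illustrated without Spark or Kafka. In this sketch the two lists stand in for two Kafka partitions, each already ordered, and the integer in each record is a hypothetical event sequence number that the producer is assumed to attach. The names `partition_0`, `partition_1`, and `global_order` are illustrative only.

```python
import heapq

# Each simulated partition is ordered on its own, but the two interleave.
partition_0 = [(1, "a"), (4, "d"), (5, "e")]
partition_1 = [(2, "b"), (3, "c"), (6, "f")]


def global_order(*partitions):
    """Merge per-partition ordered records into one globally ordered list.

    Relies on each input already being sorted by the sequence number,
    which is the guarantee a single partition provides.
    """
    return list(heapq.merge(*partitions, key=lambda record: record[0]))


merged = global_order(partition_0, partition_1)
print(merged)
```

A merge like this needs all partitions' records for the window in one place, which is why it costs parallelism; keying related events to the same partition avoids the sort when only per-key order matters.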

Re: Spark Streaming and message ordering

2015-02-19 Thread Neelesh
limiting right now is static and cannot adapt to the state of the cluster thnx -neelesh On Wed, Feb 18, 2015 at 4:13 PM, jay vyas jayunit100.apa...@gmail.com wrote: This is a *fantastic* question. The idea of how we identify individual things in multiple DStreams is worth looking

Re: Spark Streaming and message ordering

2015-02-19 Thread Neelesh
. So you will get deterministic ordering, but only on a per-partition basis. On Thu, Feb 19, 2015 at 11:31 PM, Neelesh neele...@gmail.com wrote: I had a chance to talk to TD today at the Strata+Hadoop Conf in San Jose. We talked a bit about this after his presentation - the short

Spark Streaming and message ordering

2015-02-18 Thread Neelesh
There does not seem to be a definitive answer on this. Every time I google for message ordering, the only relevant thing that comes up is this - http://samza.apache.org/learn/documentation/0.8/comparisons/spark-streaming.html . With a kafka receiver that pulls data from a single kafka partition

Re: Error when Spark streaming consumes from Kafka

2015-02-02 Thread Neelesh
We're planning to use this as well (Dibyendu's https://github.com/dibbhatt/kafka-spark-consumer ). Dibyendu, thanks for the efforts. So far it's working nicely. I think there is merit in making it the default Kafka Receiver for spark streaming. -neelesh On Mon, Feb 2, 2015 at 5:25 PM, Dibyendu