e above error.
Any advice on how I can go about debugging/ solving this?
--
Regards,
Neelesh S. Salian
>> http://apache-spark-user-list.1001560.n3.nabble.com/checkpointing-without-streaming-tp4541p28691.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
decide whether to buy the books Learning Spark, Spark for
>>> Machine Learning, etc., or wait for a new edition covering the new concepts
>>> like DataFrames and Datasets. Anyone got any suggestions?
>>>
>>
>>
>
>
> --
> Best Regards,
> Ayan Guha
>
--
Regards,
Neelesh S. Salian
ld somebody help me by providing the steps / redirect me to a
>> blog/documentation on how to run a Spark job written in Scala through Oozie.
>>
>> Would really appreciate the help.
>>
>>
>>
>> Thanks,
>> Divya
>>
>
>
>
> --
> Thanks
> Deepak
> www.bigdatabig.com
> www.keosha.net
>
>
>
--
Neelesh Srinivas Salian
Customer Operations Engineer
spark.streaming.concurrentJobs may help. Experimental according to TD from
an older thread here
http://stackoverflow.com/questions/23528006/how-jobs-are-assigned-to-executors-in-spark-streaming
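Since the setting above is undocumented, here is the shape it typically takes (a config fragment; the experimental status is per the thread, so treat values above 1 with care):

```
# spark-defaults.conf (or --conf on spark-submit)
# Allows this many streaming batch jobs to run concurrently; values > 1 are experimental.
spark.streaming.concurrentJobs  4
```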
On Sat, Feb 20, 2016 at 11:24 AM, Jorge Rodriguez
wrote:
>
> Is it possible to
I also created a JIRA for task failures
https://issues.apache.org/jira/browse/SPARK-12452
On Mon, Dec 21, 2015 at 9:54 AM, Neelesh <neele...@gmail.com> wrote:
> I am leaning towards something like that. Things get interesting when
> multiple different transformations and regro
else would have to speak to the possibility of getting task
> failures added to listener callbacks.
>
> On Sat, Dec 19, 2015 at 5:44 PM, Neelesh <neele...@gmail.com> wrote:
>
>> Hi,
>> I'm trying to build automatic Kafka watermark handling in my stream
just need to fix it. If you're having the second
> problem, use different spark jobs for different topics.
>
> On Sun, Dec 20, 2015 at 2:28 PM, Neelesh <neele...@gmail.com> wrote:
>
>> @Chris,
>> There is a 1-1 mapping b/w spark partitions & kafka partitions out
allel and still maintain the
> ordering guarantees offered by Kafka.
>
> if this is true, then I'd suggest @neelesh create more partitions within
> the Kafka Topic to improve parallelism - same as any distributed,
> partitioned data processing engine including spark.
>
> if thi
A related issue - when I put multiple topics in a single stream, the
processing delay is as bad as the slowest of the tasks created. Even though
the topics are unrelated to each other, the RDD at time "t1" has to wait
until the RDD at "t0" is fully executed, even if most cores are
!
-neelesh
...@koeninger.org> wrote:
> Yes, the partition IDs are the same.
>
> As far as the failure / subclassing goes, you may want to keep an eye on
> https://issues.apache.org/jira/browse/SPARK-10320 , not sure if the
> suggestions in there will end up going anywhere.
>
> On Fri
ide the task execution code, in cases where the intermediate operations
do not change partitions, shuffle etc.
-neelesh
On Fri, Sep 25, 2015 at 11:14 AM, Cody Koeninger <c...@koeninger.org> wrote:
>
> http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-dire
ry, or eventually kill the job.
>
> On Fri, Sep 25, 2015 at 1:55 PM, Neelesh <neele...@gmail.com> wrote:
>
>> Thanks Petr, Cody. This is a reasonable place to start for me. What I'm
>> trying to achieve
>>
>> stream.foreachRDD {rdd=>
>>rdd.foreachPa
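The snippet above is cut off; as a cluster-free sketch of the bookkeeping it leads to, the direct Kafka stream exposes per-partition offset ranges, and the commit logic can be modeled in plain Scala (the `OffsetRange` case class here is a hypothetical stand-in for Spark's `org.apache.spark.streaming.kafka.OffsetRange`):

```scala
// Hypothetical stand-in for the offset range the direct Kafka stream exposes per RDD partition.
case class OffsetRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long)

// Advance the committed offset for each (topic, partition) only after its
// records are written to the sink; replaying from the last committed offset
// on failure is what gives at-least-once semantics.
def nextCommit(committed: Map[(String, Int), Long],
               done: Seq[OffsetRange]): Map[(String, Int), Long] =
  done.foldLeft(committed) { (acc, r) =>
    val key = (r.topic, r.partition)
    acc.updated(key, math.max(acc.getOrElse(key, 0L), r.untilOffset))
  }
```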
As Cody says, to achieve true exactly-once, the bookkeeping has to happen
in the sink data system, and that too assuming it's a transactional store.
Wherever possible, we try to make the application idempotent (upsert in
HBase, ignore-on-duplicate for MySQL, etc.), but there are still cases
(analytics,
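The idempotence property described above can be illustrated without any particular store; a toy sketch where replaying a batch against an upsert-style sink is a no-op:

```scala
// Toy key-value "sink" with upsert semantics: last write per key wins.
def upsert(store: Map[String, Long], batch: Seq[(String, Long)]): Map[String, Long] =
  store ++ batch

// Replaying the same batch (e.g. after a task retry) leaves the sink
// unchanged, which is exactly why upserts make at-least-once delivery safe.
```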
on
taskCompletedEvent on the driver and even figure out that there was an
error, there is no way of mapping this task back to the partition and
retrieving offset range, topic & kafka partition # etc.
Any pointers appreciated!
Thanks!
-neelesh
Does it still hit the memory limit for the container?
An expensive transformation?
On Wed, Apr 22, 2015 at 8:45 AM, Ted Yu yuzhih...@gmail.com wrote:
In master branch, overhead is now 10%.
That would be 500 MB
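The arithmetic behind those numbers, assuming a 5 GB executor (the size implied by the "500 MB" figure; the 384 MB floor is the usual YARN overhead minimum):

```scala
// YARN memory overhead: a fraction of executor memory, floored at a minimum.
val executorMemoryMb = 5120L  // assumed: 5 GB executor
val overheadMb = math.max((executorMemoryMb * 0.10).toLong, 384L)
// 512 MB, i.e. roughly the "500 MB" quoted above.
```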
FYI
On Apr 22, 2015, at 8:26 AM, nsalian neeleshssal...@gmail.com wrote:
Somewhat agree on subclassing and its issues. It looks like the alternative
in Spark 1.3.0 is to create a custom build. Is there an enhancement filed
for this? If not, I'll file one.
Thanks!
-neelesh
On Wed, Apr 1, 2015 at 12:46 PM, Tathagata Das t...@databricks.com wrote:
The challenge
it
run on workers?
Any help appreciated
thanks!
-neelesh
.
Thanks again!
On Wed, Apr 1, 2015 at 10:01 AM, Cody Koeninger c...@koeninger.org wrote:
https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
The kafka consumers run in the executors.
On Wed, Apr 1, 2015 at 11:18 AM, Neelesh neele...@gmail.com wrote:
With receivers
streams
work yet
Another thread - Kafka 0.8.2 supports non-ZK offset management, which I
think is more scalable than bombarding ZK. I'm working on supporting the
new offset management strategy for Kafka with kafka-spark-consumer.
Thanks!
-neelesh
On Wed, Apr 1, 2015 at 9:49 AM, Dibyendu
is private so you can't subclass it without
building your own Spark.
On Wed, Apr 1, 2015 at 1:09 PM, Neelesh neele...@gmail.com wrote:
Thanks Cody, that was really helpful. I have a much better understanding
now. One last question - Kafka topics are initialized once in the driver,
is there an easy
Hi,
My streaming app uses org.apache.httpcomponents:httpclient:4.3.6, but
Spark uses 4.2.6, and I believe that's what's causing the following error.
I've tried setting spark.executor.userClassPathFirst and
spark.driver.userClassPathFirst to true in the config, but that does not
solve it either.
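When userClassPathFirst doesn't resolve such a conflict, shading the dependency inside the application jar is a common fallback; a sketch assuming the project builds with the sbt-assembly plugin (package prefix chosen here is illustrative):

```scala
// build.sbt: rename our httpclient packages so they cannot collide with Spark's 4.2.6.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.apache.http.**" -> "shadedhttp.@1").inAll
)
```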
at 10:39 AM, Neelesh neele...@gmail.com wrote:
Thanks for the detailed response Cody. Our use case is to do some
external lookups (cached and all) for every event, match the event against
the looked-up data, decide whether to write an entry in MySQL, and write it
in the order in which the events
of the technology you're using to read from Kafka (Spark,
Storm, whatever), Kafka only gives you ordering within a particular
partition. So you're going to need to do some kind of downstream sorting
if you really care about a global order.
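Since Kafka only orders within a partition, a global order has to be imposed downstream; a toy sketch sorting merged events by an application-level timestamp (the field names here are hypothetical):

```scala
case class Event(partition: Int, offset: Long, timestampMs: Long)

// Within a partition, offsets already give the order; across partitions,
// only an application-level key (here a timestamp) can define a global order.
def globalOrder(events: Seq[Event]): Seq[Event] =
  events.sortBy(e => (e.timestampMs, e.partition, e.offset))
```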
On Fri, Feb 20, 2015 at 1:43 AM, Neelesh neele
limiting right now is static and
cannot adapt to the state of the cluster
thnx
-neelesh
On Wed, Feb 18, 2015 at 4:13 PM, jay vyas jayunit100.apa...@gmail.com
wrote:
This is a *fantastic* question. The idea of how we identify individual
things in multiple DStreams is worth looking
. So you will get deterministic ordering, but only on a
per-partition basis.
On Thu, Feb 19, 2015 at 11:31 PM, Neelesh neele...@gmail.com wrote:
I had a chance to talk to TD today at the Strata+Hadoop Conf in San Jose.
We talked a bit about this after his presentation - the short
There does not seem to be a definitive answer on this. Every time I google
for message ordering, the only relevant thing that comes up is this -
http://samza.apache.org/learn/documentation/0.8/comparisons/spark-streaming.html
.
With a kafka receiver that pulls data from a single kafka partition
We're planning to use this as well (Dibyendu's
https://github.com/dibbhatt/kafka-spark-consumer ). Dibyendu, thanks for
the efforts. So far it's working nicely. I think there is merit in making
it the default Kafka receiver for Spark Streaming.
-neelesh
On Mon, Feb 2, 2015 at 5:25 PM, Dibyendu