Hi All,
I get the following error while running my spark streaming application. We
have a large application running multiple stateful (with mapWithState) and
stateless operations. It's getting difficult to isolate the error, since
Spark itself hangs and the only error we see is in the Spark log.
I need some clarification for Kafka consumers in Spark or otherwise. I have
the following Kafka Consumer. The consumer is reading from a topic, and I
have a mechanism which blocks the consumer from time to time.
The producer is a separate thread which is continuously sending data. I
want to
Sending out the message again... Hopefully someone can clarify :)
I would like some clarification on the execution model for spark streaming.
Broadly, I am trying to understand if output operations in a DAG are only
processed after all intermediate operations are finished for all parts of
the DAG.
Will kafka output 1 and kafka output 2 wait for all
operations to finish on B and C before sending an output, or will they
simply send an output as soon as the ops in B and C are done.
What kind of synchronization guarantees are there?
On Sun, May 28, 2017 at 9:59 AM, Nipun Arora <nipunarora
Hi,
I would like some clarification on the execution model for spark streaming.
Broadly, I am trying to understand if output operations in a DAG are only
processed after all intermediate operations are finished for all parts of
the DAG.
Let me give an example:
I have a dstream A, I do a map
Hi,
I wanted to know how to get the input and output streams from
SparkAppHandle?
I start application like the following:
SparkAppHandle sparkAppHandle = sparkLauncher.startApplication();
I have used the following previously to capture the input stream from the
error and output streams, but I would
Hi,
I am trying to get a simple spark application to run programmatically. I
looked at
http://spark.apache.org/docs/2.1.0/api/java/index.html?org/apache/spark/launcher/package-summary.html,
at the following code.
public class MyLauncher {
public static void main(String[] args) throws
> Hi Nipun
> >
> > Have you checked out the job server
> >
> > https://github.com/spark-jobserver/spark-jobserver
> >
> > Regards
> > Sam
> > On Fri, 12 May 2017 at 21:00, Nipun Arora <nipunarora2...@gmail.com>
> wrote:
> >>
> >
Hi,
We have written a java spark application (primarily uses spark sql). We
want to expand this to provide our application "as a service". For this, we
are trying to write a REST API. While a simple REST API can be made easily,
and I can get Spark to run through the launcher, I wonder how the
Hi All,
To support our Spark Streaming based anomaly detection tool, we have made a
patch in Spark 1.6.2 to dynamically update broadcast variables.
I'll first explain our use-case, which I believe should be common to
several people using Spark Streaming applications. Broadcast variables are
ient objects are not persistent. How can we go
around this problem?
Thanks,
Nipun
On Tue, Feb 7, 2017 at 6:35 PM, Nipun Arora <nipunarora2...@gmail.com>
wrote:
> Ryan,
>
> Apologies for coming back so late, I created a github repo to resolve
> this problem. On trying you
Ryan,
Apologies for coming back so late, I created a github repo to resolve
this problem. On trying your solution for making the pool a Singleton,
I get a null pointer exception in the worker.
Do you have any other suggestions, or a simpler mechanism for handling this?
I have put all the current
Just to be clear the pool object creation happens in the driver code, and
not in any anonymous function which should be executed in the executor.
On Tue, Jan 31, 2017 at 10:21 PM Nipun Arora <nipunarora2...@gmail.com>
wrote:
> Thanks for the suggestion Ryan, I will convert it to
, 2017 at 6:09 PM Shixiong(Ryan) Zhu <shixi...@databricks.com>
wrote:
> Looks like you create KafkaProducerPool in the driver. So when the task is
> running in the executor, it will always see a new empty KafkaProducerPool
> and create KafkaProducers. But nobody closes the
> des. It's also possible that it leaks
> resources.
>
> On Tue, Jan 31, 2017 at 2:12 PM, Nipun Arora <nipunarora2...@gmail.com>
> wrote:
>
> It is spark 1.6
>
> Thanks
> Nipun
>
> On Tue, Jan 31, 2017 at 1:45 PM Shixiong(Ryan) Zhu <
> shixi...@databricks
It is spark 1.6
Thanks
Nipun
On Tue, Jan 31, 2017 at 1:45 PM Shixiong(Ryan) Zhu <shixi...@databricks.com>
wrote:
> Could you provide your Spark version please?
>
> On Tue, Jan 31, 2017 at 10:37 AM, Nipun Arora <nipunarora2...@gmail.com>
> wrote:
>
> Hi,
>
Hi,
I get a resource leak, where the number of file descriptors in spark
streaming keeps increasing. We end up with a "too many open files" error
eventually through an exception caused in:
JAVARDDKafkaWriter, which is writing a spark JavaDStream
The exception is attached inline. Any help will be appreciated.
Thanks
Nipun
---------- Forwarded message ----------
From: Nipun Arora <nipunarora2...@gmail.com>
Date: Wed, Jan 18, 2017 at 5:25 PM
Subject: [SparkStreaming] SparkStreaming not allowing to do parallelize
within a transform operation to generate a new RDD
To: user <user@spark.a
Hi,
I am trying to transform an RDD in a dstream by taking the log line with
the maximum timestamp and adding a duplicate copy of it with some
modifications.
The following is the example code:
JavaDStream logMessageWithHB =
logMessageMatched.transform(new Function() {
Please note: I have asked the following question in stackoverflow as well
http://stackoverflow.com/questions/41729451/adding-to-spark-streaming-dstream-rdd-the-max-line-of-each-rdd
I am trying to add to each RDD in a JavaDStream the line with the maximum
timestamp, with some modification.
/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java
>
>
> On Wed, Feb 10, 2016 at 1:28 PM, Nipun Arora <nipunarora2...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am trying some basic integration and wa
not getting any data, i.e. the file end has probably been reached
by printing the first two lines of every micro-batch.
Thanks
Nipun
On Mon, Feb 8, 2016 at 10:05 PM Nipun Arora <nipunarora2...@gmail.com>
wrote:
> I have a spark-streaming service, where I am processing and detecting
>
Hi,
I am trying some basic integration and was going through the manual.
I would like to read from a topic, and get a JavaReceiverInputDStream
for messages in that topic. However the example is of
JavaPairReceiverInputDStream<>. How do I get a stream for only a single
topic in Java?
Reference
I have a spark-streaming service, where I am processing and detecting
anomalies on the basis of some offline generated model. I feed data into
this service from a log file, which is streamed using the following command
tail -f | nc -lk
Here the spark streaming service is taking data from
ion basis, not global ordering.
>
> You typically want to acquire resources inside the foreachpartition
> closure, just before handling the iterator.
>
>
> http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd
>
> On Mon, Nov
Hi,
I wanted to understand foreachPartition logic. In the code below, I am
assuming the iterator is executing in a distributed fashion.
1. Assuming I have a stream which has timestamp data which is sorted. Will
the string iterator in foreachPartition process each line in order?
2. Assuming I have
Hi,
I am sending data to an elasticsearch deployment. The printing to file
seems to work fine, but I keep getting a no-node-found error from ES when I
send data to it. I suspect there is some special way to handle the connection
object? Can anyone explain what should be changed here?
Thanks
Nipun
The
I wanted to understand something about the internals of spark streaming
execution.
If I have a stream X, and in my program I send stream X to function A and
function B:
1. In function A, I do a few transform/filter operations etc. on X->Y->Z to
create stream Z. Now I do a forEach Operation on Z
the
> list!
>
> Regards,
>
> Lars Albertsson
>
>
>
> On Thu, Oct 22, 2015 at 10:48 PM, Nipun Arora <nipunarora2...@gmail.com>
> wrote:
> > Hi,
> > In general in spark stream one can do transformations ( filter, map
> etc.) or
> > output operati
; customerWithPromotion.foreachPartition();
> }
> });
>
>
> On 21-Oct-2015, at 10:55 AM, Nipun Arora <nipunarora2...@gmail.com> wrote:
>
> Hi All,
>
> Can anyone provide a design pattern for the following code shown in the
> Spark User Manual, in JAVA ? I have t
Hi,
In general in spark streaming one can do transformations (filter, map,
etc.) or output operations (collect, forEach) etc. in an event-driven
paradigm... i.e. the action happens only if a message is received.
Is it possible to do actions every few seconds in a polling based fashion,
regardless if
Hi All,
Can anyone provide a design pattern for the following code shown in the
Spark User Manual, in Java? I have the exact same use-case, and for some
reason the design pattern for Java is missing.
Scala version taken from:
This prevents these artifacts from being included in the assembly JARs.
See scope
https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Dependency_Scope
On Mon, Jun 22, 2015 at 10:28 AM, Nipun Arora nipunarora2...@gmail.com
wrote:
Hi Tathagata,
I am attaching
btw. just for reference I have added the code in a gist:
https://gist.github.com/nipunarora/ed987e45028250248edc
and a stackoverflow reference here:
http://stackoverflow.com/questions/31006490/broadcast-variable-null-pointer-exception-in-spark-streaming
On Tue, Jun 23, 2015 at 11:01 AM, Nipun
checkpointing?
I had a similar issue when recreating a streaming context from checkpoint
as broadcast variables are not checkpointed.
On 23 Jun 2015 5:01 pm, Nipun Arora nipunarora2...@gmail.com wrote:
Hi,
I have a spark streaming application where I need to access a model saved
in a HashMap.
I
Hi,
I have a spark streaming application where I need to access a model saved
in a HashMap.
I have *no problems in running the same code with broadcast variables in
the local installation.* However I get a *null pointer exception* when I
deploy it on my spark test cluster.
I have stored a
I found the error so just posting on the list.
It seems broadcast variables cannot be declared static.
If you do, you get a null pointer exception.
Thanks
Nipun
On Tue, Jun 23, 2015 at 11:08 AM, Nipun Arora nipunarora2...@gmail.com
wrote:
btw. just for reference I have added the code in a gist
dataset in memory, you can tweak the parameter
of remember, such that it does checkpoint at appropriate time.
Thanks
Twinkle
On Thursday, June 18, 2015, Nipun Arora nipunarora2...@gmail.com wrote:
Hi All,
I am updating my question so that I give more detail. I have also created
@Twinkle - what did you mean by "Regarding not keeping whole dataset in
memory, you can tweak the parameter of remember, such that it does
checkpoint at appropriate time"?
On Thu, Jun 18, 2015 at 11:40 AM, Nipun Arora nipunarora2...@gmail.com
wrote:
Hi All,
I appreciate the help :)
Here
Hi,
I have the following piece of code, where I am trying to transform a spark
stream and add the min and max of each RDD to it. However, I get an error
saying the max call does not exist at run-time (it compiles properly). I am
using spark-1.4.
I have added the question to stackoverflow as well:
in
the classpath so you do not have to bundle it.
On Thu, Jun 18, 2015 at 8:44 AM, Nipun Arora nipunarora2...@gmail.com
wrote:
Hi,
I have the following piece of code, where I am trying to transform a
spark stream and add the min and max of each RDD to it. However, I get an
error saying the max call does
in future iterations?
I would like to keep some accumulated history to make calculations... not
the entire dataset, but persist certain events which can be used in future
DStream RDDs.
Thanks
Nipun
On Wed, Jun 17, 2015 at 11:15 PM, Nipun Arora nipunarora2...@gmail.com
wrote:
Hi Silvio,
Thanks for your
Hi,
Is there any way in spark streaming to keep data across multiple
micro-batches? Like in a HashMap or something?
Can anyone make suggestions on how to keep data across iterations where
each iteration is an RDD being processed in JavaDStream?
This is especially the case when I am trying to
Any suggestions?
Thanks
Nipun
On Wed, Jun 17, 2015 at 10:52 PM, Silvio Fiorito
silvio.fior...@granturing.com wrote:
Hi, just answered in your other thread as well...
Depending on your requirements, you can look at the updateStateByKey API
From: Nipun Arora
Date: Wednesday, June 17