Re: Strange behaviour of different SSCs with same Kafka topic

2014-04-21 Thread gaganbm
Yes. I am running this in local mode and the SSCs run in the same JVM.
So, if I deploy this on a cluster, will such behavior go away? Also, is
there any way I can start the SSCs on a local machine but in different JVMs?
I couldn't find anything about this in the documentation.

The inter-mingling of data seems to be gone after I made some of those
external classes Scala objects, keeping static maps and the like. Is
that a good idea as far as performance is concerned?
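
For reference, this is roughly the pattern I mean (a sketch only; all names
are made up): a singleton object holding per-stream state. In local mode both
SSCs share one JVM, so this object, and its map, is shared by both of them.

object RecordNames {
  private val names = scala.collection.mutable.Map[Int, String]()

  def register(streamId: Int, name: String): Unit = synchronized {
    names(streamId) = name
  }

  def nameFor(streamId: Int): String = synchronized {
    names.getOrElse(streamId, "UNKNOWN")
  }
}

The synchronized lookups are cheap at my data rates; my worry is whether
relying on JVM-wide singleton state like this is sound.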

Thanks

Gagan B Mishra


On Tue, Apr 22, 2014 at 1:59 AM, Tathagata Das [via Apache Spark User List]
ml-node+s1001560n4556...@n3.nabble.com wrote:

 Are you by any chance starting two StreamingContexts in the same JVM? That
 could explain a lot of the weird mixing of data that you are seeing. It's
 not a supported usage scenario to start multiple StreamingContexts
 simultaneously in the same JVM.
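
 For example, something like this (an untested sketch; the ZooKeeper quorum
 and topic names are placeholders) keeps both streams in one context, with a
 different consumer group per stream:

 import org.apache.spark.streaming.{Seconds, StreamingContext}
 import org.apache.spark.streaming.kafka.KafkaUtils

 // One StreamingContext; two Kafka input streams in different consumer groups.
 val ssc = new StreamingContext("local[4]", "kafka-app", Seconds(5))

 val stream1 = KafkaUtils.createStream(ssc, "zkhost:2181", "group-1", Map("my-topic" -> 1))
 val stream2 = KafkaUtils.createStream(ssc, "zkhost:2181", "group-2", Map("my-topic" -> 1))

 // ... per-stream transformations ...

 ssc.start()
 ssc.awaitTermination()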

 TD


 On Thu, Apr 17, 2014 at 10:58 PM, gaganbm [hidden email] wrote:

 It happens at a normal data rate too, let's say 20 records per second.

 Apart from that, I am also getting some more strange behavior. Let me
 explain.

 I establish two SSCs and start them one after another. In the SSCs I get the
 streams from Kafka sources and do some manipulations, like adding a
 Record_Name field to each of the incoming records. This Record_Name is
 different for the two SSCs, and I get this field from some other class, not
 related to the streams.

 Now, the expected behavior is that all records in SSC1 get tagged with the
 field RECORD_NAME_1 and all records in SSC2 get tagged with the field
 RECORD_NAME_2. The two SSCs have nothing to do with each other, as far as I
 can tell.

 However, strangely enough, I find many records in SSC1 tagged with
 RECORD_NAME_2 and vice versa. Is it some kind of serialization issue?
 That is, does the class which provides this RECORD_NAME get serialized and
 reconstructed, and then something weird happens inside? I am unable to
 figure it out.

 So, apart from the skewed frequency and volume of records in the two
 streams, I am getting this inter-mingling of data between the streams.

 Can you help me with how to use external data to manipulate the RDD
 records?
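
 One way I could imagine doing this (an untested sketch; stream1 and the
 record name are placeholders) is to capture the per-context value in a
 local val, or broadcast it, rather than reading a field of a shared class
 from inside the closure:

 // Capture the value per context so each closure carries its own copy.
 val recordName = "RECORD_NAME_1" // fixed when the stream is set up
 val tagged = stream1.map(record => (recordName, record))

 // Or distribute it explicitly as a broadcast variable:
 val nameBc = ssc.sparkContext.broadcast("RECORD_NAME_1")
 val tagged2 = stream1.map(record => (nameBc.value, record))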

 Thanks and regards

 Gagan B Mishra


 Programmer
 560034, Bangalore
 India


 On Tue, Apr 15, 2014 at 4:09 AM, Tathagata Das [via Apache Spark User
 List] [hidden email] wrote:

 Does this happen at a low event rate for that topic as well, or only at a
 high volume rate?

 TD


 On Wed, Apr 9, 2014 at 11:24 PM, gaganbm [hidden email] wrote:

 I am really at my wits' end here.

 I have two different streaming contexts, both listening to the same Kafka
 topics. I establish the KafkaStream in each by setting a different consumer
 group.
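
 Roughly, the setup looks like this (a sketch from memory; the ZooKeeper
 quorum and topic names are anonymized placeholders):

 import org.apache.spark.streaming.{Seconds, StreamingContext}
 import org.apache.spark.streaming.kafka.KafkaUtils

 // Two streaming contexts, each with its own Kafka consumer group,
 // both reading the same topic.
 val ssc1 = new StreamingContext("local[2]", "consumer-a", Seconds(5))
 val ssc2 = new StreamingContext("local[2]", "consumer-b", Seconds(5))

 val streamA = KafkaUtils.createStream(ssc1, "zkhost:2181", "group-a", Map("my-topic" -> 1))
 val streamB = KafkaUtils.createStream(ssc2, "zkhost:2181", "group-b", Map("my-topic" -> 1))

 ssc1.start()
 ssc2.start()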

 Ideally, I should be seeing the Kafka events in both the streams. But what
 I am getting is really unpredictable. Only one stream gets a lot of events,
 and the other one gets almost nothing, or very little compared to the first.
 The frequency is also very skewed: I get a lot of events in one stream
 continuously, and after some time I get a few events in the other one.

 I don't know where I am going wrong. I can see consumer fetcher threads for
 both the streams that listen to the Kafka topics.

 I can give further details if needed. Any help will be great.

 Thanks



Need clarification of joining streams

2014-04-21 Thread gaganbm
I wanted some clarification on the behavior of joining streams.

As I understand it, the join works per batch. I am reading data from two
Kafka streams and then joining them based on some keys. But what happens if
one stream hasn't produced any data in that batch duration and the other
has some? Or, let's say, one stream is getting data at a higher rate and the
other one is less frequent. How does it behave in such a case?
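
To make the question concrete, a sketch of what I am doing (keyOf is a
placeholder for my key extraction; the pair operations need the implicit
conversions from StreamingContext):

import org.apache.spark.streaming.StreamingContext._

// As I understand it, join is batch-wise: each batch of pairs1 is joined
// only with the corresponding batch of pairs2, so an empty batch on either
// side should give an empty joined batch. leftOuterJoin would keep the
// left side's records in that case.
val pairs1 = stream1.map(rec => (keyOf(rec), rec))
val pairs2 = stream2.map(rec => (keyOf(rec), rec))

val joined = pairs1.join(pairs2)          // inner join, per batch
val kept   = pairs1.leftOuterJoin(pairs2) // (key, (rec1, Option[rec2]))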

Thanks


KafkaReceiver Error when starting ssc (Actor name not unique)

2014-04-09 Thread gaganbm
Hi All,

I am getting this exception when calling ssc.start() to start the streaming
context.


ERROR KafkaReceiver - Error receiving data
akka.actor.InvalidActorNameException: actor name [NetworkReceiver-0] is not unique!
    at akka.actor.dungeon.ChildrenContainer$NormalChildrenContainer.reserve(ChildrenContainer.scala:130)
    at akka.actor.dungeon.Children$class.reserveChild(Children.scala:77)
    at akka.actor.ActorCell.reserveChild(ActorCell.scala:338)
    at akka.actor.dungeon.Children$class.makeChild(Children.scala:186)
    at akka.actor.dungeon.Children$class.attachChild(Children.scala:42)
    at akka.actor.ActorCell.attachChild(ActorCell.scala:338)
    at akka.actor.ActorSystemImpl.actorOf(ActorSystem.scala:518)
    at org.apache.spark.streaming.dstream.NetworkReceiver.actor$lzycompute(NetworkInputDStream.scala:94)
    at org.apache.spark.streaming.dstream.NetworkReceiver.actor(NetworkInputDStream.scala:94)
    at org.apache.spark.streaming.dstream.NetworkReceiver.start(NetworkInputDStream.scala:122)
    at org.apache.spark.streaming.scheduler.NetworkInputTracker$ReceiverExecutor$$anonfun$8.apply(NetworkInputTracker.scala:173)
    at org.apache.spark.streaming.scheduler.NetworkInputTracker$ReceiverExecutor$$anonfun$8.apply(NetworkInputTracker.scala:169)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

I tried cleaning up the ZooKeeper and Kafka temp/cached files, but I still
get the same error.
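
My guess, and it is only a guess, is that a second receiver is trying to
register under the same actor name, which can happen when more than one
StreamingContext starts receivers in the same JVM. A minimal sketch of that
situation and one way around it (app names are hypothetical):

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc1 = new StreamingContext("local[2]", "app-1", Seconds(5))
// ... set up streams on ssc1 ...
ssc1.start()

// Stop the first context, releasing its receiver (and the actor name
// NetworkReceiver-0), before starting another one in this JVM.
ssc1.stop()

val ssc2 = new StreamingContext("local[2]", "app-2", Seconds(5))
// ... set up streams on ssc2 ...
ssc2.start()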

Any help on this?






RDD Collect returns empty arrays

2014-03-26 Thread gaganbm
I am getting strange behavior with the RDDs.

All I want is to persist the RDD contents in a single file. 

The saveAsTextFile() call saves them in multiple text files, one per
partition. So I tried rdd.coalesce(1, true).saveAsTextFile(). This fails
with the exception:

org.apache.spark.SparkException: Job aborted: Task 75.0:0 failed 1 times
(most recent failure: Exception failure: java.lang.IllegalStateException:
unread block data) 

Then I tried collecting the RDD contents into an array and writing the
array to the file manually. Again, that fails: I get empty arrays, even
when data is there.

/** The below saves the data in multiple text files, so data is there for sure. **/
rdd.saveAsTextFile(resultDirectory)

/** The below simply prints size 0 for all the RDDs in a stream. Why?! **/
val arr = rdd.collect()
println("SIZE of RDD " + rdd.id + " " + arr.size)

Kindly help! I am clueless about how to proceed.
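
For completeness, this is roughly how I am trying to write the collected
array (a sketch; "stream" is the DStream in question and the output path is
a placeholder). collect() runs on the driver, so the file would be written
on the driver machine:

import java.io.PrintWriter

stream.foreachRDD { rdd =>
  val arr = rdd.collect()
  println("SIZE of RDD " + rdd.id + " " + arr.size)
  if (arr.nonEmpty) {
    val out = new PrintWriter("/tmp/result-" + rdd.id + ".txt")
    try arr.foreach(x => out.println(x)) finally out.close()
  }
}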





Re: rdd.saveAsTextFile problem

2014-03-25 Thread gaganbm
Hi Folks,

Is this issue resolved? If yes, could you please throw some light on how to
fix it?

I am facing the same problem during writing to text files.

When I do 

stream.foreachRDD { rdd =>
  rdd.saveAsTextFile("Some path")
}

This works fine for me, but it creates multiple text files, one for each
partition within an RDD.

So I tried the coalesce option, to merge my results into a single file for
each RDD:

stream.foreachRDD { rdd =>
  rdd.coalesce(1, true).saveAsTextFile("Some path")
}

This fails with:
org.apache.spark.SparkException: Job aborted: Task 75.0:0 failed 1 times
(most recent failure: Exception failure: java.lang.IllegalStateException:
unread block data)

I am using Spark Streaming 0.9.0

Any clue what's going wrong when using coalesce?
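
In case it helps others, a workaround I am considering (an untested sketch;
paths are placeholders): keep the per-partition part files and merge them
afterwards with Hadoop's FileUtil.copyMerge, avoiding coalesce(1) entirely:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

stream.foreachRDD { rdd =>
  val dir = "/tmp/out-" + rdd.id
  rdd.saveAsTextFile(dir)

  // Merge the part-* files of this batch into a single output file.
  val conf = new Configuration()
  val fs = FileSystem.get(conf)
  FileUtil.copyMerge(fs, new Path(dir), fs, new Path(dir + ".txt"), false, conf, null)
}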




