It doesn't make sense to me, because the other cluster processes all the
data in less than a second.
Anyway, I'm going to set that parameter.
2015-07-31 0:36 GMT+02:00 Tathagata Das t...@databricks.com:
Yes, and that is indeed the problem. It is trying to process all the data
in Kafka, and
I found the error. The final step is to index the data in Elasticsearch, and
the Elasticsearch in one of the clusters is overwhelmed and doesn't work
correctly.
I connected the cluster that doesn't work to another ES and I don't get any
delay.
Sorry, it wasn't related to Spark!
2015-07-31
I have a problem with the JobScheduler. I have executed the same code on two
clusters. I read from three topics in Kafka with DirectStream, so I have
three tasks.
I have checked YARN and no other jobs have been launched.
On the cluster where I have trouble I got these logs:
15/07/30 14:32:58 INFO
I read about the maxRatePerPartition parameter, which I haven't set.
Could that be the problem? Although it wouldn't explain why it fails on
only one of the clusters.
2015-07-30 14:47 GMT+02:00 Guillermo Ortiz konstt2...@gmail.com:
They only share Kafka; the rest of the resources are independent. I tried
stopping one cluster and running only the cluster that isn't working, but
the same thing happens.
2015-07-30 14:41 GMT+02:00 Guillermo Ortiz konstt2...@gmail.com:
I have some problem with the JobScheduler. I have executed same
Just so I'm clear, the difference in timing you're talking about is this:
15/07/30 14:33:59 INFO DAGScheduler: Job 24 finished: foreachRDD at
MetricsSpark.scala:67, took 60.391761 s
15/07/30 14:37:35 INFO DAGScheduler: Job 93 finished: foreachRDD at
MetricsSpark.scala:67, took 0.531323 s
Are
I have three topics with one partition each, so each job runs on one
topic.
2015-07-30 16:20 GMT+02:00 Cody Koeninger c...@koeninger.org:
Just so I'm clear, the difference in timing you're talking about is this:
15/07/30 14:33:59 INFO DAGScheduler: Job 24 finished: foreachRDD at
If the jobs are running on different topicpartitions, what's different
about them? Is one of them 120x the throughput of the other, for
instance? You should be able to eliminate cluster config as a difference
by running the same topic partition on the different clusters and comparing
the
Yes, and that is indeed the problem. It is trying to process all the data
in Kafka, and therefore taking 60 seconds. You need to set the rate limits
for that.
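For reference, a minimal sketch of what setting that limit could look like for a direct Kafka stream. The configuration key spark.streaming.kafka.maxRatePerPartition is the real one; the rate of 1000 records per second, the 10 second batch interval, the app name and the object name are assumptions for illustration and do not come from this thread. It needs spark-core on the classpath.

```scala
import org.apache.spark.SparkConf

object RateLimitSketch {
  // Hypothetical values, not taken from the thread.
  val maxRatePerPartition = 1000L // records per second, per Kafka partition
  val batchSeconds = 10L          // assumed batch interval
  val partitions = 3L             // three topics with one partition each

  // Cap how fast the direct stream reads from each Kafka partition.
  val conf: SparkConf = new SparkConf()
    .setAppName("MetricsSpark")
    .set("spark.streaming.kafka.maxRatePerPartition", maxRatePerPartition.toString)

  // Upper bound on the number of records a single batch can contain.
  val maxRecordsPerBatch: Long = maxRatePerPartition * batchSeconds * partitions

  def main(args: Array[String]): Unit =
    println(s"At most $maxRecordsPerBatch records per batch")
}
```

With a cap in place, a batch that starts far behind in Kafka reads a bounded slice of the backlog instead of everything at once.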
On Thu, Jul 30, 2015 at 8:51 AM, Cody Koeninger c...@koeninger.org wrote:
If you don't set it, there is no maximum rate, it will get
The difference is that one receives more data than the other two. I can
pass the topics as parameters, so I could run the code with one topic at a
time and figure out which topic it is, although I guess it's the one that
gets the most data.
Anyway it's pretty weird those delays in