Spark Streaming with multiple Kafka topics doesn't work at-least-once

2017-01-23 Thread hakanilter
Hi everyone,

I have a Spark (1.6.0-cdh5.7.1) Streaming job that receives data from
multiple Kafka topics. After the job starts, everything works fine at first
(around 700 req/sec), but after a while (a couple of days or a week) it starts
processing only part of the data (around 350 req/sec). When I check the
Kafka topics, I can see that 700 req/sec are still coming in. I don't see
any errors, exceptions, or other problems. The same code works fine when I
run it with just a single Kafka topic.

Do you have any idea or clue what could be causing this?

Thanks.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-multiple-kafka-topic-doesn-t-work-at-least-once-tp28334.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Problem with loading files: Loss was due to java.io.EOFException java.io.EOFException

2014-05-21 Thread hakanilter
The problem was solved after I added the hadoop-core dependency. However, I
think there is a misunderstanding about local files. I found this note:

Note that if you've connected to a Spark master, it's possible that it will
attempt to load the file on one of the different machines in the cluster, so
make sure it's available on all the cluster machines. In general, in future
you will want to put your data in HDFS, S3, or similar file systems to avoid
this problem.

http://docs.sigmoidanalytics.com/index.php/Using_the_Spark_Shell

This means that you can't use local files with Spark. I don't understand
why, because after calling addFile() or textFile(), the file should be
downloadable by every node in the cluster and become accessible.

Anyway, if you get "Loss was due to java.io.EOFException", make sure that
the Hadoop libraries are available, for example with these Maven dependencies:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>0.9.1</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>2.0.0-mr1-cdh4.6.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.0.0-cdh4.6.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.0.0-cdh4.6.0</version>
</dependency>

Cheers!




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Problem-with-loading-files-Loss-was-due-to-java-io-EOFException-java-io-EOFException-tp6090p6201.html