Honestly, I would stay far away from saving offsets in Zookeeper if at
all possible. It's better to store them alongside your results.
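"Alongside your results" can be as simple as encoding the batch's ending offset in the output path itself, so a restart recovers the last committed offset from the output directory rather than from ZooKeeper. A minimal pure-Scala sketch of that idea (the names `outputPathFor` and `lastCommittedOffset`, and the `offset-N` path convention, are illustrative, not from any library):

```scala
object OffsetsWithResults {
  // Encode the batch's ending offset in the result path.
  def outputPathFor(basePath: String, topic: String, partition: Int, untilOffset: Long): String =
    s"$basePath/$topic/$partition/offset-$untilOffset"

  // Recover the highest committed offset by scanning existing output paths.
  def lastCommittedOffset(paths: Seq[String]): Option[Long] =
    paths
      .flatMap(p => "offset-(\\d+)$".r.findFirstMatchIn(p).map(_.group(1).toLong))
      .sorted
      .lastOption
}
```

Because the offset and the data are committed in one write, there is no window where results exist but the offset record does not (or vice versa), which is the failure mode of a separate ZooKeeper store.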
On Wed, Oct 26, 2016 at 10:44 AM, Sunita Arvind wrote:
This is enough to get it to work:
df.save(conf.getString("ParquetOutputPath") + offsetSaved, "parquet", SaveMode.Overwrite)
And tests so far (in a local env) seem good with the edits; yet to test
on the cluster. Cody, appreciate your thoughts on the edits.
Just want to make sure I am not doing anything wrong.
The error in the file I just shared is here:
val partitionOffsetPath: String = topicDirs.consumerOffsetDir + "/" + partition._2(0)
Earlier this was just `partition`, and hence there was an error fetching the offset.
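The fix above boils down to: the consumer-offset path in ZooKeeper must end in the partition id, not the whole partition tuple. A tiny illustrative helper (my own names; `consumerOffsetDir` stands in for `topicDirs.consumerOffsetDir`):

```scala
object ZkPaths {
  // The partition id, not the partition tuple, terminates the path.
  def partitionOffsetPath(consumerOffsetDir: String, partitionId: Int): String =
    s"$consumerOffsetDir/$partitionId"
}
```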
Still testing. Somehow, Cody, your code never led to the "file already
exists" sort of error.
Attached is the edited code. Am I heading in the right direction? Also, I am
missing something due to which it seems to work well as long as the
application is running and the files are created right, but as soon as I
restart the application, it goes back to a fromOffset of 0. Any thoughts?
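Rewinding to 0 on restart usually means the saved offsets are never read back at startup. A hedged pure-Scala sketch of the read-back step (the storage format "topic,partition,offset" per line and the name `OffsetRecovery` are my assumptions, not from the attached code); the recovered map is the shape that would feed `fromOffsets` in `KafkaUtils.createDirectStream`:

```scala
object OffsetRecovery {
  // Assumed storage format: one "topic,partition,offset" line per partition.
  // None (nothing stored yet) yields an empty map, i.e. start from defaults.
  def parse(stored: Option[String]): Map[(String, Int), Long] =
    stored.toSeq
      .flatMap(_.split("\n"))
      .filter(_.nonEmpty)
      .map { line =>
        val Array(t, p, o) = line.split(",")
        (t, p.toInt) -> o.toLong
      }
      .toMap
}
```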
regards
Thanks for confirming Cody.
To use the library, I had to do:
val offsetsStore = new ZooKeeperOffsetsStore(conf.getString("zkHosts"), "/consumers/topics/" + topics + "/0")
It worked well. However, I had to specify the partitionId in the zkPath.
If I want the library to pick up all the partitions, it is not clear how.
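One way around the hard-coded "/0" is to build one ZooKeeper path per partition. A minimal sketch (the partition count would normally come from Kafka topic metadata; here it is just a parameter, and the helper name is mine):

```scala
object AllPartitionPaths {
  // One offset path per partition, mirroring the "/consumers/topics/<topic>/<p>" layout above.
  def zkPaths(topics: String, numPartitions: Int): Seq[String] =
    (0 until numPartitions).map(p => s"/consumers/topics/$topics/$p")
}
```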
You are correct that you shouldn't have to worry about broker id.
I'm honestly not sure specifically what else you are asking at this point.
On Tue, Oct 25, 2016 at 1:39 PM, Sunita Arvind wrote:
Just re-read the Kafka architecture. Something that slipped my mind is that it
is leader-based, so a topic/partitionId pair will be the same on all the brokers.
So we do not need to consider the broker id while storing offsets. Still
exploring the rest of the items.
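In other words, an offset store only ever needs (topic, partition) as its key; the broker id never appears. A small in-memory illustration of that keying (class and method names are mine, for illustration only):

```scala
case class TopicPartitionKey(topic: String, partition: Int)

class InMemoryOffsetStore {
  // Keyed by (topic, partition) only -- valid on any broker, since Kafka is leader-based.
  private var offsets = Map.empty[TopicPartitionKey, Long]

  def save(topic: String, partition: Int, offset: Long): Unit =
    offsets += TopicPartitionKey(topic, partition) -> offset

  def read(topic: String, partition: Int): Option[Long] =
    offsets.get(TopicPartitionKey(topic, partition))
}
```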
regards
Sunita
On Tue, Oct 25, 2016 at 11:09 AM,
Hello Experts,
I am trying to use the saving-to-ZK design. I just saw Sudhir's comment that
it is an old approach. Any reasons for that? Any issues observed with saving
to ZK? The way we are planning to use it is:
1. Following http://aseigneurin.github.io/2016/05/07/spark-kafka-
See
https://github.com/koeninger/kafka-exactly-once
On Aug 23, 2016 10:30 AM, "KhajaAsmath Mohammed" wrote:
Saving offsets to ZooKeeper is an old approach; checkpointing internally
saves the offsets to HDFS (the checkpointing location).
more details here:
http://spark.apache.org/docs/latest/streaming-kafka-integration.html
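The checkpoint pattern in miniature: on restart, state is rebuilt from the checkpoint location if one exists, and created fresh otherwise, which is how offsets survive a restart. This is a pure-Scala analogue of that semantics, not the Spark `StreamingContext.getOrCreate` API itself (names and the single-file layout are mine):

```scala
import java.nio.file.{Files, Paths}

object CheckpointLike {
  // If a checkpoint exists under dir, restore from it; otherwise create fresh and persist.
  def getOrCreate(dir: String, create: () => String): String = {
    val stateFile = Paths.get(dir, "state")
    if (Files.exists(stateFile)) {
      new String(Files.readAllBytes(stateFile), "UTF-8")
    } else {
      val state = create()
      Files.createDirectories(Paths.get(dir))
      Files.write(stateFile, state.getBytes("UTF-8"))
      state
    }
  }
}
```

Note the second call with the same directory ignores the new `create` function and returns the originally persisted state, exactly the behavior that makes restarts resume instead of starting over.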
On Tue, Aug 23, 2016 at 10:30 AM, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote:
Hi Experts,
I am looking for some information on how to achieve zero data loss while
working with Kafka and Spark. I have searched online and blogs have
different answers. Please let me know if anyone has an idea on this.
Blog 1: