Re: Kafka stream message sampling

2016-05-16 Thread Samuel Zhou
Hi Mich, I created the Kafka DStream with the following Java code: sparkConf = new SparkConf().setAppName(this.getClass().getSimpleName() + ", topic: " + topics); jssc = new JavaStreamingContext(sparkConf, Durations.seconds(batchInterval)); HashSet topicsSet = new

Re: Kafka stream message sampling

2016-05-16 Thread Mich Talebzadeh
Hi Samuel, How do you create your RDD based on the Kafka direct stream? Do you have a code snippet? HTH Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Kafka stream message sampling

2016-05-15 Thread Samuel Zhou
Hi, I was trying to use filter to sample a Kafka direct stream. The filter function keeps 1 message in 10 by checking hashcode % 10 == 0, but the number of input events for each batch didn't shrink to 10% of the original traffic. So I want to ask if there is any way to shrink the batch
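
For readers following the thread, here is a minimal sketch of the hash-based sampling described above, written in plain Java without Spark so it runs on its own. The class name HashSampling, the helpers oneIn and sample, and the synthetic "event-N" messages are all hypothetical and not taken from the original code. The sketch shows that the filter only shrinks what flows downstream: the batch that was read stays at full size, which likely explains why the per-batch input count does not drop, since a filter on a direct stream runs after the records have been fetched from Kafka.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class HashSampling {
    // Keep a message when its hash lands in bucket 0 out of `buckets`.
    // Math.floorMod keeps the bucket index non-negative even for negative hash codes.
    static Predicate<String> oneIn(int buckets) {
        return msg -> Math.floorMod(msg.hashCode(), buckets) == 0;
    }

    // Apply the sampling predicate to one batch; the input list is left untouched.
    static List<String> sample(List<String> batch, int buckets) {
        return batch.stream().filter(oneIn(buckets)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Synthetic stand-in for one batch of Kafka messages (an assumption for illustration).
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            batch.add("event-" + i);
        }
        List<String> kept = sample(batch, 10);
        // The input size is unchanged; only the filtered output is smaller.
        System.out.println("input=" + batch.size() + " kept=" + kept.size());
    }
}
```

In a Spark job the same predicate would be passed to the stream's filter call. Because the decision depends only on the message hash, identical messages are always kept or always dropped, so this is deterministic sampling rather than random sampling.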