Hi Samuel, How do you create your RDD based on Kakfa direct stream?
Do you have your code snippet? HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 15 May 2016 at 23:24, Samuel Zhou <zhou...@gmail.com> wrote: > Hi, > > I was trying to use filter to sampling a Kafka direct stream, and the > filter function just take 1 messages from 10 by using hashcode % 10 == 0, > but the number of events of input for each batch didn't shrink to 10% of > original traffic. So I want to ask if there are any way to shrink the batch > size by a sampling function to save the traffic from Kafka? > > Thanks! > Samuel >