Re: Spark Kafka stream processing time increasing gradually

2016-06-22 Thread Roshan Singh
Thanks for the detailed explanation. Just tested it, worked like a charm.

Re: Spark Kafka stream processing time increasing gradually

2016-06-20 Thread N B
It's actually necessary to retire keys that become "Zero" or "Empty", so to speak. In your case, the key is "imageURL" and the values are a dictionary, one of whose fields is the "count" that you are maintaining. For simplicity and illustration's sake, I will assume imageURL to be a string like "abc". Your …
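
A minimal PySpark sketch of this idea, under the assumptions stated in the thread (keys are imageURL strings, values are dicts carrying a "count" field); the stream setup, the helper names, and the window durations are illustrative, not taken from the original messages:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="RetireEmptyKeys")
ssc = StreamingContext(sc, 10)           # 10-second batches (illustrative)
ssc.checkpoint("/tmp/spark-checkpoint")  # inverse-reduce windows need checkpointing

# `pairs` stands in for the Kafka-backed DStream from the thread:
# a stream of (imageURL, {"count": n}) tuples.
pairs = ssc.queueStream([sc.parallelize([("abc", {"count": 1})])])

def add(a, b):
    # Merge two per-key values by summing their counts.
    return {"count": a["count"] + b["count"]}

def subtract(a, b):
    # Inverse of add: back out counts that slid out of the window.
    return {"count": a["count"] - b["count"]}

def keep(pair):
    # Retire keys whose windowed count has dropped to zero; without
    # this, "empty" keys linger in the window state, every batch has
    # to touch them, and processing time creeps up.
    _, value = pair
    return value["count"] > 0

windowed = pairs.reduceByKeyAndWindow(
    add, subtract,
    windowDuration=300,  # 5-minute window (illustrative)
    slideDuration=10,
    filterFunc=keep,
)
windowed.pprint()

ssc.start()
ssc.awaitTermination()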

Re: Spark Kafka stream processing time increasing gradually

2016-06-16 Thread Roshan Singh
Hi, according to the docs ( https://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.DStream.reduceByKeyAndWindow), filterFunc can be used to retain expiring keys. I do not want to retain any expiring key, so I do not understand how this can help me stabilize it.

Re: Spark Kafka stream processing time increasing gradually

2016-06-16 Thread N B
We had this same issue with the reduceByKeyAndWindow API that you are using. To fix it, you have to use a different flavor of that API, specifically the versions that allow you to pass a 'filter function'. Putting in the filter function helped stabilize our application too.
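
For concreteness, a sketch of the difference in PySpark (the `pairs` DStream of (key, count) tuples and the window lengths are assumed, not from the original message): the plain inverse-reduce flavor keeps every key it has ever seen in the window state, while passing filterFunc lets Spark evict keys once they no longer matter.

# Without a filter function: state grows because keys whose counts
# have fallen to zero are never evicted from the window.
counts = pairs.reduceByKeyAndWindow(
    lambda a, b: a + b,  # reduce
    lambda a, b: a - b,  # inverse reduce
    windowDuration=300, slideDuration=10)

# With a filter function: (key, value) pairs failing the predicate
# are dropped from the state, so memory and batch time stay stable.
counts = pairs.reduceByKeyAndWindow(
    lambda a, b: a + b,
    lambda a, b: a - b,
    windowDuration=300, slideDuration=10,
    filterFunc=lambda kv: kv[1] > 0)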