Thanks for the detailed explanation. Just tested it, worked like a charm.
On Mon, Jun 20, 2016 at 1:02 PM, N B wrote:
It's actually necessary to retire keys that become "Zero" or "Empty", so to
speak. In your case, the key is "imageURL" and the values are a dictionary, one
of whose fields is the "count" that you are maintaining. For simplicity and
illustration's sake, I will assume imageURL to be a string like "abc". Your
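The message above is cut off, but the idea it describes can be sketched with a minimal pure-Python simulation (no Spark required) of what reduceByKeyAndWindow does with an inverse function: counts are updated incrementally as batches enter and leave the window, and keys whose count drops to zero must be retired or the state grows without bound. The function name, key names, and batch contents below are made up for illustration.

```python
def apply_batch(window_counts, entering_batch, leaving_batch):
    """Update per-key counts as one batch enters the window and one leaves."""
    for key, count in entering_batch.items():
        window_counts[key] = window_counts.get(key, 0) + count  # reduce func
    for key, count in leaving_batch.items():
        window_counts[key] = window_counts.get(key, 0) - count  # inverse func
    # The filter step: drop keys that have become "Zero" or "Empty" --
    # this is the role the filter function plays in the real API.
    return {k: v for k, v in window_counts.items() if v > 0}

counts = {}
counts = apply_batch(counts, {"abc": 2, "xyz": 1}, {})           # batch 1 enters
counts = apply_batch(counts, {"abc": 1}, {"abc": 2, "xyz": 1})   # batch 1 leaves
print(counts)  # {'abc': 1} -- "xyz" was retired instead of lingering at 0
```

Without the final filter step, "xyz" would stay in the state forever with a count of 0, which is exactly the unbounded growth being described.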
Hi,
According to the docs (
https://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.DStream.reduceByKeyAndWindow),
filterFunc can be used to retain expiring keys. I do not want to retain any
expiring keys, so I do not understand how this can help me stabilize it.
We had this same issue with the reduceByKeyAndWindow API that you are
using. To fix this issue, you have to use a different flavor of that
API, specifically the two versions that allow you to pass a 'filter function'
to them. Putting in the filter functions helped stabilize our application
too.
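As a rough sketch of what such a filter function looks like: in PySpark it receives a (key, value) pair and returns True for pairs that should stay in the window state. The value shape (a dict with a "count" field, per the imageURL discussion above) and the reduce/inverse function names in the commented call are assumptions for illustration, not part of the API.

```python
def keep_nonzero(pair):
    """Return True for key-value pairs that should stay in the window state."""
    _key, value = pair
    return value.get("count", 0) > 0

# The DStream call would then look roughly like this (it needs a running
# StreamingContext, so it is shown here only as a comment):
#
# windowed = pairs.reduceByKeyAndWindow(
#     func=add_counts,          # hypothetical reduce function
#     invFunc=subtract_counts,  # hypothetical inverse function
#     windowDuration=60,
#     slideDuration=10,
#     filterFunc=keep_nonzero,  # retires keys whose count reached zero
# )

print(keep_nonzero(("abc", {"count": 3})))  # True
print(keep_nonzero(("xyz", {"count": 0})))  # False
```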