Hello community,

I am working on a project in which statistics (such as predicate selectivity) are collected during execution. I think it is a good idea to keep these statistics at the executor level, so that all tasks in the same executor share the same variable and no extra network traffic is needed. I am also not especially concerned about thread safety: it is not a big deal if some updates are lost, since we are only trying to see the general trend.

This could be done, for example, by running an in-memory data structure store such as Redis on each worker machine. But can it be done natively in Spark?
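
To make the idea concrete, here is a rough sketch of the kind of executor-local state I have in mind. It assumes that each executor is a single JVM and that its tasks run as threads inside it, so a static field would be visible to all of them. The class name ExecutorStats and its methods are made up for illustration and are not part of Spark.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-JVM statistics holder (not a Spark API).
// One copy of the static map exists per JVM, i.e. per executor.
final class ExecutorStats {
    // For each predicate id: [0] = rows evaluated, [1] = rows that passed.
    private static final ConcurrentHashMap<String, LongAdder[]> COUNTS =
            new ConcurrentHashMap<>();

    private ExecutorStats() {}

    // Called from task code, e.g. once per partition or per batch of rows.
    static void record(String predicateId, long evaluated, long passed) {
        LongAdder[] c = COUNTS.computeIfAbsent(predicateId,
                k -> new LongAdder[] { new LongAdder(), new LongAdder() });
        c[0].add(evaluated);
        c[1].add(passed);
    }

    // Fraction of evaluated rows that passed; NaN if nothing was recorded.
    static double selectivity(String predicateId) {
        LongAdder[] c = COUNTS.get(predicateId);
        if (c == null) {
            return Double.NaN;
        }
        long evaluated = c[0].sum();
        return evaluated == 0 ? Double.NaN : (double) c[1].sum() / evaluated;
    }

    // Drops all collected counts in this JVM.
    static void reset() {
        COUNTS.clear();
    }
}
```

A task would call ExecutorStats.record(...) while it processes rows and ExecutorStats.selectivity(...) when it needs the current estimate. The values stay local to each executor and are never sent to the driver. LongAdder is used only because it is cheap; since lost updates are acceptable here, plain fields would also do.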

thanks,
nik

