Hello community,
I am working on a project in which statistics (such as predicate
selectivity) are collected during execution. I think it is a good idea
to keep these statistics at the executor level, so that all tasks in the
same executor share the same variable and no extra network traffic is
needed. I am not particularly concerned about thread safety: it is not a
big deal if some updates are lost, since we only want to see the general
trend. This could be done, for example, with an in-memory data structure
store such as Redis running on each worker machine. But could it be done
natively in Spark?
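
To make the idea concrete, here is a rough sketch of what I have in mind.
It assumes that a static field in a JVM class exists once per executor
JVM and is therefore visible to every task thread running in that
executor. The names ExecutorStats, record, selectivity and reset are
hypothetical helpers of my own, not Spark APIs, and the two threads in
main only stand in for two tasks; I have not tried this on a real
cluster.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper, not part of Spark. Static fields exist once per JVM,
// so (assumption) all task threads in one executor JVM share these maps.
final class ExecutorStats {
    private static final ConcurrentHashMap<String, LongAdder> SEEN = new ConcurrentHashMap<>();
    private static final ConcurrentHashMap<String, LongAdder> PASSED = new ConcurrentHashMap<>();

    private ExecutorStats() {}

    // Record one evaluation of the named predicate and whether the row passed.
    static void record(String predicate, boolean passed) {
        SEEN.computeIfAbsent(predicate, k -> new LongAdder()).increment();
        if (passed) {
            PASSED.computeIfAbsent(predicate, k -> new LongAdder()).increment();
        }
    }

    // Fraction of evaluated rows that passed so far; NaN if never evaluated.
    static double selectivity(String predicate) {
        LongAdder seen = SEEN.get(predicate);
        if (seen == null || seen.sum() == 0) {
            return Double.NaN;
        }
        LongAdder passed = PASSED.get(predicate);
        long passedCount = (passed == null) ? 0 : passed.sum();
        return (double) passedCount / seen.sum();
    }

    // Drop all counters collected so far in this JVM.
    static void reset() {
        SEEN.clear();
        PASSED.clear();
    }

    public static void main(String[] args) throws InterruptedException {
        // Two threads stand in for two tasks running in the same executor.
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                record("x > 10", i % 4 == 0);
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(selectivity("x > 10"));
    }
}
```

In a job, I would expect to call ExecutorStats.record(...) from inside
the function passed to something like filter or mapPartitions, so that
the counters are only ever touched locally on the executor. I used
LongAdder because it is cheap, although a plain unsynchronized counter
would also be acceptable for my purposes.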
Thanks,
nik