Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21733
  
    I guess we should treat reducing state memory size as worth doing: as 
described in the commit above, we already optimized 
HDFSBackedStateStoreProvider to reduce state store disk size (as well as 
network transfer) by not storing 4 bytes per row (for both key and value). 
This approach would normally save more than that previous optimization saved 
on the value row, given that the key carries the window information, which 
contains two values: start and end.
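    
    To give a rough sense of the relative savings, here is a back-of-the-envelope 
sketch (my own illustration, not code from the PR): it assumes the earlier 
optimization dropped a hypothetical 4-byte length prefix from both the 
serialized key row and value row, and that this approach instead stops 
duplicating the window columns (start and end, assumed to be two 8-byte longs) 
in the value row. Actual UnsafeRow layouts differ; the numbers are only meant 
to compare the two optimizations.

```java
public class StateSizeEstimate {
    // Assumed sizes for illustration only; real row layouts differ.
    static final int LENGTH_PREFIX_BYTES = 4;   // hypothetical 4-byte size field per serialized row
    static final int WINDOW_BYTES = 2 * 8;      // window start + end, assumed two 8-byte longs

    // Earlier optimization: drop the 4-byte prefix from both key and value rows.
    static long prefixSavings(long rows) {
        return rows * 2L * LENGTH_PREFIX_BYTES;
    }

    // This approach: stop duplicating the window columns in the value row.
    static long windowDedupSavings(long rows) {
        return rows * (long) WINDOW_BYTES;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        System.out.println(prefixSavings(rows));      // 8 MB-ish across 1M rows
        System.out.println(windowDedupSavings(rows)); // 16 MB-ish across 1M rows
    }
}
```

    Under these assumptions, deduplicating the window columns saves roughly 
twice as many bytes per row as the earlier length-prefix optimization did.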
    
    The main concern I have with this approach is the possible performance 
impact on workloads. Happily, the workload I've covered shows even a slight 
performance improvement, but I'm not sure about other workloads yet. I might 
propose changing the default behavior once I have good backing numbers 
overall, but in any case, I agree that a decision from committer(s) is 
necessary. Would it be better to initiate a thread on the dev mailing list?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
