[GitHub] spark pull request: [SPARK-3491] [MLlib] [PySpark] use pickle to s...

davies Fri, 19 Sep 2014 10:43:54 -0700

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/2378#issuecomment-56210084
  
    @mengxr PickleSerializer does not compress data; CompressSerializer can do 
that using gzip (level 1). Compression helps when the doubles fall in a small 
range or repeat often, but makes things worse for random doubles spread over a 
large range.
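
    A rough way to see the effect, as a sketch only: it uses plain `pickle` 
(protocol 2) and `zlib` at level 1 as stand-ins for the serializers, and the 
list sizes and value ranges are made up for illustration.

```python
import pickle
import random
import zlib


def compressed_ratio(values):
    """Return compressed size / raw size for a pickled list of doubles."""
    raw = pickle.dumps(values, protocol=2)
    # Level 1 is the fastest (and weakest) deflate setting.
    return len(zlib.compress(raw, 1)) / len(raw)


random.seed(0)  # fixed seed so repeated runs compare the same data

repeated = [1.0] * 10000
small_range = [float(random.randint(0, 9)) for _ in range(10000)]
wide_random = [random.uniform(-1e300, 1e300) for _ in range(10000)]

ratio_repeated = compressed_ratio(repeated)
ratio_small = compressed_ratio(small_range)
ratio_wide = compressed_ratio(wide_random)

for name, ratio in [("repeated", ratio_repeated),
                    ("small range", ratio_small),
                    ("wide random", ratio_wide)]:
    print("%-12s compressed/raw = %.3f" % (name, ratio))
```

    The lower the ratio, the more compression helps; the random doubles keep 
most of their size because their mantissa bits are essentially noise.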
    
    BatchedSerializer helps reduce the overhead of repeating the class name for 
every object. In the JVM, the memory of short-lived objects cannot be reused 
until a GC runs, so batched serialization will not increase GC pressure as long 
as the batch size is not too large (this depends on how the GC is configured).
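
    The class-name overhead can be sketched like this. It is an illustration 
with plain `pickle` rather than the PySpark serializers: `Fraction` stands in 
for any class instance (for example a vector), and the `batches` helper and the 
batch size of 100 are hypothetical.

```python
import pickle
from fractions import Fraction

# Fraction is only a stand-in for "an object whose class must be named in the pickle".
items = [Fraction(i, 7) for i in range(1000)]


def batches(seq, size):
    """Yield consecutive slices of seq with at most `size` elements each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


# One pickle stream per object: the class reference is written out every time.
one_by_one_bytes = sum(len(pickle.dumps(x, protocol=2)) for x in items)

# One pickle stream per batch: the class reference is written once per batch,
# and later objects in the same batch refer back to it through the pickle memo.
batched_bytes = sum(len(pickle.dumps(b, protocol=2)) for b in batches(items, 100))

print("one by one:", one_by_one_bytes, "bytes")
print("batched:   ", batched_bytes, "bytes")
```

    A larger batch amortizes the class name over more objects, at the cost of 
holding more data in memory at once.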



