Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21546#discussion_r204868891

    --- Diff: python/pyspark/serializers.py ---
    @@ -184,27 +184,67 @@ def loads(self, obj):
             raise NotImplementedError

    -class ArrowSerializer(FramedSerializer):
    +class BatchOrderSerializer(Serializer):
    --- End diff --

Yeah, I could separate this, but is there anything I can do to alleviate your concern? I'm not sure I'll have time to put together another PR before the 2.4.0 code freeze, and I think this is a really useful memory optimization that helps prevent OOM in the driver JVM. Also, I might have to rerun the benchmarks here, just to be thorough, since the previous ones were from quite a while ago.
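For readers following along: the general idea behind a batch-order serializer is that batches can be streamed to the driver in whatever order they arrive, with the original ordering written afterward so the reader can reassemble them. The following is a toy sketch of that pattern only, not Spark's actual implementation; all function names and the framing format here are hypothetical.

```python
import io
import struct

def write_batches_out_of_order(stream, batches, arrival_order):
    """Write batches in arrival order (length-prefixed), then append the
    original index of each written batch so a reader can restore order.
    Assumes non-empty batches; a zero length marks end-of-batches."""
    for idx in arrival_order:
        data = batches[idx]
        stream.write(struct.pack("!I", len(data)))
        stream.write(data)
    stream.write(struct.pack("!I", 0))  # end-of-batches marker
    for idx in arrival_order:
        stream.write(struct.pack("!I", idx))

def read_batches_in_order(stream, num_batches):
    """Read all batches as they were streamed, then read the trailing
    index list and place each batch back at its original position."""
    received = []
    while True:
        (length,) = struct.unpack("!I", stream.read(4))
        if length == 0:
            break
        received.append(stream.read(length))
    indices = [struct.unpack("!I", stream.read(4))[0] for _ in received]
    ordered = [None] * num_batches
    for batch, idx in zip(received, indices):
        ordered[idx] = batch
    return ordered

# Example: batches arrive as 2, 0, 1 but come back in original order.
batches = [b"alpha", b"bb", b"c"]
buf = io.BytesIO()
write_batches_out_of_order(buf, batches, [2, 0, 1])
buf.seek(0)
print(read_batches_in_order(buf, 3))  # [b'alpha', b'bb', b'c']
```

The memory benefit in the real PR comes from the JVM side not having to buffer and sort all batches before sending; the sketch above only illustrates the reordering contract between writer and reader.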