[GitHub] spark pull request #18659: [SPARK-21190][PYSPARK][WIP] Python Vectorized UDF...

BryanCutler Fri, 15 Sep 2017 15:21:46 -0700

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18659#discussion_r139261343
  
    --- Diff: python/pyspark/serializers.py ---
    @@ -199,6 +211,33 @@ def __repr__(self):
             return "ArrowSerializer"
     
     
    +class ArrowPandasSerializer(ArrowSerializer):
    +
    +    def __init__(self):
    +        super(ArrowPandasSerializer, self).__init__()
    +
    +    def dumps(self, series):
    +        """
    +        Make an ArrowRecordBatch from a Pandas Series and serialize
    +        """
    +        import pyarrow as pa
    --- End diff --
    
    Yeah, it would probably be best to handle it the same way as in `toPandas()`.
    
    That got me thinking that it is a little odd to have a SQLConf,
    "spark.sql.execution.arrow.enable", that applies to `toPandas()` but has no
    bearing on `pandas_udf`. It doesn't need to, since `pandas_udf` is an explicit
    call, but it seems a little contradictory. What do you think?
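
To make the mismatch concrete, here is a rough pure-Python sketch. It is not Spark code: a plain dict stands in for SQLConf, the two functions are hypothetical stand-ins for the code paths being discussed, and the "false" default is an assumption.

```python
# Toy model only: a plain dict stands in for SQLConf, and the two functions
# below are hypothetical stand-ins, not the real PySpark implementations.
ARROW_FLAG = "spark.sql.execution.arrow.enable"


def to_pandas_path(conf):
    """Model of `toPandas()`: Arrow is used only when the flag is "true"."""
    # Assumption: the flag is treated as "false" when it is not set.
    enabled = conf.get(ARROW_FLAG, "false").lower() == "true"
    return "arrow" if enabled else "non-arrow"


def pandas_udf_path(conf):
    """Model of `pandas_udf`: an explicit call, so the flag is never read."""
    return "arrow"


# With the flag off, the two paths disagree about whether Arrow is in use.
conf = {ARROW_FLAG: "false"}
print(to_pandas_path(conf), pandas_udf_path(conf))
```

In this model a user who has left the flag off still goes through Arrow whenever they call `pandas_udf`, which is the apparent contradiction.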



