[GitHub] spark pull request #14467: [SPARK-16861][PYSPARK][CORE] Refactor PySpark acc...

srowen Wed, 21 Sep 2016 02:40:44 -0700

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14467#discussion_r79795258
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -866,11 +866,14 @@ class BytesToString extends org.apache.spark.api.java.function.Function[Array[By
     }
     
     /**
    - * Internal class that acts as an `AccumulatorParam` for Python accumulators. Inside, it
    + * Internal class that acts as an `AccumulatorV2` for Python accumulators. Inside, it
      * collects a list of pickled strings that we pass to Python through a socket.
      */
    -private class PythonAccumulatorParam(@transient private val serverHost: String, serverPort: Int)
    -  extends AccumulatorParam[JList[Array[Byte]]] {
    +private[spark] class PythonAccumulatorV2(@transient private val serverHost: String, serverPort: Int)
    +  extends AccumulatorV2[JList[Array[Byte]], JList[Array[Byte]]] {
    --- End diff --
    
    You don't have to reimplement all the other methods, which seems like a win? This thing is fundamentally accumulating a collection of things too. @davies ?
    
    I don't know about the merge logic. I assume that something here is required to send the data back to the Python driver process in order for the accumulator to work, but I don't know this code well. At least, that can stay as-is for now. I didn't actually change it much at all in the branch above; it's mostly indentation changes.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
