Davies Liu created SPARK-5224:
---------------------------------

             Summary: parallelize list/ndarray is really slow
                 Key: SPARK-5224
                 URL: https://issues.apache.org/jira/browse/SPARK-5224
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.2.0
            Reporter: Davies Liu
            Priority: Blocker


The default batchSize was changed to 0 (batches are sized based on the size of the objects),
but parallelize() still uses BatchedSerializer with batchSize=1.

Also, BatchedSerializer does not work well with list and numpy.ndarray.
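
A minimal sketch of why a batch size of 1 is costly, using only the standard
pickle module. This is not PySpark's BatchedSerializer; the helper names
dump_batched and load_batched are hypothetical and exist only for illustration.
It shows that pickling one element per blob produces many more blobs, and more
total bytes, than pickling the same data in larger groups.

```python
import pickle


def dump_batched(items, batch_size):
    """Pickle `items` in groups of `batch_size`; return one bytes blob per group.

    Hypothetical helper for illustration, not part of PySpark.
    """
    items = list(items)
    return [
        pickle.dumps(items[i:i + batch_size], protocol=2)
        for i in range(0, len(items), batch_size)
    ]


def load_batched(blobs):
    """Inverse of dump_batched: unpickle each blob and concatenate the groups."""
    out = []
    for blob in blobs:
        out.extend(pickle.loads(blob))
    return out


data = list(range(10000))

# One element per blob, analogous to batchSize=1.
one_per_blob = dump_batched(data, 1)

# Many elements per blob, analogous to a larger batch size.
large_batches = dump_batched(data, 1024)

# Total serialized size for each strategy; every blob carries fixed
# pickle framing, so more blobs means more overhead.
bytes_one_per_blob = sum(len(b) for b in one_per_blob)
bytes_large_batches = sum(len(b) for b in large_batches)
```

Both strategies round-trip to the same data; the difference is the number of
blobs and the per-blob framing overhead, which is what a batch size of 1
maximizes.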




