Davies Liu created SPARK-5224:
---------------------------------

             Summary: parallelize list/ndarray is really slow
                 Key: SPARK-5224
                 URL: https://issues.apache.org/jira/browse/SPARK-5224
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.2.0
            Reporter: Davies Liu
            Priority: Blocker

The default batchSize was changed to 0 (batching based on the size of the objects), but parallelize() still uses BatchedSerializer with batchSize=1. Also, BatchedSerializer does not work well with list and numpy.ndarray.
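
To illustrate why a batch size of 1 can be costly, here is a minimal sketch in plain Python. It is not PySpark's BatchedSerializer; the helpers `batched` and `serialize` are hypothetical names introduced only for this example, and the batch size of 1024 is an arbitrary choice. It pickles the same list once per element and once per batch, then compares the number of frames and the total bytes produced.

```python
import pickle
from itertools import islice


def batched(items, batch_size):
    """Yield lists of up to batch_size items (hypothetical helper)."""
    it = iter(items)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def serialize(items, batch_size):
    """Pickle each batch separately and return the list of byte frames."""
    return [pickle.dumps(chunk, protocol=2)
            for chunk in batched(items, batch_size)]


data = list(range(10000))

# batch_size=1: one pickle call and one frame per element.
one_by_one = serialize(data, 1)

# A larger batch amortizes the per-frame overhead over many elements.
in_batches = serialize(data, 1024)

print("frames:", len(one_by_one), "vs", len(in_batches))
print("bytes: ", sum(map(len, one_by_one)), "vs", sum(map(len, in_batches)))
```

With one element per batch, every element pays the fixed pickle framing cost and a separate serializer call, which is the kind of overhead the report attributes to parallelize() using batchSize=1.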