This PR fixes the problem: https://github.com/apache/spark/pull/2659
cc @josh

Davies

On Tue, Nov 11, 2014 at 7:47 PM, bliuab <bli...@cse.ust.hk> wrote:
> In spark-1.0.2, I came across an error when trying to broadcast a quite
> large numpy array (35M dimensions). The error, a
> java.lang.NegativeArraySizeException, and its details are listed below.
> Moreover, when broadcasting a somewhat smaller numpy array (30M dimensions),
> everything works fine. A 30M-dimension numpy array takes 230M of memory,
> which, in my opinion, is not very large.
> As far as I have found, it seems related to py4j. However, I have no
> idea how to fix this. I would appreciate any hints.
> ------------
> py4j.protocol.Py4JError: An error occurred while calling o23.broadcast.
> Trace:
> java.lang.NegativeArraySizeException
>         at py4j.Base64.decode(Base64.java:292)
>         at py4j.Protocol.getBytes(Protocol.java:167)
>         at py4j.Protocol.getObject(Protocol.java:276)
>         at py4j.commands.AbstractCommand.getArguments(AbstractCommand.java:81)
>         at py4j.commands.CallCommand.execute(CallCommand.java:77)
>         at py4j.GatewayConnection.run(GatewayConnection.java:207)
> -------------
> And the test code is as follows:
> conf = SparkConf().setAppName('brodyliu_LR').setMaster('spark://10.231.131.87:5051')
> conf.set('spark.executor.memory', '4000m')
> conf.set('spark.akka.timeout', '100000')
> conf.set('spark.ui.port','8081')
> conf.set('spark.cores.max','150')
> #conf.set('spark.rdd.compress', 'True')
> conf.set('spark.default.parallelism', '300')
> #configure the spark environment
> sc = SparkContext(conf=conf, batchSize=1)
>
> vec = np.random.rand(35000000)
> a = sc.broadcast(vec)
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Pyspark-Error-when-broadcast-numpy-array-tp18662.html
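
For a sense of scale, here is a rough sketch of the sizes involved. It assumes the array holds float64 values (8 bytes each, which is what np.random.rand returns) and that the bytes are Base64-encoded on their way through py4j, as the py4j.Base64.decode frame in the trace suggests. It ignores pickling overhead and does not by itself explain the exception; the helper names are only illustrative.

```python
import base64

BYTES_PER_FLOAT64 = 8  # assumption: the array dtype is float64


def raw_size(n_elements):
    """Bytes of raw data in a float64 array with n_elements entries."""
    return n_elements * BYTES_PER_FLOAT64


def base64_size(n_bytes):
    """Length of padded Base64 text for n_bytes of input (4 chars per 3 bytes)."""
    return 4 * ((n_bytes + 2) // 3)


# Sanity-check the formula against the standard library on a small input.
assert base64_size(10) == len(base64.b64encode(b"x" * 10))

for n in (30000000, 35000000):
    raw = raw_size(n)
    print("%d elements: %.1f MiB raw, %.1f MiB as Base64"
          % (n, raw / 2.0 ** 20, base64_size(raw) / 2.0 ** 20))
```

The 30M case comes out at roughly the 230M quoted above, so the numbers in the report are consistent with a plain float64 array.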