Dear Liu:

Thank you very much for your help. I will update to that patch. By the way,
I have succeeded in broadcasting an array of 30M elements, and the log said
that such an array takes around 230MB of memory. So I think the numpy array
that triggers the error is much smaller than 2G.
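As a quick sanity check on those numbers (my own arithmetic; np.random.rand produces float64, i.e. 8 bytes per element):

```python
import numpy as np

ITEMSIZE = np.dtype(np.float64).itemsize  # 8 bytes per float64 element

mb_30m = 30_000_000 * ITEMSIZE / 1e6  # the array that broadcasts fine
mb_35m = 35_000_000 * ITEMSIZE / 1e6  # the array that triggers the error

print(mb_30m, mb_35m)  # 240.0 280.0 -- both far below 2 GB
```

So the raw array sizes are roughly 240MB and 280MB, consistent with the ~230MB the log reported, and nowhere near the 2GB limit.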

On Wed, Nov 12, 2014 at 12:29 PM, Davies Liu-2 [via Apache Spark User List]
<ml-node+s1001560n18673...@n3.nabble.com> wrote:

> This PR fix the problem: https://github.com/apache/spark/pull/2659
>
> cc @josh
>
> Davies
>
> On Tue, Nov 11, 2014 at 7:47 PM, bliuab <[hidden email]> wrote:
>
> > In spark-1.0.2, I have come across an error when I try to broadcast a
> > fairly large numpy array (35M elements). The error information, namely
> > the java.lang.NegativeArraySizeException and its stack trace, is listed
> > below. Moreover, when I broadcast a somewhat smaller numpy array (30M
> > elements), everything works fine; such an array takes about 230MB of
> > memory which, in my opinion, is not very large.
> > As far as I have surveyed, it seems related to py4j, but I have no idea
> > how to fix this. I would appreciate any hints.
> > ------------
> > py4j.protocol.Py4JError: An error occurred while calling o23.broadcast.
> > Trace:
> > java.lang.NegativeArraySizeException
> >         at py4j.Base64.decode(Base64.java:292)
> >         at py4j.Protocol.getBytes(Protocol.java:167)
> >         at py4j.Protocol.getObject(Protocol.java:276)
> >         at
> > py4j.commands.AbstractCommand.getArguments(AbstractCommand.java:81)
> >         at py4j.commands.CallCommand.execute(CallCommand.java:77)
> >         at py4j.GatewayConnection.run(GatewayConnection.java:207)
> > -------------
> > And the test code is as follows:
> >
> > import numpy as np
> > from pyspark import SparkConf, SparkContext
> >
> > conf = SparkConf().setAppName('brodyliu_LR').setMaster('spark://10.231.131.87:5051')
> > conf.set('spark.executor.memory', '4000m')
> > conf.set('spark.akka.timeout', '100000')
> > conf.set('spark.ui.port', '8081')
> > conf.set('spark.cores.max', '150')
> > #conf.set('spark.rdd.compress', 'True')
> > conf.set('spark.default.parallelism', '300')
> > # configure the Spark environment
> > sc = SparkContext(conf=conf, batchSize=1)
> >
> > vec = np.random.rand(35000000)
> > a = sc.broadcast(vec)
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/Pyspark-Error-when-broadcast-numpy-array-tp18662.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
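In the meantime, a possible workaround might be to split the big array into
pieces and broadcast each piece separately, re-assembling on the executors.
A minimal sketch of the splitting step (my own untested idea; chunk_array is
a name I made up, and the sc.broadcast() calls are omitted):

```python
import numpy as np

def chunk_array(vec, n_chunks):
    """Split vec into n_chunks pieces; np.array_split tolerates uneven sizes."""
    return np.array_split(vec, n_chunks)

# Stand-in for the 35M-element array, scaled down for illustration.
vec = np.random.rand(35_000)
parts = chunk_array(vec, 8)

# Each piece would be broadcast separately; an executor can rebuild the whole:
restored = np.concatenate(parts)
assert np.array_equal(restored, vec)
```

Each chunk stays well under whatever per-object serialization limit is being
hit, at the cost of a concatenate on the executor side.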



-- 
My Homepage: www.cse.ust.hk/~bliuab
MPhil student at the Hong Kong University of Science and Technology.
Clear Water Bay, Kowloon, Hong Kong.
Profile at LinkedIn <http://www.linkedin.com/pub/liu-bo/55/52b/10b>.



