[ https://issues.apache.org/jira/browse/SYSTEMML-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Niketan Pansare resolved SYSTEMML-1370.
---------------------------------------
       Resolution: Fixed
    Fix Version/s: SystemML 1.0

Fixed in the commit https://github.com/apache/incubator-systemml/commit/81090134d2de04a3ae90c6f8d79b4c68cb14aab5

> Py4JError: An error occurred while calling
> z:org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt.convertPy4JArrayToMB.
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SYSTEMML-1370
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1370
>             Project: SystemML
>          Issue Type: Bug
>          Components: APIs
>   Affects Versions: Not Applicable
>         Environment: pyspark with local Spark 2.1
>            Reporter: Berthold Reinwald
>             Fix For: SystemML 1.0
>
>
> Do we have undocumented limits for RDDConverterUtilsExt.convertPy4JArrayToMB?
> Below simple script works for 23100 rows, while 46900 fails. This is how to
> easily and consistently reproduce.
>
> START:
> $pyspark --master local --jars $SYSTEMML_HOME/SystemML.jar --driver-memory 8G --executor-memory 2G
>
> PYTHON SCRIPT:
> from systemml import MLContext, dml
> import pandas as pd
> sc.version
> ml = MLContext(sc)
> print "Spark Version:", sc.version
> print "SystemML Version:", ml.version()
> print "SystemML Built-Time:", ml.buildTime()
> # !! number of rows 23100 works, while 46900 fails
> nr = 46900
> X_pd = pd.DataFrame(range(1, (nr*784)+1,1),dtype=float).values.reshape(nr,784)
> script ="""
> write(X, $Xfile, format="csv")
> """
> prog = dml(script).input(X=X_pd).input(**{"$Xfile":"/tmp/X_pd.csv"})
> ml.execute(prog)
>
> OUTPUT:
> Spark Version: 2.1.0
> SystemML Version: 0.14.0-incubating-SNAPSHOT
> SystemML Built-Time: 2017-03-03 07:33:40 UTC
> ---------------------------------------------------------------------------
> Py4JError                                 Traceback (most recent call last)
> .......
> Py4JError: An error occurred while calling
> z:org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt.convertPy4JArrayToMB.
> Trace:
> java.lang.NegativeArraySizeException
>     at py4j.Base64.decode(Base64.java:321)
>     at py4j.Protocol.getBytes(Protocol.java:173)
>     at py4j.Protocol.getObject(Protocol.java:294)
>     at py4j.commands.AbstractCommand.getArguments(AbstractCommand.java:82)
>     at py4j.commands.CallCommand.execute(CallCommand.java:77)
>     at py4j.GatewayConnection.run(GatewayConnection.java:214)
>     at java.lang.Thread.run(Thread.java:745)

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
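
For scale, the two row counts quoted in the report can be turned into approximate payload sizes. This is only a back-of-the-envelope sketch. It assumes 8-byte float64 cells (suggested by dtype=float in the script) and standard Base64 expansion of 4 characters per 3 bytes (suggested by py4j.Base64.decode in the trace). The helper names raw_bytes and base64_chars are hypothetical, and the numbers do not by themselves establish the root cause of the NegativeArraySizeException.

```python
# Rough payload sizes for the two row counts in the report.
# Assumptions (not confirmed by the issue): float64 cells and plain
# Base64 encoding of the raw bytes. Helper names are hypothetical.

COLS = 784            # column count used in the reproduction script
BYTES_PER_CELL = 8    # assumption: one float64 per cell


def raw_bytes(rows, cols=COLS):
    """Size in bytes of a dense rows x cols float64 matrix."""
    return rows * cols * BYTES_PER_CELL


def base64_chars(n_bytes):
    """Length of the padded Base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


for rows in (23100, 46900):
    n = raw_bytes(rows)
    print("%d rows: %d raw bytes, about %d Base64 characters"
          % (rows, n, base64_chars(n)))
```

Under these assumptions, both payloads run to hundreds of megabytes of encoded text passed through a single Py4J call.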