[jira] [Closed] (SYSTEMML-1261) Fix transitive Spark execution type selection for ba+*
[ https://issues.apache.org/jira/browse/SYSTEMML-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm closed SYSTEMML-1261.
------------------------------------

> Fix transitive Spark execution type selection for ba+*
> ------------------------------------------------------
>
>                 Key: SYSTEMML-1261
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1261
>             Project: SystemML
>          Issue Type: Sub-task
>          Components: Test
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>             Fix For: SystemML 0.13
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Closed] (SYSTEMML-1242) Perftest: OutOfMemoryError in MultiLogReg for 80g sparse
[ https://issues.apache.org/jira/browse/SYSTEMML-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm closed SYSTEMML-1242.
------------------------------------

> Perftest: OutOfMemoryError in MultiLogReg for 80g sparse
> --------------------------------------------------------
>
>                 Key: SYSTEMML-1242
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1242
>             Project: SystemML
>          Issue Type: Bug
>          Components: Test
>    Affects Versions: SystemML 0.11, SystemML 0.12
>         Environment: spark 2.1.0
>            Reporter: Imran Younus
>            Assignee: Matthias Boehm
>             Fix For: SystemML 0.13
>
>         Attachments: sparkDML.sh
>
> When running the {{runMultiLogReg.sh}} script, {{MultiLogReg.dml}} ends with an
> OutOfMemory error for the case of 10M_1K sparse data and {{icpt = 1}}. Here
> is the end of the log file:
> {code}
> 17/02/04 17:20:33 INFO api.DMLScript: SystemML Statistics:
> Total elapsed time:             697.694 sec.
> Total compilation time:         2.543 sec.
> Total execution time:           695.151 sec.
> Number of compiled Spark inst:  73.
> Number of executed Spark inst:  16.
> Cache hits (Mem, WB, FS, HDFS): 46/9/1/7.
> Cache writes (WB, FS, HDFS):    27/1/1.
> Cache times (ACQr/m, RLS, EXP): 281.541/0.003/131.589/48.737 sec.
> HOP DAGs recompiled (PRED, SB): 0/15.
> HOP DAGs recompile time:        0.067 sec.
> Spark ctx create time (lazy):   31.078 sec.
> Spark trans counts (par,bc,col):5/4/0.
> Spark trans times (par,bc,col): 46.748/0.392/0.000 secs.
> Total JIT compile time:         151.254 sec.
> Total JVM GC count:             144.
> Total JVM GC time:              220.671 sec.
> Heavy hitter instructions (name, time, count):
> -- 1)  ba+*          144.194 sec   3
> -- 2)  rand          109.939 sec   9
> -- 3)  uark+         105.011 sec   2
> -- 4)  r'            100.933 sec   3
> -- 5)  sp_/           80.387 sec   1
> -- 6)  sp_mapmm       45.491 sec   2
> -- 7)  sp_tak+*       40.655 sec   1
> -- 8)  append          9.480 sec   1
> -- 9)  rangeReIndex    7.347 sec   2
> -- 10) sp_-            6.392 sec   3
> 17/02/04 17:20:33 INFO api.DMLScript: END DML run 02/04/2017 17:20:33
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>         at org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:363)
>         at org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:339)
>         at org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlockUnsafe(MatrixBlock.java:408)
>         at org.apache.sysml.runtime.io.MatrixReader.createOutputMatrixBlock(MatrixReader.java:107)
>         at org.apache.sysml.runtime.io.ReaderBinaryBlockParallel.readMatrixFromHDFS(ReaderBinaryBlockParallel.java:59)
>         at org.apache.sysml.runtime.util.DataConverter.readMatrixFromHDFS(DataConverter.java:203)
>         at org.apache.sysml.runtime.util.DataConverter.readMatrixFromHDFS(DataConverter.java:168)
>         at org.apache.sysml.runtime.controlprogram.caching.MatrixObject.readBlobFromHDFS(MatrixObject.java:425)
>         at org.apache.sysml.runtime.controlprogram.caching.MatrixObject.readBlobFromHDFS(MatrixObject.java:60)
>         at org.apache.sysml.runtime.controlprogram.caching.CacheableData.readBlobFromHDFS(CacheableData.java:920)
>         at org.apache.sysml.runtime.controlprogram.caching.MatrixObject.readBlobFromRDD(MatrixObject.java:478)
>         at org.apache.sysml.runtime.controlprogram.caching.MatrixObject.readBlobFromRDD(MatrixObject.java:60)
>         at org.apache.sysml.runtime.controlprogram.caching.CacheableData.acquireRead(CacheableData.java:411)
>         at org.apache.sysml.runtime.controlprogram.context.ExecutionContext.getMatrixInput(ExecutionContext.java:209)
>         at org.apache.sysml.runtime.instructions.cp.AggregateBinaryCPInstruction.processInstruction(AggregateBinaryCPInstruction.java:74)
>         at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:290)
>         at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:221)
>         at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
>         at org.apache.sysml.runtime.controlprogram.IfProgramBlock.execute(IfProgramBlock.java:139)
>         at org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:165)
>         at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:123)
>         at org.apache.sysml.api.DMLScript.execute(DMLScript.java:684)
>         at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:360)
>         at org.apache.sysml.api.DMLScript.main(DMLScript.java:221)
> {code}
[jira] [Resolved] (SYSTEMML-1242) Perftest: OutOfMemoryError in MultiLogReg for 80g sparse
[ https://issues.apache.org/jira/browse/SYSTEMML-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm resolved SYSTEMML-1242.
--------------------------------------
    Resolution: Fixed

I'm closing this issue as the new transitive execution type selection fixed the specific OOM - the other robustness features will be addressed in separate JIRAs.

> Perftest: OutOfMemoryError in MultiLogReg for 80g sparse
> --------------------------------------------------------
>
>                 Key: SYSTEMML-1242
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1242
>             Project: SystemML
>          Issue Type: Bug
>          Components: Test
>    Affects Versions: SystemML 0.11, SystemML 0.12
>         Environment: spark 2.1.0
>            Reporter: Imran Younus
>            Assignee: Matthias Boehm
>             Fix For: SystemML 0.13
>
>         Attachments: sparkDML.sh
>
> When running the {{runMultiLogReg.sh}} script, {{MultiLogReg.dml}} ends with an
> OutOfMemory error for the case of 10M_1K sparse data and {{icpt = 1}}. [...]
[jira] [Commented] (SYSTEMML-1242) Perftest: OutOfMemoryError in MultiLogReg for 80g sparse
[ https://issues.apache.org/jira/browse/SYSTEMML-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873013#comment-15873013 ]

Matthias Boehm commented on SYSTEMML-1242:
------------------------------------------

Sounds good - thanks for confirming [~iyounus].

> Perftest: OutOfMemoryError in MultiLogReg for 80g sparse
> --------------------------------------------------------
>
>                 Key: SYSTEMML-1242
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1242
>             Project: SystemML
>          Issue Type: Bug
>          Components: Test
>    Affects Versions: SystemML 0.11, SystemML 0.12
>         Environment: spark 2.1.0
>            Reporter: Imran Younus
>            Assignee: Matthias Boehm
>             Fix For: SystemML 0.13
>
>         Attachments: sparkDML.sh
>
> When running the {{runMultiLogReg.sh}} script, {{MultiLogReg.dml}} ends with an
> OutOfMemory error for the case of 10M_1K sparse data and {{icpt = 1}}. [...]
[jira] [Commented] (SYSTEMML-1211) Verify dependencies for Spark 2
[ https://issues.apache.org/jira/browse/SYSTEMML-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872854#comment-15872854 ]

Deron Eriksson commented on SYSTEMML-1211:
------------------------------------------

Thank you [~gweidner]! [PR400|https://github.com/apache/incubator-systemml/pull/400] addressed the Windows Hadoop 2.6.0 issue.

> Verify dependencies for Spark 2
> -------------------------------
>
>                 Key: SYSTEMML-1211
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1211
>             Project: SystemML
>          Issue Type: Sub-task
>          Components: Build
>            Reporter: Deron Eriksson
>            Assignee: Deron Eriksson
>
> With the migration to Spark 2, we should verify that the artifact assemblies
> are properly handling all dependencies.
> Also, we should verify that the artifact licenses properly include all
> dependencies following the Spark 2 migration.
[jira] [Updated] (SYSTEMML-1283) Out of memory error
[ https://issues.apache.org/jira/browse/SYSTEMML-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brendan Dwyer updated SYSTEMML-1283:
------------------------------------
    Description:

Possibly related to [SYSTEMML-1281].

When a matrix X containing ~13,000 rows and ~30 unique values is passed into the following DML script, it errors out on my laptop but passes on my 5-node cluster.

{code}
# encode dml function for one-hot encoding
encode_onehot = function(matrix[double] X) return(matrix[double] Y) {
  N = nrow(X)
  Y = table(seq(1, N, 1), X)
}

# a dummy read, which allows SystemML to attach variables
X = read("")
col_idx = $onehot_index
nc = ncol(X)
if (col_idx < 1 | col_idx > nc) {
  stop("one hot index out of range")
}
Y = matrix(0, rows=1, cols=1)
oneHot = encode_onehot(X[,col_idx:col_idx])
if (col_idx == 1) {
  if (col_idx < nc) {
    X_tmp = X[, col_idx+1:nc]
    Y = append(oneHot, X_tmp)
  } else {
    Y = oneHot
  }
} else if (1 < col_idx & col_idx < nc) {
  Y = append(append(X[,1:col_idx-1], oneHot), X[, col_idx+1:nc])
} else { # col_idx == nc
  Y = append(X[,1:col_idx-1], oneHot)
}

# a dummy write, which allows SystemML to attach variables
write(Y, "")
{code}

Error:

{code}
17/02/17 16:57:35 ERROR Executor: Exception in task 0.0 in stage 63.0 (TID 1739)
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.Double.valueOf(Double.java:519)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply_853$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown Source)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at org.apache.spark.util.Utils$$anon$4.next(Utils.scala:1778)
        at org.apache.spark.util.Utils$$anon$4.next(Utils.scala:1772)
        at scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:31)
        at org.apache.sysml.runtime.instructions.spark.utils.FrameRDDConverterUtils$DataFrameToBinaryBlockFunction.call(FrameRDDConverterUtils.java:748)
        at org.apache.sysml.runtime.instructions.spark.utils.FrameRDDConverterUtils$DataFrameToBinaryBlockFunction.call(FrameRDDConverterUtils.java:715)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
17/02/17 16:57:35 ERROR TaskSetManager: Task 0 in stage 63.0 failed 1 times; aborting job
17/02/17 16:57:36 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-20,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.Double.valueOf(Double.java:519)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply_853$(Unknown Source)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown Source)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
        at org.apache.spark.util.Utils$$anon$4.next(Utils.scala:1778)
        at org.apache.spark.util.Utils$$anon$4.next(Utils.scala:1772)
        at scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:31)
        at org.apache.sysml.runtime.instructions.spark.utils.FrameRDDConverterUtils$DataFrameToBinaryBlockFunction.call(FrameRDDConverterUtils.java:748)
{code}
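The {{table(seq(1, N, 1), X)}} call in the script above is what performs the one-hot encoding: row i of the result gets a 1 in column X[i]. A rough NumPy equivalent, offered only as an illustration (it assumes X holds positive integer category codes, which is what {{table}} requires):

```python
import numpy as np

def encode_onehot(x):
    """One-hot encode a vector of 1-based integer category codes,
    mirroring the DML idiom table(seq(1, N, 1), X)."""
    x = np.asarray(x, dtype=int).ravel()
    n = x.size
    y = np.zeros((n, x.max()), dtype=float)
    y[np.arange(n), x - 1] = 1.0  # NumPy is 0-based; DML codes are 1-based
    return y
```

Note that with ~13,000 rows and ~30 distinct values the encoded result is only a ~13,000 x 30 matrix, so the failure in the stack trace lies in the DataFrame-to-binary-block conversion path rather than in {{table}} itself.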
[jira] [Commented] (SYSTEMML-1281) OOM Error On Binary Write
[ https://issues.apache.org/jira/browse/SYSTEMML-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872842#comment-15872842 ]

Mike Dusenberry commented on SYSTEMML-1281:
-------------------------------------------

Thanks for attempting to reproduce this. Just for clarity, in my case, I have 100GB executors with 48 cores, thus a single executor per machine. Also, are you using a DataFrame with a {{Vector}} column? I'm attempting the write now with a constrained number of cores (24). Perhaps you have a different setup?

As for writing with Spark, it's not currently possible to save a DataFrame with a {{Vector}} column to CSV format (Parquet works though).

As for the assumption, yes it's not hardcoded in the system, but it's been an implicit bias that has caused many problems and should be eradicated. :)

> OOM Error On Binary Write
> -------------------------
>
>                 Key: SYSTEMML-1281
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1281
>             Project: SystemML
>          Issue Type: Bug
>    Affects Versions: SystemML 0.13
>            Reporter: Mike Dusenberry
>            Priority: Blocker
>
> I'm running into the following heap space OOM error while attempting to save
> a large Spark DataFrame to a SystemML binary format via DML {{write}} statements.
> Script:
> {code}
> tr_sample_filename = os.path.join("data", "train_{}{}.parquet".format(size, "_grayscale" if grayscale else ""))
> val_sample_filename = os.path.join("data", "val_{}{}.parquet".format(size, "_grayscale" if grayscale else ""))
> train_df = sqlContext.read.load(tr_sample_filename)
> val_df = sqlContext.read.load(val_sample_filename)
> train_df, val_df
> # Note: Must use the row index column, or X may not
> # necessarily correspond correctly to Y
> X_df = train_df.select("__INDEX", "sample")
> X_val_df = val_df.select("__INDEX", "sample")
> y_df = train_df.select("__INDEX", "tumor_score")
> y_val_df = val_df.select("__INDEX", "tumor_score")
> X_df, X_val_df, y_df, y_val_df
> script = """
> # Scale images to [-1,1]
> X = X / 255
> X_val = X_val / 255
> X = X * 2 - 1
> X_val = X_val * 2 - 1
> # One-hot encode the labels
> num_tumor_classes = 3
> n = nrow(y)
> n_val = nrow(y_val)
> Y = table(seq(1, n), y, n, num_tumor_classes)
> Y_val = table(seq(1, n_val), y_val, n_val, num_tumor_classes)
> """
> outputs = ("X", "X_val", "Y", "Y_val")
> script = dml(script).input(X=X_df, X_val=X_val_df, y=y_df, y_val=y_val_df).output(*outputs)
> X, X_val, Y, Y_val = ml.execute(script).get(*outputs)
> X, X_val, Y, Y_val
> script = """
> write(X, "data/systemml/X_"+size+"_"+c+"_binary", format="binary")
> write(Y, "data/systemml/Y_"+size+"_"+c+"_binary", format="binary")
> write(X_val, "data/systemml/X_val_"+size+"_"+c+"_binary", format="binary")
> write(Y_val, "data/systemml/Y_val_"+size+"_"+c+"_binary", format="binary")
> """
> script = dml(script).input(X=X, X_val=X_val, Y=Y, Y_val=Y_val, size=size, c=c)
> ml.execute(script)
> {code}
> General error:
> {code}
> Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception occurred while executing runtime program
>         at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:371)
>         at org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:292)
>         at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293)
>         ... 12 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 1 and 11 -- Error evaluating instruction: CP°mvvar°X°¶_Var49¶°binaryblock
>         at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:130)
>         at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:369)
>         ... 14 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 1 and 11 -- Error evaluating instruction: CP°mvvar°X°¶_Var49¶°binaryblock
>         at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:320)
>         at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:221)
>         at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
>         at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:123)
>         ... 15 more
> Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: Move to data/systemml/X_256_3_binary failed.
>         at org.apache.sysml.runtime.controlprogram.caching.CacheableData.moveData(CacheableData.java:1329)
>         at org.apache.sysml.runtime.instructions.cp.VariableCPInstruction.processMoveInstruction(VariableCPInstruction.java:706)
> {code}
[jira] [Commented] (SYSTEMML-1281) OOM Error On Binary Write
[ https://issues.apache.org/jira/browse/SYSTEMML-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872824#comment-15872824 ]

Matthias Boehm commented on SYSTEMML-1281:
------------------------------------------

Ok, I just tried to reproduce this error with csv-binaryblock (and dataset-binaryblock) conversions of dimension 100,000 x 200,000, dense, but both work fine for me. Could it be that there is some side effect (in terms of memory consumption) of the data being in parquet format? Let's either (1) write it out to csv with spark, or (2) configure spark with more head room for user space and write it to binary. Once this is done, I'd like to have a look at the data set.

Btw, there is no such assumption of ~1000 columns; we aim at the general case of a wide range of matrix shapes (that's one of the reasons why we have squared blocks), but of course we optimized for the typically encountered matrix shapes of tall and skinny matrices. So, yes, there is room for improving the support of those kinds of wide matrices.

> OOM Error On Binary Write
> -------------------------
>
>                 Key: SYSTEMML-1281
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1281
>             Project: SystemML
>          Issue Type: Bug
>    Affects Versions: SystemML 0.13
>            Reporter: Mike Dusenberry
>            Priority: Blocker
>
> I'm running into the following heap space OOM error while attempting to save
> a large Spark DataFrame to a SystemML binary format via DML {{write}} statements. [...]
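For a sense of scale, here is a back-of-envelope sizing of the 100,000 x 200,000 dense matrix used in the repro attempt above. The numbers are purely illustrative and assume 8-byte doubles and SystemML's squared 1,000 x 1,000 blocks:

```python
# Dense-double sizing; illustrative arithmetic only.
rows, cols = 100_000, 200_000
total_gb = rows * cols * 8 / 1024**3   # whole matrix as dense doubles
block_mb = 1000 * 1000 * 8 / 1024**2   # one 1,000 x 1,000 squared block
print(f"total: {total_gb:.1f} GB, per block: {block_mb:.1f} MB")
```

A matrix of that size (~149 GB dense) can only live as distributed blocks; each individual block stays small, which is why such wide matrices remain representable even though no single executor heap could hold the whole matrix.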
[jira] [Commented] (SYSTEMML-1281) OOM Error On Binary Write
[ https://issues.apache.org/jira/browse/SYSTEMML-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872784#comment-15872784 ]

Mike Dusenberry commented on SYSTEMML-1281:
-------------------------------------------

Writing directly to CSV from the DataFrame inputs also failed.

{code}
script = """
write(X, "data/systemml/X_"+size+"_"+c+".csv", format="csv")
write(Y, "data/systemml/Y_"+size+"_"+c+".csv", format="csv")
write(X_val, "data/systemml/X_val_"+size+"_"+c+".csv", format="csv")
write(Y_val, "data/systemml/Y_val_"+size+"_"+c+".csv", format="csv")
"""
script = dml(script).input(X=X_df, X_val=X_val_df, Y=y_df, Y_val=y_val_df, size=size, c=c)
ml.execute(script)
{code}

> OOM Error On Binary Write
> -------------------------
>
>                 Key: SYSTEMML-1281
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1281
>             Project: SystemML
>          Issue Type: Bug
>    Affects Versions: SystemML 0.13
>            Reporter: Mike Dusenberry
>            Priority: Blocker
>
> I'm running into the following heap space OOM error while attempting to save
> a large Spark DataFrame to a SystemML binary format via DML {{write}} statements. [...]
[jira] [Commented] (SYSTEMML-1238) Python test failing for LinearRegCG
[ https://issues.apache.org/jira/browse/SYSTEMML-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872777#comment-15872777 ] Niketan Pansare commented on SYSTEMML-1238: --- Thanks Imran :) > Python test failing for LinearRegCG > --- > > Key: SYSTEMML-1238 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1238 > Project: SystemML > Issue Type: Bug > Components: Algorithms, APIs >Affects Versions: SystemML 0.13 >Reporter: Imran Younus >Assignee: Niketan Pansare > Fix For: SystemML 0.13 > > Attachments: python_LinearReg_test_spark.1.6.log, > python_LinearReg_test_spark.2.1.log > > > [~deron] discovered that one of the python tests ({{test_mllearn_df.py}}) > with spark 2.1.0 was failing because the test score from linear regression > was very low ({{~ 0.24}}). I did some investigation and it turns out that > the model parameters computed by the dml script are incorrect. In > systemml.12, the values of betas from the linear regression model are > {{\[152.919, 938.237\]}}. This is what we expect from the normal equation. (I > also tested this with sklearn). But the values of betas from systemml.13 > (with spark 2.1.0) come out to be {{\[153.146, 458.489\]}}. These are not > correct and therefore the test score is much lower than expected. The data > going into the DML script is correct. I printed out the values of {{X}} and {{Y}} > in dml and I didn't see any issue there. > Attached are the log files for two different tests (systemml0.12 and 0.13) > with explain flag.
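The normal-equation fit the reporter refers to can be checked with a small pure-Python sketch (toy data only, not the actual test's inputs): solving (X^T X) b = X^T y in closed form for an intercept-plus-slope model recovers the generating coefficients exactly.

```python
# Minimal normal-equation solve for y = b0 + b1*x (toy data; the actual
# test runs LinearRegCG on a larger dataset). Solves (X^T X) b = X^T y
# in closed form for the 2x2 case.
def normal_equation(xs, ys):
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # determinant of X^T X
    b1 = (n * sxy - sx * sy) / det   # slope
    b0 = (sy - b1 * sx) / n          # intercept
    return b0, b1

# Data generated from the exact line y = 2 + 3x; the fit recovers (2, 3)
b0, b1 = normal_equation([0, 1, 2, 3], [2, 5, 8, 11])
```

A regression whose betas deviate from this closed-form solution (as in the reported 458.489 vs 938.237) indicates the solver is optimizing a different objective, not that the data is wrong.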
[jira] [Commented] (SYSTEMML-1242) Perftest: OutOfMemoryError in MultiLogReg for 80g sparse
[ https://issues.apache.org/jira/browse/SYSTEMML-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872752#comment-15872752 ] Imran Younus commented on SYSTEMML-1242: [~mboehm7] I ran this test again after your fix, and it completed successfully. Should we close this jira now? > Perftest: OutOfMemoryError in MultiLogReg for 80g sparse > > > Key: SYSTEMML-1242 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1242 > Project: SystemML > Issue Type: Bug > Components: Test >Affects Versions: SystemML 0.11, SystemML 0.12 > Environment: spark 2.1.0 >Reporter: Imran Younus >Assignee: Matthias Boehm > Fix For: SystemML 0.13 > > Attachments: sparkDML.sh > > > when running {{runMultiLogReg.sh}} script, {{MultiLogReg.dml}} ends with > OutOfMemory error for the case of 10M_1K sparse data and {{icpt = 1}}. Here > is the end of the log file: > {code} > 17/02/04 17:20:33 INFO api.DMLScript: SystemML Statistics: > Total elapsed time: 697.694 sec. > Total compilation time: 2.543 sec. > Total execution time: 695.151 sec. > Number of compiled Spark inst:73. > Number of executed Spark inst:16. > Cache hits (Mem, WB, FS, HDFS): 46/9/1/7. > Cache writes (WB, FS, HDFS): 27/1/1. > Cache times (ACQr/m, RLS, EXP): 281.541/0.003/131.589/48.737 sec. > HOP DAGs recompiled (PRED, SB): 0/15. > HOP DAGs recompile time: 0.067 sec. > Spark ctx create time (lazy): 31.078 sec. > Spark trans counts (par,bc,col):5/4/0. > Spark trans times (par,bc,col): 46.748/0.392/0.000 secs. > Total JIT compile time: 151.254 sec. > Total JVM GC count: 144. > Total JVM GC time:220.671 sec. 
> Heavy hitter instructions (name, time, count): > -- 1) ba+*144.194 sec 3 > -- 2) rand109.939 sec 9 > -- 3) uark+ 105.011 sec 2 > -- 4) r' 100.933 sec 3 > -- 5) sp_/80.387 sec 1 > -- 6) sp_mapmm45.491 sec 2 > -- 7) sp_tak+*40.655 sec 1 > -- 8) append 9.480 sec 1 > -- 9) rangeReIndex7.347 sec 2 > -- 10)sp_-6.392 sec 3 > 17/02/04 17:20:33 INFO api.DMLScript: END DML run 02/04/2017 17:20:33 > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space > at > org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:363) > at > org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:339) > at > org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlockUnsafe(MatrixBlock.java:408) > at > org.apache.sysml.runtime.io.MatrixReader.createOutputMatrixBlock(MatrixReader.java:107) > at > org.apache.sysml.runtime.io.ReaderBinaryBlockParallel.readMatrixFromHDFS(ReaderBinaryBlockParallel.java:59) > at > org.apache.sysml.runtime.util.DataConverter.readMatrixFromHDFS(DataConverter.java:203) > at > org.apache.sysml.runtime.util.DataConverter.readMatrixFromHDFS(DataConverter.java:168) > at > org.apache.sysml.runtime.controlprogram.caching.MatrixObject.readBlobFromHDFS(MatrixObject.java:425) > at > org.apache.sysml.runtime.controlprogram.caching.MatrixObject.readBlobFromHDFS(MatrixObject.java:60) > at > org.apache.sysml.runtime.controlprogram.caching.CacheableData.readBlobFromHDFS(CacheableData.java:920) > at > org.apache.sysml.runtime.controlprogram.caching.MatrixObject.readBlobFromRDD(MatrixObject.java:478) > at > org.apache.sysml.runtime.controlprogram.caching.MatrixObject.readBlobFromRDD(MatrixObject.java:60) > at > org.apache.sysml.runtime.controlprogram.caching.CacheableData.acquireRead(CacheableData.java:411) > at > org.apache.sysml.runtime.controlprogram.context.ExecutionContext.getMatrixInput(ExecutionContext.java:209) > at > 
org.apache.sysml.runtime.instructions.cp.AggregateBinaryCPInstruction.processInstruction(AggregateBinaryCPInstruction.java:74) > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:290) > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:221) > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168) > at > org.apache.sysml.runtime.controlprogram.IfProgramBlock.execute(IfProgramBlock.java:139) > at > org.apache.sysml.runtime.controlprogram.WhileProgramBlock.execute(WhileProgramBlock.java:165) > at > org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:123) > at org.apache.sysml.api.DMLScript.execute(DMLScript.java:684) > at
[jira] [Commented] (SYSTEMML-1281) OOM Error On Binary Write
[ https://issues.apache.org/jira/browse/SYSTEMML-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872746#comment-15872746 ] Mike Dusenberry commented on SYSTEMML-1281: --- I was attempting to write after the matrix transformations. In general, this is yet another case that underscores the need to improve our engine by removing the assumption of ~1000-column matrices, and instead assume matrices of any number of rows/columns. Challenging, of course, but important if we want to truly support scalable ML. > OOM Error On Binary Write > - > > Key: SYSTEMML-1281 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1281 > Project: SystemML > Issue Type: Bug >Affects Versions: SystemML 0.13 >Reporter: Mike Dusenberry >Priority: Blocker > > I'm running into the following heap space OOM error while attempting to save > a large Spark DataFrame to a SystemML binary format via DML {{write}} > statements. > Script: > {code} > tr_sample_filename = os.path.join("data", "train_{}{}.parquet".format(size, > "_grayscale" if grayscale else "")) > val_sample_filename = os.path.join("data", "val_{}{}.parquet".format(size, > "_grayscale" if grayscale else "")) > train_df = sqlContext.read.load(tr_sample_filename) > val_df = sqlContext.read.load(val_sample_filename) > train_df, val_df > # Note: Must use the row index column, or X may not > # necessarily correspond correctly to Y > X_df = train_df.select("__INDEX", "sample") > X_val_df = val_df.select("__INDEX", "sample") > y_df = train_df.select("__INDEX", "tumor_score") > y_val_df = val_df.select("__INDEX", "tumor_score") > X_df, X_val_df, y_df, y_val_df > script = """ > # Scale images to [-1,1] > X = X / 255 > X_val = X_val / 255 > X = X * 2 - 1 > X_val = X_val * 2 - 1 > # One-hot encode the labels > num_tumor_classes = 3 > n = nrow(y) > n_val = nrow(y_val) > Y = table(seq(1, n), y, n, num_tumor_classes) > Y_val = table(seq(1, n_val), y_val, n_val, num_tumor_classes) > """ > outputs = ("X", "X_val", "Y", 
"Y_val") > script = dml(script).input(X=X_df, X_val=X_val_df, y=y_df, > y_val=y_val_df).output(*outputs) > X, X_val, Y, Y_val = ml.execute(script).get(*outputs) > X, X_val, Y, Y_val > script = """ > write(X, "data/systemml/X_"+size+"_"+c+"_binary", format="binary") > write(Y, "data/systemml/Y_"+size+"_"+c+"_binary", format="binary") > write(X_val, "data/systemml/X_val_"+size+"_"+c+"_binary", format="binary") > write(Y_val, "data/systemml/Y_val_"+size+"_"+c+"_binary", format="binary") > """ > script = dml(script).input(X=X, X_val=X_val, Y=Y, Y_val=Y_val, size=size, c=c) > ml.execute(script) > {code} > General error: > {code} > Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception > occurred while executing runtime program > at > org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:371) > at > org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:292) > at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293) > ... 12 more > Caused by: org.apache.sysml.runtime.DMLRuntimeException: > org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program > block generated from statement block between lines 1 and 11 -- Error > evaluating instruction: CP°mvvar°X°¶_Var49¶°binaryblock > at > org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:130) > at > org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:369) > ... 
14 more > Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error > in program block generated from statement block between lines 1 and 11 -- > Error evaluating instruction: CP°mvvar°X°¶_Var49¶°binaryblock > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:320) > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:221) > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168) > at > org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:123) > ... 15 more > Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: > Move to data/systemml/X_256_3_binary failed. > at > org.apache.sysml.runtime.controlprogram.caching.CacheableData.moveData(CacheableData.java:1329) > at > org.apache.sysml.runtime.instructions.cp.VariableCPInstruction.processMoveInstruction(VariableCPInstruction.java:706) > at > org.apache.sysml.runtime.instructions.cp.VariableCPInstruction.processInstruction(VariableCPInstruction.java:511) > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:290) > ... 18 more > Caused by:
[jira] [Resolved] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
[ https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Dusenberry resolved SYSTEMML-1277. --- Resolution: Fixed Fix Version/s: SystemML 0.13 This fixed my real-world case. Thanks, [~deron]! > DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices. > --- > > Key: SYSTEMML-1277 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1277 > Project: SystemML > Issue Type: Bug >Affects Versions: SystemML 0.13 >Reporter: Mike Dusenberry >Assignee: Deron Eriksson >Priority: Blocker > Fix For: SystemML 0.13 > > > Recently, we made the switch from the old {{mllib.Vector}} to the new > {{ml.Vector}} type. Unfortunately, this leaves us with the issue of no > longer recognizing DataFrames with {{mllib.Vector}} columns during > conversion, and thus, we (1) do not correctly convert to SystemML {{Matrix}} > objects, (2) instead fall back on conversion to {{Frame}} objects, and then > (3) fail completely when the ensuing DML script is expecting to operate on > matrices. > Given a Spark {{DataFrame}} {{X_df}} of type {{DataFrame\[__INDEX: int, > sample: vector\]}}, where {{vector}} is of type {{mllib.Vector}}, the > following script will now fail (did not previously): > {code} > script = """ > # Scale images to [-1,1] > X = X / 255 > X = X * 2 - 1 > """ > outputs = ("X") > script = dml(script).input(X=X_df).output(*outputs) > X = ml.execute(script).get(*outputs) > X > {code} > {code} > Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception > occurred while validating script > at > org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:487) > at > org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:280) > at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293) > ... 
12 more > Caused by: org.apache.sysml.parser.LanguageException: Invalid Parameters : > ERROR: null -- line 4, column 4 -- Invalid Datatypes for operation FRAME > SCALAR > at > org.apache.sysml.parser.Expression.raiseValidateError(Expression.java:549) > at > org.apache.sysml.parser.Expression.computeDataType(Expression.java:415) > at > org.apache.sysml.parser.Expression.computeDataType(Expression.java:386) > at > org.apache.sysml.parser.BinaryExpression.validateExpression(BinaryExpression.java:130) > at > org.apache.sysml.parser.StatementBlock.validate(StatementBlock.java:567) > at > org.apache.sysml.parser.DMLTranslator.validateParseTree(DMLTranslator.java:140) > at > org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:485) > ... 14 more > {code}
[jira] [Closed] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
[ https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Dusenberry closed SYSTEMML-1277. - > DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices. > --- > > Key: SYSTEMML-1277 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1277 > Project: SystemML > Issue Type: Bug >Affects Versions: SystemML 0.13 >Reporter: Mike Dusenberry >Assignee: Deron Eriksson >Priority: Blocker > Fix For: SystemML 0.13 > > > Recently, we made the switch from the old {{mllib.Vector}} to the new > {{ml.Vector}} type. Unfortunately, this leaves us with the issue of no > longer recognizing DataFrames with {{mllib.Vector}} columns during > conversion, and thus, we (1) do not correctly convert to SystemML {{Matrix}} > objects, (2) instead fall back on conversion to {{Frame}} objects, and then > (3) fail completely when the ensuing DML script is expecting to operate on > matrices. > Given a Spark {{DataFrame}} {{X_df}} of type {{DataFrame\[__INDEX: int, > sample: vector\]}}, where {{vector}} is of type {{mllib.Vector}}, the > following script will now fail (did not previously): > {code} > script = """ > # Scale images to [-1,1] > X = X / 255 > X = X * 2 - 1 > """ > outputs = ("X") > script = dml(script).input(X=X_df).output(*outputs) > X = ml.execute(script).get(*outputs) > X > {code} > {code} > Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception > occurred while validating script > at > org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:487) > at > org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:280) > at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293) > ... 
12 more > Caused by: org.apache.sysml.parser.LanguageException: Invalid Parameters : > ERROR: null -- line 4, column 4 -- Invalid Datatypes for operation FRAME > SCALAR > at > org.apache.sysml.parser.Expression.raiseValidateError(Expression.java:549) > at > org.apache.sysml.parser.Expression.computeDataType(Expression.java:415) > at > org.apache.sysml.parser.Expression.computeDataType(Expression.java:386) > at > org.apache.sysml.parser.BinaryExpression.validateExpression(BinaryExpression.java:130) > at > org.apache.sysml.parser.StatementBlock.validate(StatementBlock.java:567) > at > org.apache.sysml.parser.DMLTranslator.validateParseTree(DMLTranslator.java:140) > at > org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:485) > ... 14 more > {code}
[jira] [Resolved] (SYSTEMML-1238) Python test failing for LinearRegCG
[ https://issues.apache.org/jira/browse/SYSTEMML-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niketan Pansare resolved SYSTEMML-1238. --- Resolution: Fixed Fix Version/s: SystemML 0.13 Fixed in the commit https://github.com/apache/incubator-systemml/commit/9d0087cbbd250c9b486923555b450602f816cf19 by setting regularization to 0 (similar to scikit-learn). > Python test failing for LinearRegCG > --- > > Key: SYSTEMML-1238 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1238 > Project: SystemML > Issue Type: Bug > Components: Algorithms, APIs >Affects Versions: SystemML 0.13 >Reporter: Imran Younus >Assignee: Niketan Pansare > Fix For: SystemML 0.13 > > Attachments: python_LinearReg_test_spark.1.6.log, > python_LinearReg_test_spark.2.1.log > > > [~deron] discovered that one of the python tests ({{test_mllearn_df.py}}) > with spark 2.1.0 was failing because the test score from linear regression > was very low ({{~ 0.24}}). I did some investigation and it turns out that > the model parameters computed by the dml script are incorrect. In > systemml.12, the values of betas from the linear regression model are > {{\[152.919, 938.237\]}}. This is what we expect from the normal equation. (I > also tested this with sklearn). But the values of betas from systemml.13 > (with spark 2.1.0) come out to be {{\[153.146, 458.489\]}}. These are not > correct and therefore the test score is much lower than expected. The data > going into the DML script is correct. I printed out the values of {{X}} and {{Y}} > in dml and I didn't see any issue there. > Attached are the log files for two different tests (systemml0.12 and 0.13) > with explain flag.
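Why a nonzero L2 regularizer shifts the betas away from the normal-equation solution can be seen in a toy pure-Python sketch (not the LinearRegCG internals): for a single feature with no intercept, ridge regression has the closed form b = (x·y) / (x·x + lambda), so lambda = 0 recovers exactly the ordinary least-squares solution that scikit-learn reports.

```python
# Toy illustration of the effect of L2 regularization on the fitted
# coefficient (hypothetical data; the real fix sets the regularization
# parameter of LinearRegCG to 0).
def ridge_slope(xs, ys, lam):
    # Closed-form ridge for a single no-intercept feature:
    # b = (x . y) / (x . x + lam)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                  # exact slope 2
b_ols = ridge_slope(xs, ys, lam=0.0)  # ordinary least squares: 2.0
b_reg = ridge_slope(xs, ys, lam=10.0) # shrunk toward 0, no longer 2.0
```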
[jira] [Resolved] (SYSTEMML-1252) Performance stratstats script
[ https://issues.apache.org/jira/browse/SYSTEMML-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Boehm resolved SYSTEMML-1252. -- Resolution: Done Fix Version/s: SystemML 0.13 > Performance stratstats script > -- > > Key: SYSTEMML-1252 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1252 > Project: SystemML > Issue Type: Task >Reporter: Matthias Boehm >Assignee: Matthias Boehm > Fix For: SystemML 0.13 > >
[jira] [Resolved] (SYSTEMML-1255) New fused operator tack+* in CP and Spark
[ https://issues.apache.org/jira/browse/SYSTEMML-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Boehm resolved SYSTEMML-1255. -- Resolution: Done Assignee: Matthias Boehm Fix Version/s: SystemML 0.13 > New fused operator tack+* in CP and Spark > - > > Key: SYSTEMML-1255 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1255 > Project: SystemML > Issue Type: Sub-task > Components: Compiler >Reporter: Matthias Boehm >Assignee: Matthias Boehm > Fix For: SystemML 0.13 > > > Similar to the existing tak+* operator, this new tack+* operator fuses two or > three binary multiply operations and a final column-wise aggregation > colSums(X*Y*Z) in order to avoid materializing the intermediates, which is > very expensive compared to the cheap multiply and sum operations.
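The effect of this fusion can be illustrated in plain Python (a toy sketch only; SystemML's operator works on its internal block representation): instead of materializing X*Y and X*Y*Z as full-size intermediates before summing, a single pass accumulates per-column sums of the elementwise products.

```python
# Toy illustration of the tack+* pattern colSums(X*Y*Z): one pass over
# the rows accumulates column sums of elementwise products, so the
# full-size intermediates X*Y and X*Y*Z are never allocated.
def fused_colsums_xyz(X, Y, Z):
    ncols = len(X[0])
    acc = [0.0] * ncols
    for xr, yr, zr in zip(X, Y, Z):
        for j in range(ncols):
            acc[j] += xr[j] * yr[j] * zr[j]  # no intermediate matrices
    return acc

X = [[1, 2], [3, 4]]
Y = [[5, 6], [7, 8]]
Z = [[1, 0], [2, 1]]
# Naive version computes the same result via materialized intermediates
naive = [sum(X[i][j] * Y[i][j] * Z[i][j] for i in range(2)) for j in range(2)]
fused = fused_colsums_xyz(X, Y, Z)  # [47.0, 32.0], equal to naive
```

The savings come purely from memory traffic: the multiplies and adds are cheap, so avoiding the two dense intermediates dominates the cost.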
[jira] [Commented] (SYSTEMML-1281) OOM Error On Binary Write
[ https://issues.apache.org/jira/browse/SYSTEMML-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872653#comment-15872653 ] Matthias Boehm commented on SYSTEMML-1281: -- Did you try to write the ORIGINAL dataset to csv, or only after the matrix transformations? The latter wouldn't help because we would convert to binary block for these operations, and it is failing during the dataset-to-binary-block conversion. > OOM Error On Binary Write > - > > Key: SYSTEMML-1281 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1281 > Project: SystemML > Issue Type: Bug >Affects Versions: SystemML 0.13 >Reporter: Mike Dusenberry >Priority: Blocker > > I'm running into the following heap space OOM error while attempting to save > a large Spark DataFrame to a SystemML binary format via DML {{write}} > statements. > Script: > {code} > tr_sample_filename = os.path.join("data", "train_{}{}.parquet".format(size, > "_grayscale" if grayscale else "")) > val_sample_filename = os.path.join("data", "val_{}{}.parquet".format(size, > "_grayscale" if grayscale else "")) > train_df = sqlContext.read.load(tr_sample_filename) > val_df = sqlContext.read.load(val_sample_filename) > train_df, val_df > # Note: Must use the row index column, or X may not > # necessarily correspond correctly to Y > X_df = train_df.select("__INDEX", "sample") > X_val_df = val_df.select("__INDEX", "sample") > y_df = train_df.select("__INDEX", "tumor_score") > y_val_df = val_df.select("__INDEX", "tumor_score") > X_df, X_val_df, y_df, y_val_df > script = """ > # Scale images to [-1,1] > X = X / 255 > X_val = X_val / 255 > X = X * 2 - 1 > X_val = X_val * 2 - 1 > # One-hot encode the labels > num_tumor_classes = 3 > n = nrow(y) > n_val = nrow(y_val) > Y = table(seq(1, n), y, n, num_tumor_classes) > Y_val = table(seq(1, n_val), y_val, n_val, num_tumor_classes) > """ > outputs = ("X", "X_val", "Y", "Y_val") > script = dml(script).input(X=X_df, X_val=X_val_df, y=y_df, > y_val=y_val_df).output(*outputs) > X, X_val, 
Y, Y_val = ml.execute(script).get(*outputs) > X, X_val, Y, Y_val > script = """ > write(X, "data/systemml/X_"+size+"_"+c+"_binary", format="binary") > write(Y, "data/systemml/Y_"+size+"_"+c+"_binary", format="binary") > write(X_val, "data/systemml/X_val_"+size+"_"+c+"_binary", format="binary") > write(Y_val, "data/systemml/Y_val_"+size+"_"+c+"_binary", format="binary") > """ > script = dml(script).input(X=X, X_val=X_val, Y=Y, Y_val=Y_val, size=size, c=c) > ml.execute(script) > {code} > General error: > {code} > Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception > occurred while executing runtime program > at > org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:371) > at > org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:292) > at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293) > ... 12 more > Caused by: org.apache.sysml.runtime.DMLRuntimeException: > org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program > block generated from statement block between lines 1 and 11 -- Error > evaluating instruction: CP°mvvar°X°¶_Var49¶°binaryblock > at > org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:130) > at > org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:369) > ... 14 more > Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error > in program block generated from statement block between lines 1 and 11 -- > Error evaluating instruction: CP°mvvar°X°¶_Var49¶°binaryblock > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:320) > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:221) > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168) > at > org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:123) > ... 
15 more > Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: > Move to data/systemml/X_256_3_binary failed. > at > org.apache.sysml.runtime.controlprogram.caching.CacheableData.moveData(CacheableData.java:1329) > at > org.apache.sysml.runtime.instructions.cp.VariableCPInstruction.processMoveInstruction(VariableCPInstruction.java:706) > at > org.apache.sysml.runtime.instructions.cp.VariableCPInstruction.processInstruction(VariableCPInstruction.java:511) > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:290) > ... 18 more > Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: > Export to data/systemml/X_256_3_binary failed. >
[jira] [Commented] (SYSTEMML-1281) OOM Error On Binary Write
[ https://issues.apache.org/jira/browse/SYSTEMML-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872641#comment-15872641 ] Mike Dusenberry commented on SYSTEMML-1281: --- Yeah I tried to write it to CSV using DML, but still ran into the OOM error. I can try writing to CSV with Spark directly. > OOM Error On Binary Write > - > > Key: SYSTEMML-1281 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1281 > Project: SystemML > Issue Type: Bug >Affects Versions: SystemML 0.13 >Reporter: Mike Dusenberry >Priority: Blocker > > I'm running into the following heap space OOM error while attempting to save > a large Spark DataFrame to a SystemML binary format via DML {{write}} > statements. > Script: > {code} > tr_sample_filename = os.path.join("data", "train_{}{}.parquet".format(size, > "_grayscale" if grayscale else "")) > val_sample_filename = os.path.join("data", "val_{}{}.parquet".format(size, > "_grayscale" if grayscale else "")) > train_df = sqlContext.read.load(tr_sample_filename) > val_df = sqlContext.read.load(val_sample_filename) > train_df, val_df > # Note: Must use the row index column, or X may not > # necessarily correspond correctly to Y > X_df = train_df.select("__INDEX", "sample") > X_val_df = val_df.select("__INDEX", "sample") > y_df = train_df.select("__INDEX", "tumor_score") > y_val_df = val_df.select("__INDEX", "tumor_score") > X_df, X_val_df, y_df, y_val_df > script = """ > # Scale images to [-1,1] > X = X / 255 > X_val = X_val / 255 > X = X * 2 - 1 > X_val = X_val * 2 - 1 > # One-hot encode the labels > num_tumor_classes = 3 > n = nrow(y) > n_val = nrow(y_val) > Y = table(seq(1, n), y, n, num_tumor_classes) > Y_val = table(seq(1, n_val), y_val, n_val, num_tumor_classes) > """ > outputs = ("X", "X_val", "Y", "Y_val") > script = dml(script).input(X=X_df, X_val=X_val_df, y=y_df, > y_val=y_val_df).output(*outputs) > X, X_val, Y, Y_val = ml.execute(script).get(*outputs) > X, X_val, Y, Y_val > script = """ > write(X, 
"data/systemml/X_"+size+"_"+c+"_binary", format="binary") > write(Y, "data/systemml/Y_"+size+"_"+c+"_binary", format="binary") > write(X_val, "data/systemml/X_val_"+size+"_"+c+"_binary", format="binary") > write(Y_val, "data/systemml/Y_val_"+size+"_"+c+"_binary", format="binary") > """ > script = dml(script).input(X=X, X_val=X_val, Y=Y, Y_val=Y_val, size=size, c=c) > ml.execute(script) > {code} > General error: > {code} > Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception > occurred while executing runtime program > at > org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:371) > at > org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:292) > at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293) > ... 12 more > Caused by: org.apache.sysml.runtime.DMLRuntimeException: > org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program > block generated from statement block between lines 1 and 11 -- Error > evaluating instruction: CP°mvvar°X°¶_Var49¶°binaryblock > at > org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:130) > at > org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:369) > ... 14 more > Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error > in program block generated from statement block between lines 1 and 11 -- > Error evaluating instruction: CP°mvvar°X°¶_Var49¶°binaryblock > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:320) > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:221) > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168) > at > org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:123) > ... 
15 more > Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: > Move to data/systemml/X_256_3_binary failed. > at > org.apache.sysml.runtime.controlprogram.caching.CacheableData.moveData(CacheableData.java:1329) > at > org.apache.sysml.runtime.instructions.cp.VariableCPInstruction.processMoveInstruction(VariableCPInstruction.java:706) > at > org.apache.sysml.runtime.instructions.cp.VariableCPInstruction.processInstruction(VariableCPInstruction.java:511) > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:290) > ... 18 more > Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: > Export to data/systemml/X_256_3_binary failed. > at >
[jira] [Commented] (SYSTEMML-1211) Verify dependencies for Spark 2
[ https://issues.apache.org/jira/browse/SYSTEMML-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872614#comment-15872614 ] Glenn Weidner commented on SYSTEMML-1211: - Yes [~mboehm7] - I can also reproduce the test failures on my system and will look into updating the hadoop_bin_windows. > Verify dependencies for Spark 2 > --- > > Key: SYSTEMML-1211 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1211 > Project: SystemML > Issue Type: Sub-task > Components: Build >Reporter: Deron Eriksson >Assignee: Deron Eriksson > > With the migration to Spark 2, we should verify that the artifact assemblies > are properly handling all dependencies. > Also, we should verify that the artifact licenses properly include all > dependencies following the Spark 2 migration.
[jira] [Commented] (SYSTEMML-1211) Verify dependencies for Spark 2
[ https://issues.apache.org/jira/browse/SYSTEMML-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872463#comment-15872463 ] Deron Eriksson commented on SYSTEMML-1211: -- License for the standalone jar artifact updated by https://github.com/apache/incubator-systemml/commit/184e02dac008ff1aa524b32455d0cb391d7cb484 > Verify dependencies for Spark 2 > --- > > Key: SYSTEMML-1211 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1211 > Project: SystemML > Issue Type: Sub-task > Components: Build >Reporter: Deron Eriksson >Assignee: Deron Eriksson > > With the migration to Spark 2, we should verify that the artifact assemblies > are properly handling all dependencies. > Also, we should verify that the artifact licenses properly include all > dependencies following the Spark 2 migration.
[jira] [Commented] (SYSTEMML-1281) OOM Error On Binary Write
[ https://issues.apache.org/jira/browse/SYSTEMML-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872451#comment-15872451 ] Mike Dusenberry commented on SYSTEMML-1281: --- cc [~fschueler], [~acs_s], [~nakul02], [~niketanpansare], [~mboehm7], [~reinw...@us.ibm.com] > OOM Error On Binary Write > - > > Key: SYSTEMML-1281 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1281 > Project: SystemML > Issue Type: Bug >Affects Versions: SystemML 0.13 >Reporter: Mike Dusenberry >Priority: Blocker > > I'm running into the following heap space OOM error while attempting to save > a large Spark DataFrame to a SystemML binary format via DML {{write}} > statements. > Script: > {code} > tr_sample_filename = os.path.join("data", "train_{}{}.parquet".format(size, > "_grayscale" if grayscale else "")) > val_sample_filename = os.path.join("data", "val_{}{}.parquet".format(size, > "_grayscale" if grayscale else "")) > train_df = sqlContext.read.load(tr_sample_filename) > val_df = sqlContext.read.load(val_sample_filename) > train_df, val_df > # Note: Must use the row index column, or X may not > # necessarily correspond correctly to Y > X_df = train_df.select("__INDEX", "sample") > X_val_df = val_df.select("__INDEX", "sample") > y_df = train_df.select("__INDEX", "tumor_score") > y_val_df = val_df.select("__INDEX", "tumor_score") > X_df, X_val_df, y_df, y_val_df > script = """ > # Scale images to [-1,1] > X = X / 255 > X_val = X_val / 255 > X = X * 2 - 1 > X_val = X_val * 2 - 1 > # One-hot encode the labels > num_tumor_classes = 3 > n = nrow(y) > n_val = nrow(y_val) > Y = table(seq(1, n), y, n, num_tumor_classes) > Y_val = table(seq(1, n_val), y_val, n_val, num_tumor_classes) > """ > outputs = ("X", "X_val", "Y", "Y_val") > script = dml(script).input(X=X_df, X_val=X_val_df, y=y_df, > y_val=y_val_df).output(*outputs) > X, X_val, Y, Y_val = ml.execute(script).get(*outputs) > X, X_val, Y, Y_val > script = """ > write(X, 
"data/systemml/X_"+size+"_"+c+"_binary", format="binary") > write(Y, "data/systemml/Y_"+size+"_"+c+"_binary", format="binary") > write(X_val, "data/systemml/X_val_"+size+"_"+c+"_binary", format="binary") > write(Y_val, "data/systemml/Y_val_"+size+"_"+c+"_binary", format="binary") > """ > script = dml(script).input(X=X, X_val=X_val, Y=Y, Y_val=Y_val, size=size, c=c) > ml.execute(script) > {code} > General error: > {code} > Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception > occurred while executing runtime program > at > org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:371) > at > org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:292) > at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293) > ... 12 more > Caused by: org.apache.sysml.runtime.DMLRuntimeException: > org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program > block generated from statement block between lines 1 and 11 -- Error > evaluating instruction: CP°mvvar°X°¶_Var49¶°binaryblock > at > org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:130) > at > org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:369) > ... 14 more > Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error > in program block generated from statement block between lines 1 and 11 -- > Error evaluating instruction: CP°mvvar°X°¶_Var49¶°binaryblock > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:320) > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:221) > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168) > at > org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:123) > ... 
15 more > Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: > Move to data/systemml/X_256_3_binary failed. > at > org.apache.sysml.runtime.controlprogram.caching.CacheableData.moveData(CacheableData.java:1329) > at > org.apache.sysml.runtime.instructions.cp.VariableCPInstruction.processMoveInstruction(VariableCPInstruction.java:706) > at > org.apache.sysml.runtime.instructions.cp.VariableCPInstruction.processInstruction(VariableCPInstruction.java:511) > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:290) > ... 18 more > Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: > Export to data/systemml/X_256_3_binary failed. > at > org.apache.sysml.runtime.controlprogram.caching.CacheableData.exportData(CacheableData.java:800) > at >
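As a side note on the preprocessing in the quoted script: the two `write`-independent steps rescale pixel values from [0, 255] to [-1, 1]. The arithmetic can be checked standalone; this is a plain-Python sketch of the same formula (not SystemML DML, and `rescale` is an illustrative name, not part of any API):

```python
def rescale(x):
    # Same arithmetic as the quoted DML: X / 255 maps [0, 255] to [0, 1],
    # then * 2 - 1 maps [0, 1] to [-1, 1].
    x = x / 255
    return x * 2 - 1

print(rescale(0), rescale(255))  # -1.0 1.0
```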
[jira] [Comment Edited] (SYSTEMML-1281) OOM Error On Binary Write
[ https://issues.apache.org/jira/browse/SYSTEMML-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872447#comment-15872447 ] Mike Dusenberry edited comment on SYSTEMML-1281 at 2/17/17 8:13 PM: Well, I found it while working on the deep learning breast cancer project. However, the specific code is actually not tied specifically to deep learning, and affects any code that wishes to write a DataFrame out to SystemML binary format. was (Author: mwdus...@us.ibm.com): Well, I found it while working on the deep learning breast cancer project. However, the specific code is actual not tied specifically to deep learning, and affects any code that wishes to write a DataFrame out to SystemML binary format. > OOM Error On Binary Write > - > > Key: SYSTEMML-1281 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1281 > Project: SystemML > Issue Type: Bug >Affects Versions: SystemML 0.13 >Reporter: Mike Dusenberry >Priority: Blocker > > I'm running into the following heap space OOM error while attempting to save > a large Spark DataFrame to a SystemML binary format via DML {{write}} > statements. 
[jira] [Commented] (SYSTEMML-1281) OOM Error On Binary Write
[ https://issues.apache.org/jira/browse/SYSTEMML-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872447#comment-15872447 ] Mike Dusenberry commented on SYSTEMML-1281: --- Well, I found it while working on the deep learning breast cancer project. However, the specific code is actually not tied specifically to deep learning, and affects any code that wishes to write a DataFrame out to SystemML binary format. > OOM Error On Binary Write > - > > Key: SYSTEMML-1281 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1281 > Project: SystemML > Issue Type: Bug >Affects Versions: SystemML 0.13 >Reporter: Mike Dusenberry >Priority: Blocker
[jira] [Commented] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
[ https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872364#comment-15872364 ] Xin Wu commented on SYSTEMML-1277: -- Is this issue also for Deep Learning? > DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices. > --- > > Key: SYSTEMML-1277 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1277 > Project: SystemML > Issue Type: Bug >Affects Versions: SystemML 0.13 >Reporter: Mike Dusenberry >Assignee: Deron Eriksson >Priority: Blocker > > Recently, we made the switch from the old {{mllib.Vector}} to the new > {{ml.Vector}} type. Unfortunately, this leaves us with the issue of no > longer recognizing DataFrames with {{mllib.Vector}} columns during > conversion, and thus, we (1) do not correctly convert to SystemML {{Matrix}} > objects, (2) instead fall back on conversion to {{Frame}} objects, and then > (3) fail completely when the ensuing DML script expects to operate on > matrices. > Given a Spark {{DataFrame}} {{X_df}} of type {{DataFrame\[__INDEX: int, > sample: vector\]}}, where {{vector}} is of type {{mllib.Vector}}, the > following script will now fail (did not previously): > {code} > script = """ > # Scale images to [-1,1] > X = X / 255 > X = X * 2 - 1 > """ > outputs = ("X") > script = dml(script).input(X=X_df).output(*outputs) > X = ml.execute(script).get(*outputs) > X > {code} > {code} > Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception > occurred while validating script > at > org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:487) > at > org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:280) > at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293) > ... 
12 more > Caused by: org.apache.sysml.parser.LanguageException: Invalid Parameters : > ERROR: null -- line 4, column 4 -- Invalid Datatypes for operation FRAME > SCALAR > at > org.apache.sysml.parser.Expression.raiseValidateError(Expression.java:549) > at > org.apache.sysml.parser.Expression.computeDataType(Expression.java:415) > at > org.apache.sysml.parser.Expression.computeDataType(Expression.java:386) > at > org.apache.sysml.parser.BinaryExpression.validateExpression(BinaryExpression.java:130) > at > org.apache.sysml.parser.StatementBlock.validate(StatementBlock.java:567) > at > org.apache.sysml.parser.DMLTranslator.validateParseTree(DMLTranslator.java:140) > at > org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:485) > ... 14 more > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (SYSTEMML-1281) OOM Error On Binary Write
[ https://issues.apache.org/jira/browse/SYSTEMML-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872358#comment-15872358 ] Xin Wu commented on SYSTEMML-1281: -- This issue is related to Deep learning, right? > OOM Error On Binary Write > - > > Key: SYSTEMML-1281 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1281 > Project: SystemML > Issue Type: Bug >Affects Versions: SystemML 0.13 >Reporter: Mike Dusenberry >Priority: Blocker
[jira] [Created] (SYSTEMML-1282) Determine required avro jars for bin artifacts
Deron Eriksson created SYSTEMML-1282: Summary: Determine required avro jars for bin artifacts Key: SYSTEMML-1282 URL: https://issues.apache.org/jira/browse/SYSTEMML-1282 Project: SystemML Issue Type: Task Components: Build Reporter: Deron Eriksson Priority: Minor The current -bin (tgz and zip) artifacts have the following avro jars in them: {code} avro-1.7.4.jar avro-ipc-1.7.7-tests.jar avro-ipc-1.7.7.jar avro-mapred-1.7.7-hadoop2.jar {code} Determine if avro-ipc-1.7.7-tests.jar, avro-ipc-1.7.7.jar, and avro-mapred-1.7.7-hadoop2.jar are needed. If not, exclude them from bin artifacts. If any are needed, determine if a single version (1.7.4 or 1.7.7) should be used, and use that version. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (SYSTEMML-1243) Perftest: OutOfMemoryError in stratstats.dml for 800MB case
[ https://issues.apache.org/jira/browse/SYSTEMML-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871492#comment-15871492 ] Matthias Boehm commented on SYSTEMML-1243: -- Just to clarify: our default 100k scenario with 100 features runs fine. This scenario, however, uses 1000 features, which makes stratstats more challenging. I was able to reproduce this OOM, even after the recent changes that already reduced memory pressure. The core problem comes from several matrix multiplications of the following form, where we've chosen mapmm (with repartitioning at the runtime level in order to overcome Spark's 2GB limitation per partition). {code} mapmm: rdd [10 x 1000, nnz=95000819, blocks (1000 x 1000)] 800MB mapmm: bc [1000 x 100, nnz=100, blocks (1000 x 1000)] 172MB --> output: 10 x 100 {code} However, because the RDD has only 100 blocks, this gives us an upper bound on the maximum number of input partitions, preventing us from repartitioning this RDD to our preferred number of partitions, which causes too-large outputs per task (partition). I can think of three potential directions going forward: 1) Flip the RDD and broadcast at runtime if we detect that it would be beneficial for repartitioning (in this case raising the upper bound by 10x). 2) Alternative matrix multiplication operations: traditionally, we would have applied RMM for these scenarios, but replication can similarly lead to large task outputs. Alternatively, we could consider enabling pmapmm for production use. 3) Extended permutation matrix multiply (pmm): so far, we only support selection but not permutation matrices, and we are only able to detect this within a DAG, which would not apply here. One option would be to keep track of special producing operations and flag intermediates. 
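The block-count bound described above can be sketched numerically. This is a hypothetical illustration assuming the 1000 x 1000 block size shown in the mapmm log and the 100k x 1000 input of this scenario; `num_blocks` is an illustrative helper, not a SystemML API:

```python
import math

def num_blocks(rows, cols, blen=1000):
    # A blocked matrix has ceil(rows/blen) * ceil(cols/blen) blocks; each RDD
    # partition holds at least one block, so the block count is an upper bound
    # on the number of non-empty partitions achievable by repartitioning.
    return math.ceil(rows / blen) * math.ceil(cols / blen)

# 100k x 1000 input: only 100 blocks, so repartitioning cannot spread the
# work across more than 100 tasks, however large each task's output grows.
print(num_blocks(100_000, 1000))  # 100
```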
> Perftest: OutOfMemoryError in stratstats.dml for 800MB case > --- > > Key: SYSTEMML-1243 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1243 > Project: SystemML > Issue Type: Bug > Components: Test >Affects Versions: SystemML 0.13 > Environment: spark 2.1.0 >Reporter: Imran Younus >Assignee: Matthias Boehm > Fix For: SystemML 0.13 > > Attachments: sparkDML.sh > > > When running the {{runAllStats.sh}} script, {{stratstats.dml}} ends with an > OutOfMemory error for the 100k_1k data set. Here is the end of the log file: > {code} > 17/02/06 16:09:25 INFO api.DMLScript: SystemML Statistics: > Total elapsed time: 1435.880 sec. > Total compilation time: 2.433 sec. > Total execution time: 1433.447 sec. > Number of compiled Spark inst:190. > Number of executed Spark inst:3. > Cache hits (Mem, WB, FS, HDFS): 72343/3/4/7. > Cache writes (WB, FS, HDFS): 10419/5/0. > Cache times (ACQr/m, RLS, EXP): 387.598/0.039/277.658/0.000 sec. > HOP DAGs recompiled (PRED, SB): 0/107. > HOP DAGs recompile time: 0.207 sec. > Functions recompiled: 3. > Functions recompile time: 0.026 sec. > Spark ctx create time (lazy): 36.537 sec. > Spark trans counts (par,bc,col):3/3/0. > Spark trans times (par,bc,col): 0.404/0.147/0.000 secs. > Total JIT compile time: 63.262 sec. > Total JVM GC count: 57. > Total JVM GC time:34.538 sec. 
> Heavy hitter instructions (name, time, count): > -- 1) wdivmm 1078.568 sec 5 > -- 2) ba+* 286.854 sec 22 > -- 3) sp_mapmm 37.244 sec 3 > -- 4) fStat_tailprob 2.071 sec 3 > -- 5) rangeReIndex 1.608 sec 30601 > -- 6) == 0.974 sec 11 > -- 7) ^2 0.793 sec 13 > -- 8) cdf 0.603 sec 10200 > -- 9) replace 0.349 sec 10 > -- 10) r' 0.278 sec 106 > 17/02/06 16:09:25 INFO api.DMLScript: END DML run 02/06/2017 16:09:25 > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space > at > org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:363) > at > org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseBlock(MatrixBlock.java:339) > at > org.apache.sysml.runtime.matrix.data.MatrixBlock.allocateDenseOrSparseBlock(MatrixBlock.java:346) > at > org.apache.sysml.runtime.matrix.data.LibMatrixMult.matrixMultWDivMM(LibMatrixMult.java:752) > at > org.apache.sysml.runtime.matrix.data.MatrixBlock.quaternaryOperations(MatrixBlock.java:5475) > at > org.apache.sysml.runtime.instructions.cp.QuaternaryCPInstruction.processInstruction(QuaternaryCPInstruction.java:128) > at > org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:290) > at >