There are no practically relevant size restrictions. If you run into issues, please share more information about them. Thanks.
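To help pick generator parameters for a target data size (such as the 30G input mentioned in this thread), the sketch below estimates how many rows a dense matrix needs. It assumes dense binary storage at 8 bytes per double and ignores block metadata and compression. Text formats such as CSV take considerably more space. The function name `rows_for_gib` is hypothetical and not part of SystemML.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SystemML): estimate the number of rows a
# dense matrix of 8-byte doubles with the given number of columns needs to
# reach roughly the target size in GiB. Assumes dense binary storage only;
# text formats on disk will be larger.
rows_for_gib() {
  local target_gib=$1   # target size in GiB, e.g. 30
  local num_cols=$2     # number of columns (features), e.g. 1000
  local bytes=$(( target_gib * 1024 * 1024 * 1024 ))
  # Integer division rounds down, so the estimate never exceeds the target.
  echo $(( bytes / (8 * num_cols) ))
}

# Rows for roughly 30 GiB of dense doubles with 1000 columns.
rows_for_gib 30 1000   # prints 4026531
```

The resulting row count (together with the column count) would then be passed to the data generator as its number of samples and features.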
Regards,
Matthias

From: Wenjie Zhuang <ka...@vt.edu>
To: Matthias Boehm/Almaden/IBM@IBMUS
Cc: dev@systemml.incubator.apache.org
Date: 04/04/2016 04:37 AM
Subject: Re: Guides about running SystemML by spark cluster

Hi Matthias,

Thanks again. I used genLinearRegressionData.dml yesterday. However, when I set the number of samples to 60G, it reports an error. Do you know the maximum input size that SystemML allows?

Besides, I also tried to run a DML script in standalone mode. But when I use ./runStandaloneSystemML.sh, it shows the error: "Could not find or load main class org.apache.sysml.api.DMLScript". I downloaded SystemML from GitHub and rebuilt it with mvn after your update.

https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/StepLinearRegDS.dml

Have a good week!

On Sun, Apr 3, 2016 at 9:28 AM, Wenjie Zhuang <ka...@vt.edu> wrote:

Thanks a lot. I also have some other questions. Could you please help me figure them out?
1. If I want the input size to be 30G, how can I set it? I guess I should change the parameters X, Y and B, but I'm not sure which script I can use.
2. Do you know how to control the number of partitions when I run StepLinearRegDS.dml on Spark? Is there a configuration file where I can set the partition number?
3. What should the correct result be after running StepLinearRegDS.dml? When the program ends, what do we get?

Thanks & have a nice day!

On Apr 3, 2016, 1:08 AM, "Matthias Boehm" <mbo...@us.ibm.com> wrote:

Thanks again for catching https://issues.apache.org/jira/browse/SYSTEMML-609. Yes, the change is in SystemML head now, so please rebuild SystemML or use one of our nightly builds (https://sparktc.ibmcloud.com/repo/latest/). Thanks.

For running SystemML on Spark, you have multiple options (http://apache.github.io/incubator-systemml/#running-systemml): either use MLContext or spark-submit.
Since our documentation does not show many examples for spark-submit yet, here is a typical command-line invocation:

../spark/bin/spark-submit \
  --class org.apache.sysml.api.DMLScript \
  --master yarn-client \
  --num-executors 10 \
  --driver-memory 20g \
  --executor-memory 60g \
  --executor-cores 24 \
  --queue default \
  --conf spark.driver.maxResultSize=0 \
  ./SystemML.jar \
  -f test.dml -stats -exec hybrid_spark -nvargs ...

Everything else is similar to the Hadoop invocation. We also provide a script that simplifies this configuration: https://github.com/apache/incubator-systemml/blob/master/scripts/sparkDML.sh. Keep in mind that if you want to run in yarn-cluster mode, you should put the DML script and potentially the SystemML config into HDFS too.

Regards,
Matthias

From: Wenjie Zhuang <ka...@vt.edu>
To: dev@systemml.incubator.apache.org
Cc: Matthias Boehm/Almaden/IBM@IBMUS
Date: 04/02/2016 07:50 PM
Subject: Re: Guides about running SystemML by spark cluster

Hi,

I tried to run StepLinearRegDS.dml in Spark YARN mode today, and I got the following result. Is it correct? Thanks.

BEGIN STEPWISE LINEAR REGRESSION SCRIPT
Reading X and Y...
Best AIC without any features: 4123.134539784949
Best AIC 4068.2916533784332 achieved with feature: 22
Running linear regression with selected features...
Computing the statistics...
Writing the output matrix...

On Sat, Apr 2, 2016 at 8:37 AM, Wenjie Zhuang <ka...@vt.edu> wrote:

Hi,

I am now trying to run experiments with SystemML on a Spark cluster. Could you please share some guidance on how to run StepLinearRegDS.dml on a Spark cluster? The official guide I found is mostly about Hadoop.

Thanks & have a good weekend!
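For the yarn-cluster case mentioned in Matthias's 04/02 reply, where the DML script must also live in HDFS, the steps might be sketched as follows. This is a hypothetical wrapper, not part of SystemML. It reuses the Spark options from the yarn-client command, switches `--master` to `yarn-cluster`, and stages the script with the standard `hdfs dfs` shell. It assumes the HDFS directory is given as a full `hdfs://` URI so that SystemML reads the script from HDFS rather than the local file system. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: stage a DML script in HDFS, then submit it in
# yarn-cluster mode. The HDFS directory is a placeholder and is assumed to be
# a full hdfs:// URI. With DRY_RUN=1, commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

submit_yarn_cluster() {
  local dml_script=$1   # local path of the DML script, e.g. test.dml
  local hdfs_dir=$2     # HDFS target directory, e.g. hdfs:///user/me/systemml
  run hdfs dfs -mkdir -p "$hdfs_dir"
  run hdfs dfs -put -f "$dml_script" "$hdfs_dir/"   # -f overwrites an older copy
  run ../spark/bin/spark-submit \
    --class org.apache.sysml.api.DMLScript \
    --master yarn-cluster \
    --num-executors 10 \
    --driver-memory 20g \
    --executor-memory 60g \
    --executor-cores 24 \
    --queue default \
    --conf spark.driver.maxResultSize=0 \
    ./SystemML.jar \
    -f "$hdfs_dir/$(basename "$dml_script")" -stats -exec hybrid_spark
}
```

Usage: `DRY_RUN=1 submit_yarn_cluster test.dml hdfs:///user/me/systemml` prints the three commands. Script arguments (`-nvargs ...`) would be appended as in the original command, and input and output paths passed that way would presumably also need to point to HDFS.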