Re: Guides about running SystemML by spark cluster

Matthias Boehm Mon, 04 Apr 2016 10:25:38 -0700

There are no practically relevant size restrictions. If you run into
issues, please share some more information about them. Thanks.

Regards,
Matthias

From:   Wenjie Zhuang <ka...@vt.edu>
To:     Matthias Boehm/Almaden/IBM@IBMUS
Cc:     dev@systemml.incubator.apache.org
Date:   04/04/2016 04:37 AM
Subject:        Re: Guides about running SystemML by spark cluster

Hi, Matthias

Thanks again. I used genLinearRegressionData.dml yesterday. However, when
I set the number of samples to 60G, it reported an error. Do you know what
the maximum input size is that SystemML allows?

Besides, I also tried to run DML in standalone mode. But when I
use ./runStandaloneSystemML.sh, it shows the error: Could not find or load
main class org.apache.sysml.api.DMLScript.

I downloaded SystemML from GitHub and rebuilt it with mvn after your update.
https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/StepLinearRegDS.dml

Have a good week!

On Sun, Apr 3, 2016 at 9:28 AM, Wenjie Zhuang <ka...@vt.edu> wrote:
  Thanks a lot. I also have some other questions. Could you please help me
  figure them out?

  1. If I want the input size to be 30G, how can I set it? I guess I should
  change parameters X, Y and B, but I'm not sure which script I can use.

  2. Do you know how to control the number of partitions when I run
  StepLinearRegDS.dml on Spark? Is there a configuration file where I can
  set the number of partitions?

  3. What should the correct result be after running StepLinearRegDS.dml?
  When the program ends, what output do we get?

  Thanks & Have a nice day!

  On Apr 3, 2016 at 1:08 AM, "Matthias Boehm" <mbo...@us.ibm.com> wrote:
   Thanks again for catching
   https://issues.apache.org/jira/browse/SYSTEMML-609. Yes, the change is in
   SystemML head now, so please rebuild SystemML or use one of our nightly
   builds (https://sparktc.ibmcloud.com/repo/latest/). Thanks.

   For running SystemML on Spark, you have multiple options
   (http://apache.github.io/incubator-systemml/#running-systemml): either
   MLContext or spark-submit. Since our documentation does not show
   many examples for spark-submit yet, here is a typical command line
   invocation:

   ../spark/bin/spark-submit \
   --class org.apache.sysml.api.DMLScript \
   --master yarn-client \
   --num-executors 10 \
   --driver-memory 20g \
   --executor-memory 60g \
   --executor-cores 24 \
   --queue default \
   --conf spark.driver.maxResultSize=0 \
   ./SystemML.jar \
   -f test.dml -stats -exec hybrid_spark -nvargs ...

   Everything else is similar to the Hadoop invocation. We also provide
   a script that simplifies this configuration:
   https://github.com/apache/incubator-systemml/blob/master/scripts/sparkDML.sh
   Keep in mind that if you want to run in yarn-cluster mode, you should put
   the DML script and potentially SystemML-config into HDFS too.
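
As a rough illustration of the yarn-cluster note, the dry-run sketch below only prints the commands it would run, so nothing is submitted. The HDFS directory, the script name test.dml, and the use of an hdfs:// URI for -f are assumptions made for illustration and are not taken from this thread; the resource settings are copied from the yarn-client example.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints commands instead of executing them.
# Assumptions (not from the thread): HDFS_DIR and the hdfs:// form of -f.

HDFS_DIR="/user/systemml"   # hypothetical HDFS directory

build_commands() {
  local master="$1"         # yarn-client or yarn-cluster
  local script="test.dml"   # local DML script, as in the example invocation

  if [ "$master" = "yarn-cluster" ]; then
    # In cluster mode the driver runs inside the cluster, so the script
    # is uploaded first and the HDFS copy is referenced instead.
    echo "hdfs dfs -put -f $script $HDFS_DIR/$script"
    script="hdfs://$HDFS_DIR/$script"
  fi

  # Same options as the yarn-client invocation; only --master and -f differ.
  echo "spark-submit --class org.apache.sysml.api.DMLScript --master $master" \
       "--num-executors 10 --driver-memory 20g --executor-memory 60g" \
       "--executor-cores 24 ./SystemML.jar -f $script -stats -exec hybrid_spark"
}

build_commands yarn-cluster
```

Calling build_commands yarn-client prints only the spark-submit line with the local script path, which makes the difference between the two modes easy to compare before running anything.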

   Regards,
   Matthias

   From: Wenjie Zhuang <ka...@vt.edu>
   To: dev@systemml.incubator.apache.org
   Cc: Matthias Boehm/Almaden/IBM@IBMUS
   Date: 04/02/2016 07:50 PM
   Subject: Re: Guides about running SystemML by spark cluster

   Hi,

   I tried to run StepLinearRegDS.dml in Spark YARN mode today and got the
   following result. Is it correct?

   Thanks.

   BEGIN STEPWISE LINEAR REGRESSION SCRIPT
   Reading X and Y...
   Best AIC without any features: 4123.134539784949
   Best AIC 4068.2916533784332 achieved with feature: 22
   Running linear regression with selected features...
   Computing the statistics...
   Writing the output matrix...

   On Sat, Apr 2, 2016 at 8:37 AM, Wenjie Zhuang <ka...@vt.edu> wrote:
         Hi,

         I am now trying to run experiments with SystemML on a Spark
         cluster. Could you please share some guides on how to run
         StepLinearRegDS.dml on a Spark cluster? The official guide I found
         is mostly about Hadoop.

         Thanks & Have a good weekend!
