Hello All,
I have some questions about running SystemML scripts on HDFS (with the
hybrid_spark execution mode).
My Current Configuration:
Standalone HDFS on OS X (Hadoop version 2.8)
and Spark 2.1.0, pre-built for Hadoop 2.7
*jps* output from my system:
(inline image of jps output omitted)
Both of them have been installed separately.
As far as I understand, to enable HDFS support we need to run Spark with the
master set to yarn-client or yarn-cluster. (Is this understanding correct?)
My question:
I don't have access to a cluster. Is there a way to set up yarn-client /
yarn-cluster on my local system so that I can run SystemML scripts in
hybrid_spark mode with HDFS? If yes, could you please point me to some
documentation?
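
For context, my standalone HDFS follows the usual Hadoop single-node setup;
a minimal sketch of the relevant config (localhost:9000 is the default port
from those instructions, the exact path and port on my machine may differ):

export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop  # example path, adjust to your install

# core-site.xml: fs.defaultFS must point at the namenode so that
# hdfs:// paths resolve to HDFS instead of the local filesystem.
cat > "$HADOOP_CONF_DIR/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF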
Thank you so much,
Krishna
PS: The console output of what I have tried is attached below.
# Standalone SystemML jar
SCRIPT_DIR=$SYSTEMML_HOME/scripts/*
BUILD_DIR=$SYSTEMML_HOME/target/*
LIB_DIR=$SYSTEMML_HOME/target/lib/*
HADOOP_HOME=$SYSTEMML_HOME/target/lib/hadoop/*
SYSTEMML_JAR=$SYSTEMML_HOME/target/systemml-1.0.0-SNAPSHOT.jar
FORMAT="csv"
ALGO=/Users/krishna/open-source/incubator-systemml/scripts/datagen/genRandData4Kmeans.dml

# -D JVM options must come before the main class, and double quotes are
# needed so $SYSTEMML_HOME expands inside the log4j path.
java -cp "$SCRIPT_DIR:$BUILD_DIR:$LIB_DIR:$HADOOP_HOME" \
  -Dlog4j.configuration=file:"$SYSTEMML_HOME/conf/log4j.properties" \
  org.apache.sysml.api.DMLScript \
  -f $ALGO -exec hybrid_spark \
  -nvargs nr=10000 nf=1000 nc=50 dc=10.0 dr=1.0 fbf=100.0 cbf=100.0 \
  X=hdfs:///data/X.data C=hdfs:///data/C.data Y=hdfs:///data/Y.data \
  YbyC=hdfs:///data/YbyC.data fmt=$FORMAT
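
For comparison, here is the spark-submit form I pieced together from the
SystemML Spark batch mode docs (a sketch, so the flags may need adjusting;
--master yarn --deploy-mode client assumes a local YARN is actually running):

$SPARK_HOME/bin/spark-submit \
  --master yarn --deploy-mode client \
  --class org.apache.sysml.api.DMLScript \
  $SYSTEMML_JAR \
  -f $ALGO -exec hybrid_spark \
  -nvargs nr=10000 nf=1000 nc=50 dc=10.0 dr=1.0 fbf=100.0 cbf=100.0 \
  X=hdfs:///data/X.data C=hdfs:///data/C.data Y=hdfs:///data/Y.data \
  YbyC=hdfs:///data/YbyC.data fmt=$FORMAT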
#### Logs
krishna@Krishna:~/open-source/scripts$ java -cp $SCRIPT_DIR:$BUILD_DIR:$LIB_DIR:$HADOOP_HOME org.apache.sysml.api.DMLScript -Dlog4j.configuration=file:'$SYSTEMML_HOME/conf/log4j.properties' -f $ALGO -exec hybrid_spark -nvargs nr=10000 nf=1000 nc=50 dc=10.0 dr=1.0 fbf=100.0 cbf=100.0 X=hdfs:///data/X.data C=hdfs:///data/C.data Y=hdfs:///data/Y.data YbyC=hdfs:///data/YbyC.data fmt=$FORMAT
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
BEGIN K-MEANS GENERATOR SCRIPT
Generating cluster distribution (mixture) centroids...
Generating record-to-cluster assignments...
Generating within-cluster random shifts...
Generating records by shifting from centroids...
Computing record-to-cluster assignments by minimum centroid distance...
Computing useful statistics...
Writing out the resulting dataset...
Exception in thread "main" org.apache.sysml.api.DMLException:
org.apache.sysml.runtime.DMLRuntimeException:
org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program
block generated from statement block between lines 80 and 119 -- Error
evaluating instruction:
CP°write°C·MATRIX·DOUBLE°hdfs:///data/C.data·SCALAR·STRING·true°csv·SCALAR·STRING·true°false°,°false°·SCALAR·STRING·true
    at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:360)
    at org.apache.sysml.api.DMLScript.main(DMLScript.java:207)
Caused by: org.apache.sysml.runtime.DMLRuntimeException:
org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program
block generated from statement block between lines 80 and 119 -- Error
evaluating instruction:
CP°write°C·MATRIX·DOUBLE°hdfs:///data/C.data·SCALAR·STRING·true°csv·SCALAR·STRING·true°false°,°false°·SCALAR·STRING·true
    at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:130)
    at org.apache.sysml.api.DMLScript.execute(DMLScript.java:665)
    at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:346)
    ... 1 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error
in program block generated from statement block between lines 80 and 119 --
Error evaluating instruction:
CP°write°C·MATRIX·DOUBLE°hdfs:///data/C.data·SCALAR·STRING·true°csv·SCALAR·STRING·true°false°,°false°·SCALAR·STRING·true
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:320)
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:221)
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
    at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:123)
    ... 3 more
Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException:
Export to hdfs:///data/C.data failed.
    at org.apache.sysml.runtime.controlprogram.caching.CacheableData.exportData(CacheableData.java:779)
    at org.apache.sysml.runtime.controlprogram.caching.CacheableData.exportData(CacheableData.java:694)
    at org.apache.sysml.runtime.instructions.cp.VariableCPInstruction.writeCSVFile(VariableCPInstruction.java:826)
    at org.apache.sysml.runtime.instructions.cp.VariableCPInstruction.processWriteInstruction(VariableCPInstruction.java:773)
    at org.apache.sysml.runtime.instructions.cp.VariableCPInstruction.processInstruction(VariableCPInstruction.java:642)
    at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:290)
    ... 6 more
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs:/data, expected:
file:///
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
    at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:423)
    at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:590)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:441)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:428)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
    at org.apache.sysml.runtime.util.MapReduceTool.writeMetaDataFile(MapReduceTool.java:390)
    at org.apache.sysml.runtime.controlprogram.caching.CacheableData.writeMetaData(CacheableData.java:960)
    at org.apache.sysml.runtime.controlprogram.caching.CacheableData.exportData(CacheableData.java:772)
    ... 11 more
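
From the final "Caused by" it looks like the write went through
RawLocalFileSystem, i.e. fs.defaultFS resolved to file:/// rather than my
namenode, so I suspect my Hadoop config (core-site.xml) is not on the
classpath of the java command above. A quick sanity check I plan to run
(hdfs getconf is a stock Hadoop command):

# Prints the filesystem the client config actually resolves to;
# file:/// here would confirm core-site.xml is not being picked up.
hdfs getconf -confKey fs.defaultFS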