Thanks to help from the team, we fixed a Hadoop classpath configuration issue so 
DML successfully invokes MapReduce jobs.

I'm posting the discussion here in case other people run into the same 
problem.

----Problem description----
I was running a simple DML script to carry out data transformation on a Hadoop 
cluster (Hadoop 2.0.0, CDH 4.2.1). The script ran successfully on 1 GB of data, 
but threw an error on ~30 GB. 

It looks like SystemML didn't need to invoke MapReduce jobs on the small 
data set (console output: 'Number of executed MR Jobs: 0'). On the 
larger data it attempted to run MR and threw the following error:

...
Caused by: java.lang.ClassNotFoundException: Class 
com.hadoop.compression.lzo.LzoCodec not found
        at 
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
        at 
org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:127)
        ... 38 more
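
For context: this ClassNotFoundException typically means the cluster's 
core-site.xml lists LzoCodec under io.compression.codecs, so every MR client 
must be able to load that class. Here is a quick way to check (the sample 
config below is illustrative; on a real cluster, inspect your actual 
core-site.xml, e.g. under /etc/hadoop/conf, instead):

```shell
# Minimal sample core-site.xml (illustrative only) -- on a real cluster,
# grep your deployed core-site.xml for the same property.
cat > /tmp/core-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>
EOF

# If LzoCodec appears in this list, any job reading compressed input
# needs the hadoop-lzo jar on its classpath.
grep -o 'com.hadoop.compression.lzo.LzoCodec' /tmp/core-site-sample.xml
```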


----Solution----
The missing class com.hadoop.compression.lzo.LzoCodec is contained in the 
lzo-hadoop jar file:
http://search.maven.org/#search%7Cga%7C1%7Cfc%3A%22com.hadoop.compression.lzo.LzoCodec%22

Installation and configuration information for the LZO parcel can be found 
here:
http://www.cloudera.com/documentation/archive/manager/4-x/4-7-3/Cloudera-Manager-Installation-Guide/cmig_install_LZO_Compression.html
and this stackoverflow solution:
http://stackoverflow.com/questions/23441142/class-com-hadoop-compression-lzo-lzocodec-not-found-for-spark-on-cdh-5

In my case it turned out we had the LZO jar, but it was not on the 
classpath. Explicitly pointing to the jar at DML job submission via 
-libjars (https://hadoop.apache.org/docs/r1.2.1/commands_manual.html#jar) 
did the trick: 

hadoop jar ./SystemML.jar -libjars <path to lzo jar>/hadoop-lzo-0.4.15.jar 
-f ./transform.dml -nvargs X=<path on HDFS>/file-to-transform
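
If you'd rather not pass -libjars on every invocation, an alternative is to 
put the jar on HADOOP_CLASSPATH before submitting. A minimal sketch (the jar 
path here is hypothetical; use wherever hadoop-lzo is actually installed on 
your nodes):

```shell
# Hypothetical location of the LZO jar -- adjust to your installation.
LZO_JAR=/opt/hadoop/lib/hadoop-lzo-0.4.15.jar

# Prepend the jar to the client-side classpath so the driver JVM can
# load com.hadoop.compression.lzo.LzoCodec without -libjars.
export HADOOP_CLASSPATH="${LZO_JAR}${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"

# Sanity check: the jar should now appear in the classpath string.
echo "$HADOOP_CLASSPATH" | tr ':' '\n' | grep lzo
```

Note that HADOOP_CLASSPATH only affects the client JVM; -libjars additionally 
ships the jar to the map/reduce tasks, so for task-side decompression the 
-libjars approach (or installing the jar cluster-wide) is still needed.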

Ethan
