Ethan, thank you for posting the fix to the LZO configuration issue.

Deron

On Thu, Feb 4, 2016 at 9:45 AM, Ethan Xu <[email protected]> wrote:

> Thanks to help from the team, we fixed a hadoop classpath configuration so
> dml successfully invokes MapReduce jobs.
>
> I'm carrying the discussion here in case other people ran into the same
> problem.
>
> ----Problem description----
> I was running a simple dml to carry out data transformation on a hadoop
> cluster (hadoop 2.0.0 cdh4.2.1). The script ran successfully on 1GB data,
> but throws an error on ~30GB of data.
>
> It looks like SystemML didn't need to invoke MapReduce jobs on the small
> data set with console output ' Number of executed MR Jobs: 0'. On the
> larger data it attempted to run MR and threw the following error:
>
> ...
> Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
>         at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
>         at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:127)
>         ... 38 more
>
> ----Solution----
> The missing class com.hadoop.compression.lzo.LzoCodec is contained in the
> lzo-hadoop jar file:
>
> http://search.maven.org/#search%7Cga%7C1%7Cfc%3A%22com.hadoop.compression.lzo.LzoCodec%22
>
> Installation and configuration information of LZO Parcel can be found
> here:
>
> http://www.cloudera.com/documentation/archive/manager/4-x/4-7-3/Cloudera-Manager-Installation-Guide/cmig_install_LZO_Compression.html
> and this stackoverflow solution:
>
> http://stackoverflow.com/questions/23441142/class-com-hadoop-compression-lzo-lzocodec-not-found-for-spark-on-cdh-5
>
> For my case it turns out we have the lzo jar but it was not included in
> the classpath. Explicitly pointing to the jar at dml job submission via
> -libjars (https://hadoop.apache.org/docs/r1.2.1/commands_manual.html#jar)
> did the trick:
>
> hadoop jar ./SystemML.jar -libjars <path to lzo jar>/hadoop-lzo-0.4.15.jar -f ./transform.dml -nvargs X=<path on HDFS>/file-to-transform
>
> Ethan
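
For anyone who hits the same ClassNotFoundException: before reaching for -libjars, it may help to confirm whether the LZO jar is listed on the node's Hadoop classpath at all. The snippet below is a minimal sketch, not something from the original thread. The function name classpath_has_jar is hypothetical, and it only looks at explicitly listed jar files. `hadoop classpath` can also print wildcard entries such as <dir>/*, which this check does not expand, so a "missing" result is only a hint.

```shell
# Return 0 if any entry of a colon-separated classpath ($1) is a jar whose
# file name starts with the given prefix ($2), e.g. "hadoop-lzo".
# The body runs in a subshell so the IFS and noglob changes do not leak
# into the calling shell.
classpath_has_jar() (
  set -f          # keep wildcard entries such as /some/dir/* literal
  IFS=':'         # split the classpath on colons
  for entry in $1; do
    # Compare only the file-name part of each entry.
    case "${entry##*/}" in
      "$2"*.jar) exit 0 ;;
    esac
  done
  exit 1
)
```

On a cluster node with the hadoop command on the PATH, it could be used as:

    classpath_has_jar "$(hadoop classpath)" hadoop-lzo || echo "hadoop-lzo jar not listed"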
