[jira] Resolved: (HADOOP-2360) hadoop::RecordReader::read() throws exception in HadoopPipes::RecordWriter
[ https://issues.apache.org/jira/browse/HADOOP-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiping Han resolved HADOOP-2360.
--------------------------------
    Resolution: Cannot Reproduce

> hadoop::RecordReader::read() throws exception in HadoopPipes::RecordWriter
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-2360
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2360
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.14.3
>            Reporter: Yiping Han
>            Priority: Minor
>
> The jute record is in this format:
>
>     class SampleValue {
>         ustring data;
>     }
>
> And HadoopPipes::RecordWriter::emit() has code like this:
>
>     void SampleRecordWriterC::emit(const std::string& key,
>                                    const std::string& value)
>     {
>         if (key.empty() || value.empty()) {
>             return;
>         }
>
>         hadoop::StringInStream key_in_stream(const_cast<std::string&>(key));
>         hadoop::RecordReader key_record_reader(key_in_stream, hadoop::kCSV);
>         EmitKeyT emit_key;
>         key_record_reader.read(emit_key);
>
>         hadoop::StringInStream value_in_stream(const_cast<std::string&>(value));
>         hadoop::RecordReader value_record_reader(value_in_stream, hadoop::kCSV);
>         EmitValueT emit_value;
>         value_record_reader.read(emit_value);
>
>         return;
>     }
>
> The code throws hadoop::IOException at the read() line.
>
> In the mapper, I emit a fake record with the following code:
>
>     std::string value;
>     EmitValueT emit_value;
>     emit_value.getData().assign("FakeData");
>
>     hadoop::StringOutStream value_out_stream(value);
>     hadoop::RecordWriter value_record_writer(value_out_stream, hadoop::kCSV);
>     value_record_writer.write(emit_value);
>
> We haven't updated to the latest version of Hadoop, but I've searched the
> tickets and didn't find one reporting this problem.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2360) hadoop::RecordReader::read() throws exception in HadoopPipes::RecordWriter
[ https://issues.apache.org/jira/browse/HADOOP-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiping Han updated HADOOP-2360:
-------------------------------
    Priority: Minor  (was: Blocker)

> hadoop::RecordReader::read() throws exception in HadoopPipes::RecordWriter
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-2360
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2360
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.14.3
>            Reporter: Yiping Han
>            Priority: Minor
>
> The jute record is in this format:
>
>     class SampleValue {
>         ustring data;
>     }
>
> And HadoopPipes::RecordWriter::emit() has code like this:
>
>     void SampleRecordWriterC::emit(const std::string& key,
>                                    const std::string& value)
>     {
>         if (key.empty() || value.empty()) {
>             return;
>         }
>
>         hadoop::StringInStream key_in_stream(const_cast<std::string&>(key));
>         hadoop::RecordReader key_record_reader(key_in_stream, hadoop::kCSV);
>         EmitKeyT emit_key;
>         key_record_reader.read(emit_key);
>
>         hadoop::StringInStream value_in_stream(const_cast<std::string&>(value));
>         hadoop::RecordReader value_record_reader(value_in_stream, hadoop::kCSV);
>         EmitValueT emit_value;
>         value_record_reader.read(emit_value);
>
>         return;
>     }
>
> The code throws hadoop::IOException at the read() line.
>
> In the mapper, I emit a fake record with the following code:
>
>     std::string value;
>     EmitValueT emit_value;
>     emit_value.getData().assign("FakeData");
>
>     hadoop::StringOutStream value_out_stream(value);
>     hadoop::RecordWriter value_record_writer(value_out_stream, hadoop::kCSV);
>     value_record_writer.write(emit_value);
>
> We haven't updated to the latest version of Hadoop, but I've searched the
> tickets and didn't find one reporting this problem.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2360) hadoop::RecordReader::read() throws exception in HadoopPipes::RecordWriter
hadoop::RecordReader::read() throws exception in HadoopPipes::RecordWriter
---------------------------------------------------------------------------

                Key: HADOOP-2360
                URL: https://issues.apache.org/jira/browse/HADOOP-2360
            Project: Hadoop
         Issue Type: Bug
   Affects Versions: 0.14.3
           Reporter: Yiping Han
           Priority: Blocker


The jute record is in this format:

    class SampleValue {
        ustring data;
    }

And HadoopPipes::RecordWriter::emit() has code like this:

    void SampleRecordWriterC::emit(const std::string& key,
                                   const std::string& value)
    {
        if (key.empty() || value.empty()) {
            return;
        }

        hadoop::StringInStream key_in_stream(const_cast<std::string&>(key));
        hadoop::RecordReader key_record_reader(key_in_stream, hadoop::kCSV);
        EmitKeyT emit_key;
        key_record_reader.read(emit_key);

        hadoop::StringInStream value_in_stream(const_cast<std::string&>(value));
        hadoop::RecordReader value_record_reader(value_in_stream, hadoop::kCSV);
        EmitValueT emit_value;
        value_record_reader.read(emit_value);

        return;
    }

The code throws hadoop::IOException at the read() line.

In the mapper, I emit a fake record with the following code:

    std::string value;
    EmitValueT emit_value;
    emit_value.getData().assign("FakeData");

    hadoop::StringOutStream value_out_stream(value);
    hadoop::RecordWriter value_record_writer(value_out_stream, hadoop::kCSV);
    value_record_writer.write(emit_value);

We haven't updated to the latest version of Hadoop, but I've searched the
tickets and didn't find one reporting this problem.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
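[Editor's note: a standalone round trip built from the two fragments in the report can help localize the failure, which may be why the issue was later closed as Cannot Reproduce. Below is a minimal sketch, not a confirmed reproduction: it assumes EmitValueT is the class rcc generates for SampleValue, and the header name sample_value.hh is hypothetical.]

    #include <iostream>
    #include <string>

    #include "recordio.hh"      // hadoop record I/O: streams, RecordReader/Writer
    #include "sample_value.hh"  // hypothetical rcc-generated header for SampleValue

    int main() {
        // Serialize an EmitValueT to a string in CSV format, exactly as the
        // mapper fragment in the report does.
        std::string value;
        EmitValueT emit_value;
        emit_value.getData().assign("FakeData");
        hadoop::StringOutStream value_out_stream(value);
        hadoop::RecordWriter value_record_writer(value_out_stream, hadoop::kCSV);
        value_record_writer.write(emit_value);

        // Read it back the way SampleRecordWriterC::emit() does. If this
        // succeeds but the read() inside emit() still throws, the bytes
        // reaching emit() are probably not CSV-serialized records at all.
        try {
            hadoop::StringInStream value_in_stream(value);
            hadoop::RecordReader value_record_reader(value_in_stream, hadoop::kCSV);
            EmitValueT read_back;
            value_record_reader.read(read_back);
            std::cout << "round trip ok: " << read_back.getData() << std::endl;
        } catch (const hadoop::IOException&) {
            std::cerr << "hadoop::IOException in read()" << std::endl;
            return 1;
        }
        return 0;
    }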
[jira] Created: (HADOOP-2162) Provide last failure point when retry a mapper task
Provide last failure point when retry a mapper task
----------------------------------------------------

                Key: HADOOP-2162
                URL: https://issues.apache.org/jira/browse/HADOOP-2162
            Project: Hadoop
         Issue Type: New Feature
           Reporter: Yiping Han


Currently, when a mapper fails and gets restarted, the restarted mapper can
tell from the task name whether it is a retry or the first try. We also want
to know where in the input the last try failed. With the last failure point,
our mapper can then do something different for that particular input. The
failure point does not need to be very accurate.

The reason we ask for this, instead of letting Hadoop deal with the failing
record itself, is that this way we can do something special with the failing
record instead of simply skipping it.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
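[Editor's note: the first half of the request is already possible. Here is a minimal sketch for a streaming mapper, assuming the task id layout of this era, where the digits after the last underscore of mapred_task_id are the attempt number; the recovery branch is left empty because the requested last-failure offset does not exist yet.]

    #include <cstdlib>
    #include <string>

    // Parse the attempt number from the trailing digits of a task id such as
    // "task_200711272300_0001_m_000001_2": 0 means first try, >0 means retry.
    int attemptNumber(const std::string& taskId) {
        std::string::size_type pos = taskId.rfind('_');
        return pos == std::string::npos ? 0 : std::atoi(taskId.c_str() + pos + 1);
    }

    int main() {
        // Hadoop streaming exports the job configuration into the child
        // process environment with dots replaced by underscores, so the
        // task id is available as "mapred_task_id".
        const char* taskId = std::getenv("mapred_task_id");
        bool isRetry = (taskId != 0) && (attemptNumber(taskId) > 0);
        if (isRetry) {
            // A recovery path would branch here once Hadoop also exposed
            // the last failure point this issue asks for.
        }
        return 0;
    }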
[jira] Commented: (HADOOP-1864) Support for big jar file (>2G)
[ https://issues.apache.org/jira/browse/HADOOP-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537755 ]

Yiping Han commented on HADOOP-1864:
------------------------------------

Milind,

Yes. Either this issue or HADOOP-2019 should satisfy our requirement.

> Support for big jar file (>2G)
> ------------------------------
>
>                 Key: HADOOP-1864
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1864
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.1
>            Reporter: Yiping Han
>            Priority: Critical
>
> We have huge binaries that need to be distributed onto the tasktracker
> nodes in Hadoop streaming mode. We've tried both the -file option and the
> -cacheArchive option; it seems the tasktracker node cannot unjar jar files
> bigger than 2G.
>
> We are considering splitting our binaries into multiple jars, but with
> -file it seems we cannot do that. We would also prefer the -cacheArchive
> option for performance reasons, but -cacheArchive does not allow more than
> one appearance in the streaming options. Even if -cacheArchive supported
> multiple jars, we would still need a way to put the jars into a single
> directory tree instead of using multiple symbolic links.
>
> So, in general, we need a feasible and efficient way to distribute large
> (>2G) binaries for Hadoop streaming. We don't know whether there is an
> existing solution that we either missed or used incorrectly, or whether
> extra work is needed to provide one.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1864) Support for big jar file (>2G)
[ https://issues.apache.org/jira/browse/HADOOP-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiping Han updated HADOOP-1864:
-------------------------------
             Priority: Critical  (was: Major)
    Affects Version/s: 0.14.1

We are now raising our expectations for a fix: I've confirmed that Java 1.6
does not solve this problem.

> Support for big jar file (>2G)
> ------------------------------
>
>                 Key: HADOOP-1864
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1864
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.14.1
>            Reporter: Yiping Han
>            Priority: Critical
>
> We have huge binaries that need to be distributed onto the tasktracker
> nodes in Hadoop streaming mode. We've tried both the -file option and the
> -cacheArchive option; it seems the tasktracker node cannot unjar jar files
> bigger than 2G.
>
> We are considering splitting our binaries into multiple jars, but with
> -file it seems we cannot do that. We would also prefer the -cacheArchive
> option for performance reasons, but -cacheArchive does not allow more than
> one appearance in the streaming options. Even if -cacheArchive supported
> multiple jars, we would still need a way to put the jars into a single
> directory tree instead of using multiple symbolic links.
>
> So, in general, we need a feasible and efficient way to distribute large
> (>2G) binaries for Hadoop streaming. We don't know whether there is an
> existing solution that we either missed or used incorrectly, or whether
> extra work is needed to provide one.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1865) "org.apache.hadoop.metrics.jvm.EventCounter" not instantiate error
[ https://issues.apache.org/jira/browse/HADOOP-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiping Han updated HADOOP-1865:
-------------------------------
    Description:
Hi,

I get a "could not instantiate org.apache.hadoop.metrics.jvm.EventCounter"
error. This error happens for every hadoop command, but it does not seem to
block any operation from succeeding. Does anyone have an idea?

  was:
Hi,

I get the following error for every command I run on hadoop, but the commands
still seem to work. Can you help find out what's wrong here? Thanks!

bash-3.00$ bin/start-all.sh
starting namenode, logging to /export/crawlspace/yhan/hadoop/hadoop-0.13.1/bin/../logs/hadoop-yhan-namenode-idev43.out
log4j:ERROR Could not instantiate class [org.apache.hadoop.metrics.jvm.EventCounter].
java.lang.ClassNotFoundException: org.apache.hadoop.metrics.jvm.EventCounter
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
        at java.lang.Class.forName0(Native Method)

> "org.apache.hadoop.metrics.jvm.EventCounter" not instantiate error
> -------------------------------------------------------------------
>
>                 Key: HADOOP-1865
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1865
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.13.1
>            Reporter: Yiping Han
>            Priority: Critical
>
> Hi,
> I get a "could not instantiate org.apache.hadoop.metrics.jvm.EventCounter"
> error. This error happens for every hadoop command, but it does not seem to
> block any operation from succeeding. Does anyone have an idea?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HADOOP-1865) "org.apache.hadoop.metrics.jvm.EventCounter" not instantiate error
[ https://issues.apache.org/jira/browse/HADOOP-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiping Han resolved HADOOP-1865.
--------------------------------
    Resolution: Fixed

This seems to be related to the configuration files that I modified.

> "org.apache.hadoop.metrics.jvm.EventCounter" not instantiate error
> -------------------------------------------------------------------
>
>                 Key: HADOOP-1865
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1865
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.13.1
>            Reporter: Yiping Han
>            Priority: Critical
>
> Hi,
> I get the following error for every command I run on hadoop, but the
> commands still seem to work. Can you help find out what's wrong here?
> Thanks!
>
> bash-3.00$ bin/start-all.sh
> starting namenode, logging to
> /export/crawlspace/yhan/hadoop/hadoop-0.13.1/bin/../logs/hadoop-yhan-namenode-idev43.out
> log4j:ERROR Could not instantiate class
> [org.apache.hadoop.metrics.jvm.EventCounter].
> java.lang.ClassNotFoundException: org.apache.hadoop.metrics.jvm.EventCounter
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>         at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
>         at java.lang.Class.forName0(Native Method)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
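[Editor's note: for anyone hitting the same error, the class is wired in through conf/log4j.properties, so a typo or a missing hadoop jar on the classpath after editing that file produces exactly this ClassNotFoundException. A hypothetical excerpt of the relevant lines follows; the exact defaults vary by release, so treat the values as an assumption.]

    # Hypothetical conf/log4j.properties excerpt. log4j instantiates every
    # appender named on the root logger; if the jar providing
    # org.apache.hadoop.metrics.jvm.EventCounter is not on the classpath,
    # or the class name is mistyped, log4j prints the error shown above.
    log4j.rootLogger=INFO,console,EventCounter
    log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounter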
[jira] Created: (HADOOP-1865) "org.apache.hadoop.metrics.jvm.EventCounter" not instantiate error
"org.apache.hadoop.metrics.jvm.EventCounter" not instantiate error -- Key: HADOOP-1865 URL: https://issues.apache.org/jira/browse/HADOOP-1865 Project: Hadoop Issue Type: Bug Affects Versions: 0.13.1 Reporter: Yiping Han Priority: Critical Hi, I got the following error for every command I run on hadoop. But it seems the command still work. Can you help to find out what's wrong here? Thanks! bash-3.00$ bin/start-all.sh starting namenode, logging to /export/crawlspace/yhan/hadoop/hadoop-0.13.1/bin/../logs/hadoop-yhan-namenode-idev43.out log4j:ERROR Could not instantiate class [org.apache.hadoop.metrics.jvm.EventCounter]. java.lang.ClassNotFoundException: org.apache.hadoop.metrics.jvm.EventCounter at java.net.URLClassLoader$1.run(URLClassLoader.java:200) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:188) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276) at java.lang.ClassLoader.loadClass(ClassLoader.java:251) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319) at java.lang.Class.forName0(Native Method) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-1864) Support for big jar file (>2G)
Support for big jar file (>2G)
------------------------------

                Key: HADOOP-1864
                URL: https://issues.apache.org/jira/browse/HADOOP-1864
            Project: Hadoop
         Issue Type: Bug
         Components: contrib/streaming
           Reporter: Yiping Han


We have huge binaries that need to be distributed onto the tasktracker nodes
in Hadoop streaming mode. We've tried both the -file option and the
-cacheArchive option; it seems the tasktracker node cannot unjar jar files
bigger than 2G.

We are considering splitting our binaries into multiple jars, but with -file
it seems we cannot do that. We would also prefer the -cacheArchive option for
performance reasons, but -cacheArchive does not allow more than one
appearance in the streaming options. Even if -cacheArchive supported multiple
jars, we would still need a way to put the jars into a single directory tree
instead of using multiple symbolic links.

So, in general, we need a feasible and efficient way to distribute large
(>2G) binaries for Hadoop streaming. We don't know whether there is an
existing solution that we either missed or used incorrectly, or whether extra
work is needed to provide one.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
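[Editor's note: for reference, a sketch of the shape of the streaming invocation being described. The paths, host, port, and link name are hypothetical, the #link suffix names the symlink created in the task's working directory, and the jar must still stay under the 2G unjar limit this issue is about.]

    hadoop jar hadoop-streaming.jar \
        -input  /user/yhan/input \
        -output /user/yhan/output \
        -mapper my_mapper \
        -reducer my_reducer \
        -cacheArchive hdfs://namenode:9000/apps/tools.jar#tools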