Re: java.io.FileNotFoundException
Hi, I had the same problem. This worked for me:

    <property>
      <name>mapred.child.tmp</name>
      <value>D:\tmp</value>
    </property>

Kind regards,
Aleksandar Stupar.

From: Carlos Eduardo Moreira dos Santos cem...@gmail.com
To: common-user common-user@hadoop.apache.org
Sent: Sun, May 2, 2010 9:10:03 PM
Subject: Re: java.io.FileNotFoundException

Yes, I can create it:

    $ ls E:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201005020105_0001/
    ls: cannot access E:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201005020105_0001/: No such file or directory
    $ mkdir -p E:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201005020105_0001/attempt_201005020105_0001/attempt_201005020105_0001_m_02_0/work/tmp
    $ ls E:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201005020105_0001/attempt_201005020105_0001/attempt_201005020105_0001_m_02_0/work/
    tmp

On Sun, May 2, 2010 at 10:39 AM, Ted Yu yuzhih...@gmail.com wrote:
> Looks like localFs.mkdirs(tmpDir) failed. Can you check whether you can manually create E:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201005020105_0001/attempt_201005020105_0001_m_02_0/work/tmp ?
> Also, what do you set mapred.local.dir to? Try not using /tmp.

I didn't set it. It has its default value: ${hadoop.tmp.dir}/mapred/local

On Sat, May 1, 2010 at 9:42 PM, Carlos Eduardo Moreira dos Santos c...@cemshost.com.br wrote:
> Hadoop works fine on Linux. On Windows (using Cygwin) I can't get mapred to work, though hdfs is ok. This is the stack trace:
>
>     java.io.FileNotFoundException: File E:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201005020105_0001/attempt_201005020105_0001_m_02_0/work/tmp does not exist.
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
>         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>         at org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
>         at org.apache.hadoop.mapred.Child.main(Child.java:155)
>
> The beginning of the path (E:/) seems strange (I was expecting something like /cygdrive/e or just /tmp). I read TaskRunner.java:519 and tried using an absolute path for mapred.child.tmp in conf/mapred-site.xml, but it keeps looking for the same path even if I restart mapred. The path exists up to jobcache. The tasktracker log shows:
>
>     2010-05-02 01:06:23,765 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201005020105_0001_m_02_0 task's state:UNASSIGNED
>     2010-05-02 01:06:23,765 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201005020105_0001_m_03_0 task's state:UNASSIGNED
>     2010-05-02 01:06:23,765 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201005020105_0001_m_02_0
>     2010-05-02 01:06:23,765 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201005020105_0001_m_02_0
>     2010-05-02 01:06:37,562 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201005020105_0001_m_03_0
>     2010-05-02 01:06:37,562 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201005020105_0001_m_03_0
>     2010-05-02 01:06:37,625 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201005020105_0001_m_518928642
>     2010-05-02 01:06:37,625 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201005020105_0001_m_518928642 spawned.
>     2010-05-02 01:06:37,921 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201005020105_0001_m_1918177803
>     2010-05-02 01:06:37,921 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201005020105_0001_m_1918177803 spawned.
>     2010-05-02 01:06:40,312 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201005020105_0001_m_518928642 given task: attempt_201005020105_0001_m_02_0
>     2010-05-02 01:06:40,578 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201005020105_0001_m_02_0 0.0%
>     2010-05-02 01:06:40,687 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201005020105_0001_m_518928642 exited. Number of tasks it ran: 0
>     2010-05-02 01:06:41,046 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201005020105_0001_m_1918177803 given task: attempt_201005020105_0001_m_03_0
>     2010-05-02 01:06:41,265 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201005020105_0001_m_03_0 0.0%
>     2010-05-02 01:06:41,421 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201005020105_0001_m_1918177803 exited. Number of tasks it ran: 0
>     2010-05-02 01:06:43,687 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201005020105_0001_m_02_0 done; removing files.
>     2010-05-02 01:06:43,687 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 1
>     2010-05-02 01:06:44,421
Assertions
Hi all, is there a way to enable Java assertions inside a map/reduce function? I tried setting the -enableassertions switch in hadoop-env.sh via the HADOOP_TASKTRACKER_OPTS variable, but it didn't work. I also tried setting a property in mapred-site.xml:

    <property>
      <name>mapred.child.java.opts</name>
      <value>-enableassertions</value>
    </property>

but then I get a Java heap space error.

Thanks,
Gianmarco
Re: Assertions
Gianmarco,

You might want to increase the heap size. It's a property that can be set: try setting mapred.child.java.opts to -Xmx1024M.

Mithila

On Mon, May 3, 2010 at 8:04 AM, Gianmarco gianmarco@gmail.com wrote:
> Hi all, is there a way to enable Java assertions inside a map/reduce function?
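A likely reason for the heap error: setting mapred.child.java.opts replaces the default child-JVM options (which carry the task heap size), so passing the assertion flag on its own leaves the task JVM with too little memory. A hedged sketch of a mapred-site.xml fragment combining both, using Mithila's suggested heap size:

```xml
<!-- mapred-site.xml: overriding mapred.child.java.opts replaces the default
     child JVM options, so set the heap size and the assertion switch together.
     The 1024 MB figure follows Mithila's suggestion; tune it to your tasks. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024M -enableassertions</value>
</property>
```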
Re: problem w/ data load
Hi Susanne,

Hadoop uses the file extension to detect that a file is compressed. I believe Hive does too. Did you store the compressed file in HDFS with a .gz extension?

Cheers,
Tom

BTW it's best to send Hive questions like these to the hive-user@ list.

On Sun, May 2, 2010 at 11:22 AM, Susanne Lehmann susanne.lehm...@metamarketsgroup.com wrote:
> Hi, I want to load data from HDFS into Hive; the data is in compressed files. The data is stored in flat files and the delimiter is ^A (Ctrl-A). As long as I use decompressed files, everything works fine. Since Ctrl-A is the default delimiter, I don't even need to specify it. I do the following:
>
>     hadoop dfs -put /test/file new
>     hive> DROP TABLE test_new;
>     OK
>     Time taken: 0.057 seconds
>     hive> CREATE TABLE test_new( bla int, bla string, etc bla string);
>     OK
>     Time taken: 0.035 seconds
>     hive> LOAD DATA INPATH /test/file INTO TABLE test_new;
>     Loading data to table test_new
>     OK
>     Time taken: 0.063 seconds
>
> But if I do the same with the same file compressed, it no longer works. I tried tons of different table definitions with the delimiter specified, but it doesn't work. The load itself succeeds, but the data is always NULL, so I conclude there is a delimiter problem. Any help is greatly appreciated!
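To make Tom's point concrete, a small local sketch (file names here are made up, and the hadoop command appears only as a comment): gzip keeps the original name and appends .gz, which is the suffix the compression detection keys on.

```shell
# Hypothetical demo: gzip appends .gz to the file name, and Hadoop/Hive
# detect compression from that suffix.
printf 'a\001b\001c\n' > sample_rows   # one Ctrl-A delimited row
gzip sample_rows                       # replaces it with sample_rows.gz
ls sample_rows.gz                      # -> sample_rows.gz
# keep the suffix when uploading, e.g.:
#   hadoop dfs -put sample_rows.gz /test/sample_rows.gz
```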
HDF5 and Hadoop
Does anyone know of any existing work integrating HDF5 (http://www.hdfgroup.org/HDF5/whatishdf5.html) with Hadoop? I don't know much about HDF5, but it was recently brought to my attention as a way to store high-density scientific data. Since I've confirmed that Hadoop dramatically speeds up our analysis, it seems like marrying the two might have some benefits. I've done some searches on Google and they don't turn up much. Thanks! --Andrew
Re: HDF5 and Hadoop
Hi Andrew,

There has been some work in the Tika [1] project recently on looking at NetCDF4 [2] and HDF4/5 [3] and extracting metadata/text content from them. Though this doesn't directly apply to your question below, it might be worth looking at how to marry Tika and Hadoop in that regard. HTH!

Cheers,
Chris

[1] http://lucene.apache.org/tika/
[2] http://issues.apache.org/jira/browse/TIKA-400
[3] https://issues.apache.org/jira/browse/TIKA-399

On 5/3/10 10:36 AM, Andrew Nguyen andrew-lists-had...@ucsfcti.org wrote:
> Does anyone know of any existing work integrating HDF5 with Hadoop?

Chris Mattmann, Ph.D.
Senior Computer Scientist, NASA Jet Propulsion Laboratory
Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
Re: problem w/ data load
Hi Tom,

Yes, I store the file in HDFS with a .gz extension. Do I need to somehow tell Hive that it is a compressed file?

Best,
Susanne

PS: Thanks for the tip about the list; I will use the other list for further questions if necessary. I wasn't sure which one to use.

On Mon, May 3, 2010 at 9:52 AM, Tom White t...@cloudera.com wrote:
> Hadoop uses the file extension to detect that a file is compressed. I believe Hive does too. Did you store the compressed file in HDFS with a .gz extension?
Re: problem w/ data load
On Mon, May 3, 2010 at 2:00 PM, Susanne Lehmann susanne.lehm...@metamarketsgroup.com wrote:
> Yes, I store the file in HDFS with a .gz extension. Do I need to somehow tell Hive that it is a compressed file?

If your file is a text file that is simply gzipped, you create your table as normal, STORED AS TEXTFILE. If your file is a sequence file using block compression (gzip), you create the table STORED AS SEQUENCEFILE.
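Edward's two cases, sketched as Hive DDL. The table and column names are placeholders adapted from the thread (the duplicate column names in the original session are renamed here, since Hive requires unique column names):

```sql
-- Case 1: plain gzipped text files (TEXTFILE is Hive's default, shown explicitly).
CREATE TABLE test_new (bla INT, bla2 STRING, etc2 STRING)
STORED AS TEXTFILE;

-- Case 2: block-compressed SequenceFiles instead.
CREATE TABLE test_new_seq (bla INT, bla2 STRING, etc2 STRING)
STORED AS SEQUENCEFILE;
```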
Re: problem w/ data load
Yep, Hive will work fine if you point it at the .gz file. Just note that if this is one large gz file, it will only use one mapper and one reducer; it will not get parallelized.

-- amr

On 5/3/2010 11:29 AM, Edward Capriolo wrote:
> If your file is a text file that is simply gzipped, you create your table as normal, STORED AS TEXTFILE. If your file is a sequence file using block compression (gzip), you create the table STORED AS SEQUENCEFILE.
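One way around the single-mapper limitation Amr describes is to split the data into several independently gzipped parts before loading them; a sketch with made-up file names and sizes:

```shell
# Hypothetical demo: one large file becomes several .gz parts, each of which
# can be processed by its own mapper.
head -c 1048576 /dev/urandom > big_input   # stand-in for a large data file
split -b 262144 big_input part_            # four 256 KB chunks (use e.g. 128m in practice)
gzip part_*                                # part_aa.gz ... part_ad.gz
ls part_*.gz | wc -l                       # -> 4
# each part would then be loaded, e.g.: hadoop dfs -put part_*.gz /test/
```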
Re: java.io.FileNotFoundException
I tried E:\tmp and also /cygdrive/e/tmp, but the error message stays the same, except for the job ids. I think the file conf/mapred-site.xml is being ignored; is that possible (I restarted hdfs after the conf changes)? This is the file:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>hadoop-cemsbr:9001</value>
      </property>
      <property>
        <name>mapred.child.tmp</name>
        <value>E:\tmp</value>
      </property>
    </configuration>

Thank you,
Carlos Eduardo

On Mon, May 3, 2010 at 4:38 AM, Aleksandar Stupar stupar.aleksan...@yahoo.com wrote:
> Hi, I had the same problem. This worked for me:
>
>     <property>
>       <name>mapred.child.tmp</name>
>       <value>D:\tmp</value>
>     </property>
Re: HDF5 and Hadoop
Chris,

Thanks for the heads up!

--Andrew

On May 3, 2010, at 10:45 AM, Mattmann, Chris A (388J) wrote:
> There has been some work in the Tika [1] project recently on looking at NetCDF4 [2] and HDF4/5 [3] and extracting metadata/text content from them.