Re: java.io.FileNotFoundException

2010-05-03 Thread Aleksandar Stupar
Hi,

I had the same problem. This worked for me:

<property>
  <name>mapred.child.tmp</name>
  <value>D:\tmp</value>
</property>

Kind regards,
Aleksandar Stupar.





From: Carlos Eduardo Moreira dos Santos cem...@gmail.com
To: common-user common-user@hadoop.apache.org
Sent: Sun, May 2, 2010 9:10:03 PM
Subject: Re: java.io.FileNotFoundException

Yes, I can create it:

$ ls 
E:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201005020105_0001/
ls: cannot access
E:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201005020105_0001/:
No such file or directory

$ mkdir -p 
E:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201005020105_0001/attempt_201005020105_0001/attempt_201005020105_0001_m_02_0/work/tmp

$ ls 
E:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201005020105_0001/attempt_201005020105_0001/attempt_201005020105_0001_m_02_0/work/
tmp

On Sun, May 2, 2010 at 10:39 AM, Ted Yu yuzhih...@gmail.com wrote:
 Looks like localFs.mkdirs(tmpDir) failed. Can you check whether you can
 manually create
 E:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201005020105_0001/attempt_201005020105_0001_m_02_0/work/tmp
 ?

 Also, what do you set mapred.local.dir to? Try not using /tmp.

I didn't set it. It has its default value: ${hadoop.tmp.dir}/mapred/local
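
For reference, a minimal sketch of what overriding it in conf/mapred-site.xml would look like, per Ted's suggestion to move off /tmp; the E:\hadoop-local path is a hypothetical example, not something from this thread:

<property>
  <name>mapred.local.dir</name>
  <value>E:\hadoop-local\mapred\local</value>
</property>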

 On Sat, May 1, 2010 at 9:42 PM, Carlos Eduardo Moreira dos Santos 
 c...@cemshost.com.br wrote:

 Hadoop is working fine in Linux. In Windows (using cygwin) I can't get
 mapred to work, though hdfs is ok. This is the stacktrace:

 java.io.FileNotFoundException: File

 E:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201005020105_0001/attempt_201005020105_0001_m_02_0/work/tmp
 does not exist.
at
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
at
 org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at
 org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
at org.apache.hadoop.mapred.Child.main(Child.java:155)

 The beginning of the path (E:/) seems strange (I was hoping for
 something like /cygdrive/e or just /tmp). I read TaskRunner.java:519
 and tried using an absolute path in mapred.child.tmp in
 conf/mapred-site.xml, but it keeps looking for the same path even if I
 restart mapred. The path exists only up to the jobcache directory.

 The tasktracker log shows:

 2010-05-02 01:06:23,765 INFO org.apache.hadoop.mapred.TaskTracker:
 LaunchTaskAction (registerTask): attempt_201005020105_0001_m_02_0
 task's state:UNASSIGNED
 2010-05-02 01:06:23,765 INFO org.apache.hadoop.mapred.TaskTracker:
 LaunchTaskAction (registerTask): attempt_201005020105_0001_m_03_0
 task's state:UNASSIGNED
 2010-05-02 01:06:23,765 INFO org.apache.hadoop.mapred.TaskTracker:
 Trying to launch : attempt_201005020105_0001_m_02_0
 2010-05-02 01:06:23,765 INFO org.apache.hadoop.mapred.TaskTracker: In
 TaskLauncher, current free slots : 2 and trying to launch
 attempt_201005020105_0001_m_02_0
 2010-05-02 01:06:37,562 INFO org.apache.hadoop.mapred.TaskTracker:
 Trying to launch : attempt_201005020105_0001_m_03_0
 2010-05-02 01:06:37,562 INFO org.apache.hadoop.mapred.TaskTracker: In
 TaskLauncher, current free slots : 1 and trying to launch
 attempt_201005020105_0001_m_03_0
 2010-05-02 01:06:37,625 INFO org.apache.hadoop.mapred.JvmManager: In
 JvmRunner constructed JVM ID: jvm_201005020105_0001_m_518928642
 2010-05-02 01:06:37,625 INFO org.apache.hadoop.mapred.JvmManager: JVM
 Runner jvm_201005020105_0001_m_518928642 spawned.
 2010-05-02 01:06:37,921 INFO org.apache.hadoop.mapred.JvmManager: In
 JvmRunner constructed JVM ID: jvm_201005020105_0001_m_1918177803
 2010-05-02 01:06:37,921 INFO org.apache.hadoop.mapred.JvmManager: JVM
 Runner jvm_201005020105_0001_m_1918177803 spawned.
 2010-05-02 01:06:40,312 INFO org.apache.hadoop.mapred.TaskTracker: JVM
 with ID: jvm_201005020105_0001_m_518928642 given task:
 attempt_201005020105_0001_m_02_0
 2010-05-02 01:06:40,578 INFO org.apache.hadoop.mapred.TaskTracker:
 attempt_201005020105_0001_m_02_0 0.0%
 2010-05-02 01:06:40,687 INFO org.apache.hadoop.mapred.JvmManager: JVM
 : jvm_201005020105_0001_m_518928642 exited. Number of tasks it ran: 0
 2010-05-02 01:06:41,046 INFO org.apache.hadoop.mapred.TaskTracker: JVM
 with ID: jvm_201005020105_0001_m_1918177803 given task:
 attempt_201005020105_0001_m_03_0
 2010-05-02 01:06:41,265 INFO org.apache.hadoop.mapred.TaskTracker:
 attempt_201005020105_0001_m_03_0 0.0%
 2010-05-02 01:06:41,421 INFO org.apache.hadoop.mapred.JvmManager: JVM
 : jvm_201005020105_0001_m_1918177803 exited. Number of tasks it ran: 0
 2010-05-02 01:06:43,687 INFO org.apache.hadoop.mapred.TaskRunner:
 attempt_201005020105_0001_m_02_0 done; removing files.
 2010-05-02 01:06:43,687 INFO org.apache.hadoop.mapred.TaskTracker:
 addFreeSlot : current free slots : 1
 2010-05-02 01:06:44,421 

Assertions

2010-05-03 Thread Gianmarco
Hi all,
is there a way to enable Java assertions inside a map/reduce function?
I tried setting the -enableassertions switch in hadoop-env.sh using the
HADOOP_TASKTRACKER_OPTS variable but it didn't work.
I tried also setting a property in mapred-site.xml

<property>
  <name>mapred.child.java.opts</name>
  <value>-enableassertions</value>
</property>

but I get a Java heap space error.

Thanks,
Gianmarco


Re: Assertions

2010-05-03 Thread Mithila Nagendra
Gianmarco,

You might want to increase the heap size. It can be set through a property:
try setting mapred.child.java.opts to -Xmx1024M.
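
Since assertions set via HADOOP_TASKTRACKER_OPTS apply only to the TaskTracker JVM itself (tasks run in separate child JVMs), mapred.child.java.opts is the right knob for both flags. A hedged sketch of a mapred-site.xml entry combining the two, assuming 1024M is enough heap for your job:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024M -enableassertions</value>
</property>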

Mithila

On Mon, May 3, 2010 at 8:04 AM, Gianmarco gianmarco@gmail.com wrote:

 Hi all,
 is there a way to enable Java assertions inside a map/reduce function?
 I tried setting the -enableassertions switch in hadoop-env.sh using the
 HADOOP_TASKTRACKER_OPTS variable but it didn't work.
 I tried also setting a property in mapred-site.xml

 <property>
   <name>mapred.child.java.opts</name>
   <value>-enableassertions</value>
 </property>

 but I get a Java heap space error.

 Thanks,
 Gianmarco



Re: problem w/ data load

2010-05-03 Thread Tom White
Hi Susanne,

Hadoop uses the file extension to detect that a file is compressed. I
believe Hive does too. Did you store the compressed file in HDFS with
a .gz extension?
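
For example, a hedged sketch of storing the file under the name Hadoop looks for, assuming it was gzipped locally first (the paths are illustrative, not taken from your setup):

$ gzip file
$ hadoop dfs -put file.gz /test/file.gz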

Cheers,
Tom

BTW It's best to send Hive questions like these to the hive-user@ list.

On Sun, May 2, 2010 at 11:22 AM, Susanne Lehmann
susanne.lehm...@metamarketsgroup.com wrote:
 Hi,

 I want to load data from HDFS to Hive; the data is in compressed files.
 The data is stored in flat files, and the delimiter is ^A (ctrl-A).
 As long as I use decompressed files, everything works fine. Since
 ctrl-A is the default delimiter, I don't even need to specify it.
 I do the following:

 hadoop dfs -put /test/file new

 hive> DROP TABLE test_new;
 OK
 Time taken: 0.057 seconds
 hive> CREATE TABLE test_new(
     >   bla int,
     >   bla string,
     >   etc
     >   bla string);
 OK
 Time taken: 0.035 seconds
 hive> LOAD DATA INPATH '/test/file' INTO TABLE test_new;
 Loading data to table test_new
 OK
 Time taken: 0.063 seconds

 But if I do the same with the same file compressed, it does not work
 anymore. I tried tons of different table definitions with the
 delimiter specified, but none of them helped. The load itself works, but
 the data is always NULL, so I conclude there is a delimiter problem.

 Any help is greatly appreciated!



HDF5 and Hadoop

2010-05-03 Thread Andrew Nguyen
Does anyone know of any existing work integrating HDF5 
(http://www.hdfgroup.org/HDF5/whatishdf5.html) with Hadoop?

I don't know much about HDF5 but it was recently brought to my attention as a 
way to store high-density scientific data.  Since I've confirmed that having 
Hadoop dramatically speeds up our analysis, it seems like marrying the two 
might have some benefits.

I've done some searches on Google, but they didn't turn up much.

Thanks!

--Andrew

Re: HDF5 and Hadoop

2010-05-03 Thread Mattmann, Chris A (388J)
Hi Andrew,

There has been some work in the Tika [1] project recently on looking at NetCDF4 
[2] and HDF4/5 [3] and extracting metadata/text content from them. Though this 
doesn't directly apply to your question below, it might be worth perhaps 
looking at how to marry Tika and Hadoop in that regard.

HTH!

Cheers,
Chris

[1] http://lucene.apache.org/tika/
[2] http://issues.apache.org/jira/browse/TIKA-400
[3] https://issues.apache.org/jira/browse/TIKA-399


On 5/3/10 10:36 AM, Andrew Nguyen andrew-lists-had...@ucsfcti.org wrote:

Does anyone know of any existing work integrating HDF5 
(http://www.hdfgroup.org/HDF5/whatishdf5.html) with Hadoop?

I don't know much about HDF5 but it was recently brought to my attention as a 
way to store high-density scientific data.  Since I've confirmed that having 
Hadoop dramatically speeds up our analysis, it seems like marrying the two 
might have some benefits.

I've done some searches on Google, but they didn't turn up much.

Thanks!

--Andrew



++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: problem w/ data load

2010-05-03 Thread Susanne Lehmann
Hi Tom,

Yes. I store the file in HDFS with a .gz extension. Do I need to
somehow tell Hive that it is a compressed file?

Best,
Susanne

PS: Thanks for the tip with the list, I will use the other list for
further questions if necessary. I wasn't sure which one to use.

On Mon, May 3, 2010 at 9:52 AM, Tom White t...@cloudera.com wrote:
 Hi Susanne,

 Hadoop uses the file extension to detect that a file is compressed. I
 believe Hive does too. Did you store the compressed file in HDFS with
 a .gz extension?

 Cheers,
 Tom

 BTW It's best to send Hive questions like these to the hive-user@ list.

 On Sun, May 2, 2010 at 11:22 AM, Susanne Lehmann
 susanne.lehm...@metamarketsgroup.com wrote:
 Hi,

 I want to load data from HDFS to Hive; the data is in compressed files.
 The data is stored in flat files, and the delimiter is ^A (ctrl-A).
 As long as I use decompressed files, everything works fine. Since
 ctrl-A is the default delimiter, I don't even need to specify it.
 I do the following:

 hadoop dfs -put /test/file new

 hive> DROP TABLE test_new;
 OK
 Time taken: 0.057 seconds
 hive> CREATE TABLE test_new(
     >   bla int,
     >   bla string,
     >   etc
     >   bla string);
 OK
 Time taken: 0.035 seconds
 hive> LOAD DATA INPATH '/test/file' INTO TABLE test_new;
 Loading data to table test_new
 OK
 Time taken: 0.063 seconds

 But if I do the same with the same file compressed, it does not work
 anymore. I tried tons of different table definitions with the
 delimiter specified, but none of them helped. The load itself works, but
 the data is always NULL, so I conclude there is a delimiter problem.

 Any help is greatly appreciated!




Re: problem w/ data load

2010-05-03 Thread Edward Capriolo
On Mon, May 3, 2010 at 2:00 PM, Susanne Lehmann 
susanne.lehm...@metamarketsgroup.com wrote:

 Hi Tom,

 Yes. I store the file in HDFS with a .gz extension. Do I need to
 somehow tell Hive that it is a compressed file?

 Best,
 Susanne

 PS: Thanks for the tip with the list, I will use the other list for
 further questions if necessary. I wasn't sure which one to use.

 On Mon, May 3, 2010 at 9:52 AM, Tom White t...@cloudera.com wrote:
  Hi Susanne,
 
  Hadoop uses the file extension to detect that a file is compressed. I
  believe Hive does too. Did you store the compressed file in HDFS with
  a .gz extension?
 
  Cheers,
  Tom
 
  BTW It's best to send Hive questions like these to the hive-user@ list.
 
  On Sun, May 2, 2010 at 11:22 AM, Susanne Lehmann
  susanne.lehm...@metamarketsgroup.com wrote:
  Hi,
 
  I want to load data from HDFS to Hive; the data is in compressed files.
  The data is stored in flat files, and the delimiter is ^A (ctrl-A).
  As long as I use decompressed files, everything works fine. Since
  ctrl-A is the default delimiter, I don't even need to specify it.
  I do the following:

  hadoop dfs -put /test/file new

  hive> DROP TABLE test_new;
  OK
  Time taken: 0.057 seconds
  hive> CREATE TABLE test_new(
      >   bla int,
      >   bla string,
      >   etc
      >   bla string);
  OK
  Time taken: 0.035 seconds
  hive> LOAD DATA INPATH '/test/file' INTO TABLE test_new;
  Loading data to table test_new
  OK
  Time taken: 0.063 seconds

  But if I do the same with the same file compressed, it does not work
  anymore. I tried tons of different table definitions with the
  delimiter specified, but none of them helped. The load itself works, but
  the data is always NULL, so I conclude there is a delimiter problem.

  Any help is greatly appreciated!

If your file is a text file that is simply gzipped, you create your table as
normal:

create table ... stored as textfile;

If your file is a sequence file using block compression (gzip), you use:

create table ... stored as sequencefile;
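
For the gzipped text case, a hedged end-to-end sketch; the table name and columns are hypothetical, and '\001' is just ctrl-A written as an octal escape (it is also the default, so the ROW FORMAT clause is optional):

hive> CREATE TABLE test_gz(id INT, payload STRING)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
    > STORED AS TEXTFILE;
hive> LOAD DATA INPATH '/test/file.gz' INTO TABLE test_gz;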


Re: problem w/ data load

2010-05-03 Thread Amr Awadallah

Yep, Hive will work fine if you point it to the .gz file.

Just note, though, that if this is one large gz file it will only use
one mapper and one reducer; the job will not be parallelized.
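
If the single-mapper limit matters, one hedged workaround sketch is to re-split and re-compress the data before uploading it, assuming you can regenerate it locally (the file names, the 1,000,000-line chunk size, and the target path are all hypothetical):

$ gunzip -c file.gz | split -l 1000000 - part-
$ gzip part-*
$ hadoop dfs -put part-*.gz /test/parts/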


-- amr

On 5/3/2010 11:29 AM, Edward Capriolo wrote:

On Mon, May 3, 2010 at 2:00 PM, Susanne Lehmann
susanne.lehm...@metamarketsgroup.com wrote:

Hi Tom,

Yes. I store the file in HDFS with a .gz extension. Do I need to
somehow tell Hive that it is a compressed file?

Best,
Susanne

PS: Thanks for the tip with the list, I will use the other list for
further questions if necessary. I wasn't sure which one to use.

On Mon, May 3, 2010 at 9:52 AM, Tom White t...@cloudera.com wrote:

Hi Susanne,

Hadoop uses the file extension to detect that a file is compressed. I
believe Hive does too. Did you store the compressed file in HDFS with
a .gz extension?

Cheers,
Tom

BTW It's best to send Hive questions like these to the hive-user@ list.

On Sun, May 2, 2010 at 11:22 AM, Susanne Lehmann
susanne.lehm...@metamarketsgroup.com wrote:

Hi,

I want to load data from HDFS to Hive; the data is in compressed files.
The data is stored in flat files, and the delimiter is ^A (ctrl-A).
As long as I use decompressed files, everything works fine. Since
ctrl-A is the default delimiter, I don't even need to specify it.
I do the following:

hadoop dfs -put /test/file new

hive> DROP TABLE test_new;
OK
Time taken: 0.057 seconds
hive> CREATE TABLE test_new(
    >   bla int,
    >   bla string,
    >   etc
    >   bla string);
OK
Time taken: 0.035 seconds
hive> LOAD DATA INPATH '/test/file' INTO TABLE test_new;
Loading data to table test_new
OK
Time taken: 0.063 seconds

But if I do the same with the same file compressed, it does not work
anymore. I tried tons of different table definitions with the
delimiter specified, but none of them helped. The load itself works, but
the data is always NULL, so I conclude there is a delimiter problem.

Any help is greatly appreciated!

If your file is a text file that is simply gzipped, you create your table as
normal:

create table ... stored as textfile;

If your file is a sequence file using block compression (gzip), you use:

create table ... stored as sequencefile;


Re: java.io.FileNotFoundException

2010-05-03 Thread Carlos Eduardo Moreira dos Santos
I tried E:\tmp and also /cygdrive/e/tmp, but the error message stays the
same, except for the job ids. I think the file conf/mapred-site.xml is being
ignored; is that possible (I restarted hdfs after the conf changes)? This is the file:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
  <name>mapred.job.tracker</name>
  <value>hadoop-cemsbr:9001</value>
</property>

<property>
  <name>mapred.child.tmp</name>
  <value>E:\tmp</value>
</property>

</configuration>

Thank you,
Carlos Eduardo

On Mon, May 3, 2010 at 4:38 AM, Aleksandar Stupar 
stupar.aleksan...@yahoo.com wrote:

 Hi,

 I had the same problem. This worked for me:

 <property>
   <name>mapred.child.tmp</name>
   <value>D:\tmp</value>
 </property>

 Kind regards,
 Aleksandar Stupar.

 --
 From: Carlos Eduardo Moreira dos Santos cem...@gmail.com
 To: common-user common-user@hadoop.apache.org
 Sent: Sun, May 2, 2010 9:10:03 PM
 Subject: Re: java.io.FileNotFoundException

 Yes, I can create it:

 $ ls
 E:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201005020105_0001/
 ls: cannot access

 E:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201005020105_0001/:
 No such file or directory

 $ mkdir -p
 E:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201005020105_0001/attempt_201005020105_0001/attempt_201005020105_0001_m_02_0/work/tmp

 $ ls
 E:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201005020105_0001/attempt_201005020105_0001/attempt_201005020105_0001_m_02_0/work/
 tmp

 On Sun, May 2, 2010 at 10:39 AM, Ted Yu yuzhih...@gmail.com wrote:
  Looks like localFs.mkdirs(tmpDir) failed. Can you check whether you can
  manually create
 
 E:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201005020105_0001/attempt_201005020105_0001_m_02_0/work/tmp
  ?
 
  Also, what do you set mapred.local.dir to? Try not using /tmp.
 
 I didn't set it. It has its default value: ${hadoop.tmp.dir}/mapred/local

  On Sat, May 1, 2010 at 9:42 PM, Carlos Eduardo Moreira dos Santos 
  c...@cemshost.com.br wrote:
 
  Hadoop is working fine in Linux. In Windows (using cygwin) I can't get
  mapred to work, though hdfs is ok. This is the stacktrace:
 
  java.io.FileNotFoundException: File
 
 
 E:/tmp/hadoop-SYSTEM/mapred/local/taskTracker/jobcache/job_201005020105_0001/attempt_201005020105_0001_m_02_0/work/tmp
  does not exist.
 at
 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
 at
 
 org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
 at
  org.apache.hadoop.mapred.TaskRunner.setupWorkDir(TaskRunner.java:519)
 at org.apache.hadoop.mapred.Child.main(Child.java:155)
 
  The beginning of the path (E:/) seems strange (I was hoping for
  something like /cygdrive/e or just /tmp). I read TaskRunner.java:519
  and tried using an absolute path in mapred.child.tmp in
  conf/mapred-site.xml, but it keeps looking for the same path even if I
  restart mapred. The path exists only up to the jobcache directory.
 
  The tasktracker log shows:
 
  2010-05-02 01:06:23,765 INFO org.apache.hadoop.mapred.TaskTracker:
  LaunchTaskAction (registerTask): attempt_201005020105_0001_m_02_0
  task's state:UNASSIGNED
  2010-05-02 01:06:23,765 INFO org.apache.hadoop.mapred.TaskTracker:
  LaunchTaskAction (registerTask): attempt_201005020105_0001_m_03_0
  task's state:UNASSIGNED
  2010-05-02 01:06:23,765 INFO org.apache.hadoop.mapred.TaskTracker:
  Trying to launch : attempt_201005020105_0001_m_02_0
  2010-05-02 01:06:23,765 INFO org.apache.hadoop.mapred.TaskTracker: In
  TaskLauncher, current free slots : 2 and trying to launch
  attempt_201005020105_0001_m_02_0
  2010-05-02 01:06:37,562 INFO org.apache.hadoop.mapred.TaskTracker:
  Trying to launch : attempt_201005020105_0001_m_03_0
  2010-05-02 01:06:37,562 INFO org.apache.hadoop.mapred.TaskTracker: In
  TaskLauncher, current free slots : 1 and trying to launch
  attempt_201005020105_0001_m_03_0
  2010-05-02 01:06:37,625 INFO org.apache.hadoop.mapred.JvmManager: In
  JvmRunner constructed JVM ID: jvm_201005020105_0001_m_518928642
  2010-05-02 01:06:37,625 INFO org.apache.hadoop.mapred.JvmManager: JVM
  Runner jvm_201005020105_0001_m_518928642 spawned.
  2010-05-02 01:06:37,921 INFO org.apache.hadoop.mapred.JvmManager: In
  JvmRunner constructed JVM ID: jvm_201005020105_0001_m_1918177803
  2010-05-02 01:06:37,921 INFO org.apache.hadoop.mapred.JvmManager: JVM
  Runner jvm_201005020105_0001_m_1918177803 spawned.
  2010-05-02 01:06:40,312 INFO org.apache.hadoop.mapred.TaskTracker: JVM
  with ID: jvm_201005020105_0001_m_518928642 given task:
  attempt_201005020105_0001_m_02_0
  2010-05-02 01:06:40,578 INFO org.apache.hadoop.mapred.TaskTracker:
  attempt_201005020105_0001_m_02_0 0.0%
  2010-05-02 01:06:40,687 INFO org.apache.hadoop.mapred.JvmManager: JVM
  : 

Re: HDF5 and Hadoop

2010-05-03 Thread Andrew Nguyen
Chris,

Thanks for the heads up!

--Andrew

On May 3, 2010, at 10:45 AM, Mattmann, Chris A (388J) wrote:

 Hi Andrew,
 
 There has been some work in the Tika [1] project recently on looking at 
 NetCDF4 [2] and HDF4/5 [3] and extracting metadata/text content from them. 
 Though this doesn't directly apply to your question below, it might be worth 
 perhaps looking at how to marry Tika and Hadoop in that regard.
 
 HTH!
 
 Cheers,
 Chris
 
 [1] http://lucene.apache.org/tika/
 [2] http://issues.apache.org/jira/browse/TIKA-400
 [3] https://issues.apache.org/jira/browse/TIKA-399