Re: How to perform FILE IO with Hadoop DFS

2008-05-06 Thread Steve Loughran

vikas wrote:

Thank you very much for the right link. It really helped. Like many others,
I'm waiting for
"Append to files in HDFS".

Is there anything I can do to raise its priority? Does the Hadoop
developer community track any kind of request counter for a particular
feature so that one can raise its priority? If so, I would like to add my
vote to this :)


Apache projects celebrate community contributions more than just votes.



I've registered on the mailing list, which gives me the privilege of creating
JIRA issues and watching them. Can you tell me how I can get into the developer
community, so that if time permits I can also contribute through discussion or code?



-get on the core-developer list
-watch how things work. Most discussion is on specific bugs. Note 
also how Hudson tests all patches and rejects anything with no tests or 
with javac or javadoc warnings.

-check out SVN_HEAD and build it
-start patching stuff on the side, non-critical things, so that people 
learn to trust your coding skills. The filesystem is taken very 
seriously, as a failure there could lose petabytes of data.
-look at the test process. All changes need to fit in there. Even if you 
don't have 500 machines to spare, others do, so design your changes to 
be testable in that world, and to run on bigger clusters.


Note that Append is not something you are ever going to see on S3 files; 
it's not part of the S3 REST API. So if you assume append everywhere, 
your app won't be so happy on the EC2 farms.


--
Steve Loughran  http://www.1060.org/blogxter/publish/5
Author: Ant in Action   http://antbook.org/


copyFromLocal strangeness

2008-05-06 Thread Bo Shi
Hi All,

I recently encountered the following exception when attempting to
write to our Hadoop grid.

This is from the name-node log:

2008-05-05 16:11:01,966 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 4 on 54310, call create(/2008-05-05/ONE-2008-05-05-14.gz,
DFSClient_-164311132, false, 3, 67108864) from 10.2.15.6:42519: error:
org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create
file /2008-05-05/ONE-2008-05-05-14.gz for DFSClient_-164311132 on
client 10.2.15.6 because current leaseholder is trying to recreate
file.
org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create
file /2008-05-05/ONE-2008-05-05-14.gz for DFSClient_-164311132 on
client 10.2.15.6 because current leaseholder is trying to recreate
file.
at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:850)
at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:806)
at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)


I was wondering if anyone could offer some insight into possible
causes for this.  Only one process was attempting to use copyFromLocal
on that file at the time (although five other machines were attempting
to copy differently named files to the same directory using
copyFromLocal).  Our grid was under very heavy load at the time - so
might that have something to do with the exception above?  It should
be noted that a subsequent attempt to use copyFromLocal an hour after
this incident was successful.

Further, is there a way to specify that failed copies not leave
artifacts behind?  In this particular case, a zero-length file
ONE-2008-05-05-14.gz was left on the grid.
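
Until there is a built-in option for that, here is a minimal sketch of one manual
cleanup approach. It assumes a zero-length file at the destination really is a
leftover of the failed copy and that nothing else is still writing it; the path is
the one from the log above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CleanupZeroLengthArtifact {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Path taken from the log above; adjust for your own failed copy.
    Path leftover = new Path("/2008-05-05/ONE-2008-05-05-14.gz");

    // Assumption: a zero-length file at the destination is an artifact of a
    // failed copyFromLocal, and no other writer still holds it open.
    if (fs.exists(leftover) && fs.getFileStatus(leftover).getLen() == 0) {
      fs.delete(leftover);
    }
  }
}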

-- 
Bo Shi
(207) 469-8264 (M)
25 Kingston St, 5th Fl
Boston, MA 02111 USA


Re: Monthly Hadoop user group meetings

2008-05-06 Thread 小龙
Will the meeting be recorded and made available electronically?
I want to take part, but I'm not in the US.
:D

-- 
deSign thE  fuTure
http://www.freeis.cn/


Hadoop Resiliency

2008-05-06 Thread Arv Mistry
 
Hi folks,

I'm new to Hadoop and just had a few questions regarding resiliency:

i) Does Hadoop support redundant NameNodes? I didn't see any mention of
it.

ii) In a distributed setup, when you kill a DataNode, should the
NameNode restart it automatically? I see the NameNode detects
(eventually) that it's down, but it never seems to restart it. Is the
expectation that some kind of wrapper (e.g. Java Service Wrapper) will
do this?

iii) Maybe an obvious one, but I couldn't see how to start just a
DataNode from the scripts. Should I create my own script to do that,
based on start-dfs.sh?

Cheers Arv


Re: Monthly Hadoop user group meetings

2008-05-06 Thread Leon Mergen
On Tue, May 6, 2008 at 6:59 PM, Cole Flournoy [EMAIL PROTECTED]
wrote:

 Is there any way we could set up some off-site webcam conferencing
 abilities?  I would love to attend, but I am on the east coast.


Seconded. I'm from Europe, and am pretty sure that I will watch any video
about a Hadoop conference I can get my hands on, including this. :-)

-- 
Leon Mergen
http://www.solatis.com


Re: HDFS: fault tolerance to block losses with namenode failure

2008-05-06 Thread Dhruba Borthakur
Starting with the 0.17 release, an application can invoke
DFSOutputStream.fsync() to persist the block locations for a file even
before the file is closed.

thanks,
dhruba
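
For illustration, a minimal sketch of what using this might look like. It assumes
the call is surfaced as sync() on the FSDataOutputStream returned by
FileSystem.create() in 0.17 (DFSOutputStream itself is internal to the DFS client),
so check the 0.17 javadoc for the exact entry point; the path below is hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsyncSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical long-lived file that is written block by block.
    FSDataOutputStream out = fs.create(new Path("/logs/long-lived.log"));

    out.write("first batch of records\n".getBytes());

    // Ask the namenode to persist the block locations written so far, so the
    // data survives a namenode restart even though the file is still open.
    // (Assumed to map onto the DFSOutputStream.fsync() described above.)
    out.sync();

    out.write("more records\n".getBytes());
    out.close();   // close() still persists everything, as before
  }
}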


On Tue, May 6, 2008 at 8:11 AM, Cagdas Gerede [EMAIL PROTECTED] wrote:
 If you are writing 10 blocks for a file and, let's say, the namenode fails
  while you are writing the 10th block, all 9 previous blocks are lost, because
  you were not able to close the file and therefore the namenode did not persist
  the information about those 9 blocks to the fsimage file.

  How would you solve this problem in the application? Why doesn't the hdfs
  client make the namenode persist every block as soon as the block is written,
  instead of waiting until the file is closed? Otherwise, don't you need to keep
  a copy of all the blocks in your application until you close the file
  successfully, to prevent data loss? Does this semantics make sense given the
  assumption of very large files with multiple blocks?

  Thanks for your response,

  --
  
  Best Regards, Cagdas Evren Gerede
  Home Page: http://cagdasgerede.info



RE: How do I copy files from my linux file system to HDFS using a java prog?

2008-05-06 Thread Ajey Shah

Thanks Suresh. But even this program reads and writes from HDFS. What I
need to do is read from my normal local Linux hard drive and write to
HDFS.

I'm sorry if I misunderstood your program.

Thanks for replying. :)



Babu, Suresh wrote:
 
 
 Try this program. Modify the HDFS configuration, if it is different from
 the default.
 
 import java.io.File;
 import java.io.IOException;
 
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.FSDataInputStream;
 import org.apache.hadoop.fs.FSDataOutputStream;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.IOUtils;
 
 public class HadoopDFSFileReadWrite {
 
   static void usage() {
     System.out.println("Usage : HadoopDFSFileReadWrite <inputfile> <output file>");
     System.exit(1);
   }
 
   static void printAndExit(String str) {
     System.err.println(str);
     System.exit(1);
   }
 
   public static void main(String[] argv) throws IOException {
     Configuration conf = new Configuration();
     conf.set("fs.default.name", "localhost:9000");
     FileSystem fs = FileSystem.get(conf);
 
     FileStatus[] fileStatus = fs.listStatus(fs.getHomeDirectory());
     for (FileStatus status : fileStatus) {
       System.out.println("File: " + status.getPath());
     }
 
     if (argv.length != 2)
       usage();
 
     // HadoopDFS deals with Path
     Path inFile = new Path(argv[0]);
     Path outFile = new Path(argv[1]);
 
     // Check if input/output are valid
     if (!fs.exists(inFile))
       printAndExit("Input file not found");
     if (!fs.isFile(inFile))
       printAndExit("Input should be a file");
     if (fs.exists(outFile))
       printAndExit("Output already exists");
 
     // Read from the input file and write to the new output file
     FSDataInputStream in = fs.open(inFile);
     FSDataOutputStream out = fs.create(outFile);
     byte buffer[] = new byte[256];
     try {
       int bytesRead = 0;
       while ((bytesRead = in.read(buffer)) > 0) {
         out.write(buffer, 0, bytesRead);
       }
     } catch (IOException e) {
       System.out.println("Error while copying file");
     } finally {
       in.close();
       out.close();
     }
   }
 }
 
 Suresh
 
 
 -Original Message-
 From: Ajey Shah [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, May 01, 2008 3:31 AM
 To: core-user@hadoop.apache.org
 Subject: How do I copy files from my linux file system to HDFS using a
 java prog?
 
 
 Hello all,
 
 I need to copy files from my Linux file system to HDFS in a Java program
 rather than manually. This is the piece of code that I have:
 
 try {
     FileSystem hdfs = FileSystem.get(new Configuration());
 
     LocalFileSystem ls = null;
     ls = hdfs.getLocal(hdfs.getConf());
 
     hdfs.copyFromLocalFile(false, new Path(fileName), new Path(outputFile));
 
 } catch (Exception e) {
     e.printStackTrace();
 }
 
 The problem is that it searches for the input path on HDFS and not on
 my Linux file system.
 
 Can someone point out where I may be going wrong? I feel it's some
 configuration issue, but I have not been able to figure it out.
 
 Thanks.
 --
 View this message in context:
 http://www.nabble.com/How-do-I-copy-files-from-my-linux-file-system-to-H
 DFS-using-a-java-prog--tp16992491p16992491.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.
 
 
 

-- 
View this message in context: 
http://www.nabble.com/How-do-I-copy-files-from-my-linux-file-system-to-HDFS-using-a-java-prog--tp16992491p17093646.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: How do I copy files from my linux file system to HDFS using a java prog?

2008-05-06 Thread Ted Dunning

I think that file names of the form file://directory-path should work to
give you local file access using this program.
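
For illustration, a minimal sketch of that approach: resolve each Path against its
own FileSystem, so a local file:// source can be copied to an HDFS destination.
The paths and the localhost:9000 namenode address below are only placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class LocalToHdfsCopy {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // file:// resolves against the local filesystem, hdfs:// against the DFS,
    // so the same copy call can move data from one to the other.
    Path src = new Path("file:///home/ajey/data/input.txt");          // placeholder local file
    Path dst = new Path("hdfs://localhost:9000/user/ajey/input.txt"); // placeholder HDFS target

    FileSystem srcFs = src.getFileSystem(conf);  // LocalFileSystem
    FileSystem dstFs = dst.getFileSystem(conf);  // DistributedFileSystem

    // false = do not delete the source after the copy
    FileUtil.copy(srcFs, src, dstFs, dst, false, conf);
  }
}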


On 5/6/08 3:34 PM, Ajey Shah [EMAIL PROTECTED] wrote:

 
 Thanks Suresh. But even this program reads and writes from HDFS. What I
 need to do is read from my normal local Linux hard drive and write to
 HDFS.
 
 I'm sorry if I misunderstood your program.
 
 Thanks for replying. :)
 
 
 



Re: Hadoop Resiliency

2008-05-06 Thread Otis Gospodnetic
Hi Arv,

1) look for info on secondary NameNode on the Hadoop wiki and ML archives.

2) I don't think a NN is supposed to restart a killed DN.  I haven't tried it, 
haven't seen it, and haven't read about it anywhere either.

3) I think bin/start-dfs.sh is what you are after, no?  Or at least one of 
the last couple of lines from there.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
 From: Arv Mistry [EMAIL PROTECTED]
 To: core-user@hadoop.apache.org
 Sent: Tuesday, May 6, 2008 3:04:56 PM
 Subject: Hadoop Resiliency
 
  
 Hi folks,
 
 I'm new to Hadoop and just had a few questions regarding resiliency:
 
 i) Does Hadoop support redundant NameNodes? I didn't see any mention of
 it.
 
 ii) In a distributed setup, when you kill a DataNode, should the
 NameNode restart it automatically? I see the NameNode detects
 (eventually) that it's down, but it never seems to restart it. Is the
 expectation that some kind of wrapper (e.g. Java Service Wrapper) will
 do this?
 
 iii) Maybe an obvious one, but I couldn't see how to start just a
 DataNode from the scripts. Should I create my own script to do that,
 based on start-dfs.sh?
 
 Cheers Arv
 




Re: Monthly Hadoop user group meetings

2008-05-06 Thread Zhou, Yunqing
Thirded. I'm running my machine learning experiments on a Hadoop cluster and
am eager to get more info on it. :-)

2008/5/7, Leon Mergen [EMAIL PROTECTED]:

 On Tue, May 6, 2008 at 6:59 PM, Cole Flournoy [EMAIL PROTECTED]
 wrote:

  Is there any way we could set up some off-site webcam conferencing
  abilities?  I would love to attend, but I am on the east coast.


 Seconded. I'm from Europe, and am pretty sure that I will watch any video
 about a Hadoop conference I can get my hands on, including this. :-)

 --
 Leon Mergen
 http://www.solatis.com



Collecting output not to file

2008-05-06 Thread Derek Shaw
Hey,

From the examples that I have seen thus far, all of the results from the 
reduce function are being written to a file. Instead of writing results to a 
file, I want to store them and inspect them after the job is completed. (I 
think that I need to implement my own OutputCollector, but I don't know how to 
tell Hadoop to use it.) How can I do this?

-Derek
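
A minimal sketch of the plumbing this involves, assuming the old
org.apache.hadoop.mapred API: the pluggable piece is an OutputFormat (its
RecordWriter receives whatever the reduce's OutputCollector emits), registered on
the JobConf via conf.setOutputFormat(...). Note that collecting into memory like
this is only meaningful when the whole job runs in a single JVM (the local runner);
on a real cluster each reduce task runs in its own JVM, so the results still have
to land somewhere shared.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputFormat;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Progressable;

// Collects reduce output into an in-memory list instead of writing files.
public class InMemoryOutputFormat implements OutputFormat<Text, Text> {

  public static final List<String> RESULTS = new ArrayList<String>();

  public RecordWriter<Text, Text> getRecordWriter(FileSystem ignored, JobConf job,
                                                  String name, Progressable progress)
      throws IOException {
    return new RecordWriter<Text, Text>() {
      public void write(Text key, Text value) throws IOException {
        synchronized (RESULTS) {
          RESULTS.add(key + "\t" + value);
        }
      }
      public void close(Reporter reporter) throws IOException {
        // nothing to flush
      }
    };
  }

  public void checkOutputSpecs(FileSystem ignored, JobConf job) throws IOException {
    // no output directory to validate
  }
}

// In the job driver:  conf.setOutputFormat(InMemoryOutputFormat.class);
// After JobClient.runJob(conf) returns, inspect InMemoryOutputFormat.RESULTS.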


RE: How do I copy files from my linux file system to HDFS using a java prog?

2008-05-06 Thread Babu, Suresh
This program can read from the local file system as well as from HDFS.
Try: java <program> <Localfile> <HDFSDestination>

Thanks
Suresh

-Original Message-
From: Ajey Shah [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 07, 2008 4:04 AM
To: core-user@hadoop.apache.org
Subject: RE: How do I copy files from my linux file system to HDFS using
a java prog?


Thanks Suresh. But even this program reads and writes from HDFS.
What I need to do is read from my normal local Linux hard drive and write
to HDFS.

I'm sorry if I misunderstood your program.

Thanks for replying. :)


