Re: How to perform FILE IO with Hadoop DFS
vikas wrote: Thank you very much for the right link. It really helped. Like many others, I too am waiting for append to files in HDFS. Is there anything I can do to raise its priority? Is the Hadoop developer community tracking any request counter for a particular feature to raise its priority? If that is the case I would like to add my vote to this :)

Apache projects celebrate community contributions more than just votes.

I've registered to the mailing list, and that gives me the privilege of creating a JIRA and watching one. Can you tell me how I get into the developer community, so that if time permits I can contribute by discussion or code?

- get on the core-developer list
- watch how things work. Most discussion is on specific bugs. Note also how Hudson tests all patches, and rejects anything with no tests or with javac/javadoc warnings.
- check out SVN_HEAD and build it
- start patching stuff on the side, non-critical things, so that people learn to trust your coding skills. The filesystem is taken very seriously, as a failure there could lose petabytes of data.
- look at the test process. All changes need to fit in there. Even if you don't have 500 machines to spare, others do, so design your changes to test in that world, and to run on bigger clusters.

Note that append is not something you are ever going to see on S3 files; it's not part of the S3 REST API. So if you assume append everywhere, your app won't be so happy on the EC2 farms.

-- Steve Loughran http://www.1060.org/blogxter/publish/5 Author: Ant in Action http://antbook.org/
copyFromLocal strangeness
Hi All, I recently encountered the following exception when attempting to write to our hadoop grid. This is from the name-node log:

2008-05-05 16:11:01,966 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 54310, call create(/2008-05-05/ONE-2008-05-05-14.gz, DFSClient_-164311132, false, 3, 67108864) from 10.2.15.6:42519: error: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /2008-05-05/ONE-2008-05-05-14.gz for DFSClient_-164311132 on client 10.2.15.6 because current leaseholder is trying to recreate file.
org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /2008-05-05/ONE-2008-05-05-14.gz for DFSClient_-164311132 on client 10.2.15.6 because current leaseholder is trying to recreate file.
    at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:850)
    at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:806)
    at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
    at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

I was wondering if anyone could offer some insight as to possible causes for this. Only one process was attempting to use copyFromLocal on the file at that time (although five other machines were attempting to copy differently named files to the same directory using copyFromLocal). Our grid was under very heavy load at that time - so might that have something to do with the above exception? It should be noted that a subsequent attempt to copyFromLocal the same file an hour after this incident was successful.

Further, is there a way to specify that failed copies not leave artifacts? In this particular case, a size-0 file ONE-2008-05-05-14.gz was left on the grid.

-- Bo Shi (207) 469-8264 (M) 25 Kingston St, 5th Fl Boston, MA 02111 USA
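One workaround for the last question, as a sketch only: check for and remove the zero-length artifact before retrying the copy. This uses the generic FileSystem API (the path is taken from the log above; the exact delete() signature varies slightly between releases):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CleanupZeroLengthArtifact {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // The zero-length artifact mentioned above.
        Path p = new Path("/2008-05-05/ONE-2008-05-05-14.gz");
        // If a failed copy left a zero-length file behind, remove it before retrying.
        if (fs.exists(p) && fs.getFileStatus(p).getLen() == 0) {
            fs.delete(p, false);   // non-recursive delete
        }
    }
}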
Re: Monthly Hadoop user group meetings
Will the meeting be recorded in some electronic form? I want to take part, but I'm not in the US. :D -- deSign thE fuTure http://www.freeis.cn/
Hadoop Resiliency
Hi folks, I'm new to hadoop and just had a few questions regarding resiliency:

i) Does hadoop support redundant NameNodes? I didn't see any mention of it.
ii) In a distributed setup, when you kill a DataNode, should the NameNode restart it automatically? I see the NameNode (eventually) detects that it's down, but it never seems to restart it. Is the expectation that some kind of wrapper (e.g. Java Service Wrapper) will do this?
iii) Maybe an obvious one, but I couldn't see how to start just a DataNode from the scripts. Should I create my own script to do that, based on start-dfs.sh?

Cheers Arv
Re: Monthly Hadoop user group meetings
On Tue, May 6, 2008 at 6:59 PM, Cole Flournoy [EMAIL PROTECTED] wrote: Is there any way we could set up some off-site web cam conferencing abilities? I would love to attend, but I am on the east coast. Seconded. I'm from Europe, and am pretty sure that I will watch any video about a Hadoop conference I can get my hands on, including this one. :-) -- Leon Mergen http://www.solatis.com
Re: HDFS: fault tolerance to block losses with namenode failure
Starting with the 0.17 release, an application can invoke DFSOutputStream.fsync() to persist block locations for a file even before the file is closed. thanks, dhruba

On Tue, May 6, 2008 at 8:11 AM, Cagdas Gerede [EMAIL PROTECTED] wrote: If you are writing 10 blocks for a file and, let's say, the namenode fails during the 10th block, all previous 9 blocks are lost, because you were not able to close the file and therefore the namenode did not persist the information about those 9 blocks to the fsimage file. How would you solve this problem in the application? Why doesn't the hdfs client make the namenode persist every block once the block is written, instead of waiting until the file is closed? Then, don't you need to keep a copy of all the blocks in your application until you close the file successfully, to prevent data loss? Does it make sense to have these semantics under the assumption of very large files with multiple blocks? Thanks for your response, -- Best Regards, Cagdas Evren Gerede Home Page: http://cagdasgerede.info
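A minimal sketch of how an application might use this from 0.17 on. It assumes the persist-block-locations call is reachable as sync() on the FSDataOutputStream returned by create(); if it is only exposed as DFSOutputStream.fsync(), as described above, a cast to the underlying stream would be needed. The path and block size are made-up examples:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PersistBlocksWhileWriting {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataOutputStream out = fs.create(new Path("/tmp/large-file"));
        byte[] chunk = new byte[64 * 1024 * 1024];   // roughly one 64 MB block of data
        for (int i = 0; i < 10; i++) {
            out.write(chunk);
            // Ask the namenode to persist the block locations written so far, so the
            // earlier blocks survive a namenode failure even though the file is still open.
            out.sync();
        }
        out.close();
    }
}

With something like this, losing the namenode while writing the 10th block should no longer lose the first 9, since their locations have already been persisted.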
RE: How do I copy files from my linux file system to HDFS using a java prog?
Thanks Suresh. But even this program reads and writes from the HDFS. What I need to do is read from my normal local linux hard drive and write to the HDFS. I'm sorry if I misunderstood your program. Thanks for replying. :)

Babu, Suresh wrote: Try this program. Modify the HDFS configuration, if it is different from the default.

import java.io.File;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HadoopDFSFileReadWrite {

    static void usage() {
        System.out.println("Usage : HadoopDFSFileReadWrite <inputfile> <output file>");
        System.exit(1);
    }

    static void printAndExit(String str) {
        System.err.println(str);
        System.exit(1);
    }

    public static void main(String[] argv) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "localhost:9000");
        FileSystem fs = FileSystem.get(conf);

        FileStatus[] fileStatus = fs.listStatus(fs.getHomeDirectory());
        for (FileStatus status : fileStatus) {
            System.out.println("File: " + status.getPath());
        }

        if (argv.length != 2)
            usage();

        // HadoopDFS deals with Path
        Path inFile = new Path(argv[0]);
        Path outFile = new Path(argv[1]);

        // Check if input/output are valid
        if (!fs.exists(inFile))
            printAndExit("Input file not found");
        if (!fs.isFile(inFile))
            printAndExit("Input should be a file");
        if (fs.exists(outFile))
            printAndExit("Output already exists");

        // Read from and write to new file
        FSDataInputStream in = fs.open(inFile);
        FSDataOutputStream out = fs.create(outFile);
        byte buffer[] = new byte[256];
        try {
            int bytesRead = 0;
            while ((bytesRead = in.read(buffer)) > 0) {
                out.write(buffer, 0, bytesRead);
            }
        } catch (IOException e) {
            System.out.println("Error while copying file");
        } finally {
            in.close();
            out.close();
        }
    }
}

Suresh

-----Original Message----- From: Ajey Shah [mailto:[EMAIL PROTECTED] Sent: Thursday, May 01, 2008 3:31 AM To: core-user@hadoop.apache.org Subject: How do I copy files from my linux file system to HDFS using a java prog?

Hello all, I need to copy files from my linux file system to HDFS in a java program and not manually. This is the piece of code that I have:

try {
    FileSystem hdfs = FileSystem.get(new Configuration());
    LocalFileSystem ls = null;
    ls = hdfs.getLocal(hdfs.getConf());
    hdfs.copyFromLocalFile(false, new Path(fileName), new Path(outputFile));
} catch (Exception e) {
    e.printStackTrace();
}

The problem is that it searches for the input path on the HDFS and not my linux file system. Can someone point out where I may be wrong. I feel it's some configuration issue but have not been able to figure it out. Thanks.
Re: How do I copy files from my linux file system to HDFS using a java prog?
I think that file names of the form file://directory-path should work to give you local file access using this program.

On 5/6/08 3:34 PM, Ajey Shah [EMAIL PROTECTED] wrote: Thanks Suresh. But even this program reads and writes from the HDFS. What I need to do is read from my normal local linux hard drive and write to the HDFS. I'm sorry if I misunderstood your program. Thanks for replying. :) [...]
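A minimal sketch of what this amounts to, combined with copyFromLocalFile (which always treats its source path as local). The namenode address and paths are made-up examples:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalToHdfsCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "localhost:9000");   // example namenode address
        FileSystem hdfs = FileSystem.get(conf);

        // The source is named explicitly as a local file:// URI ...
        Path localSrc = new Path("file:///home/ajey/data/input.txt");   // example path
        // ... and the destination is a plain path on the default (HDFS) filesystem.
        Path hdfsDst = new Path("/user/ajey/input.txt");                // example path

        // copyFromLocalFile reads the source from the local file system and writes
        // the destination to the FileSystem it is called on.
        hdfs.copyFromLocalFile(false, localSrc, hdfsDst);
    }
}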
Re: Hadoop Resiliency
Hi Arv,

1) Look for info on the secondary NameNode on the Hadoop wiki and in the mailing list archives.
2) I don't think a NN is supposed to restart a killed DN. I haven't tried it, haven't seen it, but haven't read that anywhere either.
3) I think bin/start-dfs.sh is what you are after, no? Or at least one of the last couple of lines from there.

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ---- From: Arv Mistry [EMAIL PROTECTED] To: core-user@hadoop.apache.org Sent: Tuesday, May 6, 2008 3:04:56 PM Subject: Hadoop Resiliency [...]
Re: Monthly Hadoop user group meetings
Thirded. I'm running machine learning experiments on a hadoop cluster and am eager to get more info on it. :-) 2008/5/7, Leon Mergen [EMAIL PROTECTED]: On Tue, May 6, 2008 at 6:59 PM, Cole Flournoy [EMAIL PROTECTED] wrote: Is there any way we could set up some off-site web cam conferencing abilities? I would love to attend, but I am on the east coast. Seconded. I'm from Europe, and am pretty sure that I will watch any video about a Hadoop conference I can get my hands on, including this. :-) -- Leon Mergen http://www.solatis.com
Collecting output not to file
Hey, From the examples that I have seen thus far, all of the results from the reduce function are being written to a file. Instead of writing results to a file, I want to store them and inspect them after the job is completed. (I think that I need to implement my own OutputCollector, but I don't know how to tell hadoop to use it.) How can I do this? -Derek
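One way to do this without writing a custom OutputCollector, as a sketch only: let the job write to a scratch directory on HDFS as usual, then read the part files back once runJob() returns. This assumes the default TextOutputFormat and the old mapred API of that era, whose exact method names vary a little between releases; the paths are made-up examples.

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class InspectJobOutput {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(InspectJobOutput.class);
        Path outDir = new Path("/tmp/job-output");   // example scratch directory
        // ... set mapper, reducer and input path here, and point the job's output
        // at outDir (conf.setOutputPath(outDir) or FileOutputFormat.setOutputPath(conf, outDir),
        // depending on the release) ...

        JobClient.runJob(conf);   // blocks until the job completes

        // Read the reducer output (part-00000, part-00001, ...) back in the client.
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus part : fs.listStatus(outDir)) {
            if (!part.getPath().getName().startsWith("part-")) {
                continue;
            }
            BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(part.getPath())));
            String line;
            while ((line = in.readLine()) != null) {
                // Each line is one key<TAB>value pair from the reducers; store or
                // inspect it in memory here instead of just printing it.
                System.out.println(line);
            }
            in.close();
        }
    }
}

If the results must never touch the filesystem at all, the other route is a custom OutputFormat whose RecordWriter sends records somewhere else (a database, a socket, a shared collection), registered with conf.setOutputFormat(...).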
RE: How do I copy files from my linux file system to HDFS using a java prog?
This program can read from the local file system as well as from HDFS. Try: java HadoopDFSFileReadWrite <local file> <HDFS destination>

Thanks Suresh

-----Original Message----- From: Ajey Shah [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 07, 2008 4:04 AM To: core-user@hadoop.apache.org Subject: RE: How do I copy files from my linux file system to HDFS using a java prog?

Thanks Suresh. But even this program reads and writes from the HDFS. What I need to do is read from my normal local linux hard drive and write to the HDFS. I'm sorry if I misunderstood your program. Thanks for replying. :) [...]