Re: Hadoop calculates PI
Hi, There are actually three MapReduce example programs for computing pi.

- pi - uses a quasi-Monte Carlo (qMC) method (a powerful method that can evaluate arbitrary integrals, but not particularly good at computing pi).
- bbp - uses a BBP formula; each task computes a few digits of pi at a specific position (e.g. task 1 computes the 1st - 4th digits, task 2 computes the 5th - 8th digits, etc.).
- distbbp - also uses a BBP formula but evaluates the formula in a distributed manner.

pi is only able to compute ~10 digits even with a large number of samples. I got the following result in HADOOP-4437, with 1000 maps and 1000 samples per map:

Job Finished in 67.337 seconds
Estimated value of PI is 3.141592645200

bbp is able to compute millions of digits (I forget whether it could scale to billions, but it definitely won't work well on trillions). See HADOOP-5052.

distbbp is able to compute digits of pi up to the quadrillionth (10^15 th) position using a large cluster. Note that it skips to a particular position and computes the digits starting at that position. See MAPREDUCE-637 and MAPREDUCE-1923. See also the articles at the end.

Note that bbp and distbbp are available in 2.0.0 and above (also 0.21 and above), but not in 1.x.x or 0.20.x.

Thanks for being interested in it! Tsz-Wo
--
- The Two Quadrillionth Bit of Pi is 0! Distributed Computation of Pi with Apache Hadoop http://arxiv.org/abs/1008.3171
- BBC News: Pi record smashed as team finds two-quadrillionth digit http://www.bbc.co.uk/news/technology-11313194
- New Scientist: New pi record exploits Yahoo's computers http://www.newscientist.com/article/dn19465-new-pi-record-exploits-yahoos-computers.html
- CNN Money Tech: Yahoo exec finds two-quadrillionth digit of pi http://cnnmoneytech.tumblr.com/post/1137357695/yahoo-exec-finds-two-quadrillionth-digit-of-pi
- David Bailey (mathematician): Yahoo! researcher computes binary digits of pi beginning at two quadrillionth digit http://experimentalmath.info/blog/2010/09/yahoo-researcher-computes-binary-digits-of-pi-beginning-at-two-quadrillionth-digit/
- Communications of the ACM: New Pi Record Exploits Yahoo's Computers http://cacm.acm.org/news/99207-new-pi-record-exploits-yahoos-computers
- Communications of the ACM: Math at Web Speed http://mags.acm.org/communications/201011?pg=20#pg20
- computing now (IEEE): Yahoo Sets Record for Pi Bit Calculation http://www.computer.org/portal/web/news/home/-/blogs/3147549
- The Register: Yahoo! boffin scores pi's two quadrillionth bit http://www.theregister.co.uk/2010/09/16/pi_record_at_yahoo/
- ReadWriteCloud: A Cloud Computing Milestone: Yahoo! Reaches the 2 Quadrillionth Bit of Pi http://www.readwriteweb.com/cloud/2010/09/a-cloud-computing-milestone-ya.php
- ZDNet: Hadoop used to calculate Pi's two quadrillionth bit http://www.zdnet.co.uk/blogs/mapping-babel-10017967/hadoop-used-to-calculate-pis-two-quadrillionth-bit-10018670/

From: Alex Paransky ap...@standardset.com
To: common-user@hadoop.apache.org
Sent: Tuesday, May 8, 2012 5:35 PM
Subject: Hadoop calculates PI

So, I installed Hadoop on my iMac via port install hadoop and, after working through a few configuration issues, tried to test the setup with a calculation of PI. Unfortunately, I got this answer:

Estimated value of Pi is *3.1480*

which is not what I expected. Is there something that I missed? Thanks for any help you can offer. Here is the job output:

hadoop-1.0.2 $ hadoop-bin hadoop jar $HADOOP_HOME/hadoop-examples-*.jar pi 10 100
Warning: $HADOOP_HOME is deprecated.
Number of Maps = 10 Samples per Map = 100 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Starting Job 12/05/08 16:15:12 INFO mapred.FileInputFormat: Total input paths to process : 10 12/05/08 16:15:13 INFO mapred.JobClient: Running job: job_201205081614_0001 12/05/08 16:15:14 INFO mapred.JobClient: map 0% reduce 0% 12/05/08 16:15:28 INFO mapred.JobClient: map 20% reduce 0% 12/05/08 16:15:34 INFO mapred.JobClient: map 40% reduce 0% 12/05/08 16:15:37 INFO mapred.JobClient: map 40% reduce 6% 12/05/08 16:15:40 INFO mapred.JobClient: map 60% reduce 6% 12/05/08 16:15:46 INFO mapred.JobClient: map 80% reduce 13% 12/05/08 16:15:52 INFO mapred.JobClient: map 100% reduce 26% 12/05/08 16:16:01 INFO mapred.JobClient: map 100% reduce 100% 12/05/08 16:16:06 INFO mapred.JobClient: Job complete: job_201205081614_0001 12/05/08 16:16:06 INFO mapred.JobClient: Counters: 27 12/05/08 16:16:06 INFO mapred.JobClient: Job Counters 12/05/08 16:16:06 INFO mapred.JobClient: Launched reduce tasks=1 12/05/08 16:16:06 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=49813 12/05/08 16:16:06 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 12/05/08 16:16:06 INFO
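For illustration of why 10 maps x 100 samples only gives a couple of correct digits: the error of such an estimate shrinks only on the order of 1/sqrt(N), so roughly 10^(2k) samples are needed for k correct digits. The sketch below is not the actual Hadoop PiEstimator (which, as noted above, uses a quasi-Monte Carlo sequence rather than java.util.Random); it is just a minimal plain-Java Monte Carlo estimate with an assumed default of 1000 samples.

import java.util.Random;

// Minimal Monte Carlo pi estimate: sample points in the unit square and
// count how many fall inside the quarter circle of radius 1.
public class SimplePiEstimate {
  public static void main(String[] args) {
    long n = args.length > 0 ? Long.parseLong(args[0]) : 1000; // e.g. 10 maps * 100 samples
    Random rand = new Random(0);
    long inside = 0;
    for (long i = 0; i < n; i++) {
      double x = rand.nextDouble();
      double y = rand.nextDouble();
      if (x * x + y * y <= 1.0) {
        inside++;
      }
    }
    // pi/4 is approximately inside/n, so pi is approximately 4*inside/n.
    System.out.println("Estimated value of Pi is " + 4.0 * inside / n);
  }
}

With n = 1000 this typically prints something in the 3.1 to 3.2 range, which is consistent with the 3.1480 reported above; increasing the sample count (or the number of maps) tightens the estimate only slowly.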
Re: Has anyone written a program to show total use on hdfs by directory
Hi Steve, You may use the shell command hadoop fs -count or call FileSystem.getContentSummary(Path f) in Java. Hope it helps. Tsz-Wo

From: Steve Lewis lordjoe2...@gmail.com
To: mapreduce-user mapreduce-user@hadoop.apache.org; hdfs-u...@hadoop.apache.org
Sent: Tuesday, October 25, 2011 5:51 PM
Subject: Has anyone written a program to show total use on hdfs by directory

While I can see file sizes with the web interface, it is very difficult to tell which directories are taking up space, especially when nested by several levels. -- Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com
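For reference, a minimal sketch of the programmatic route mentioned above (assuming the default Configuration points at the target cluster; the path argument is just an example). The shell equivalent is hadoop fs -count <path>.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DirUsage {
  public static void main(String[] args) throws Exception {
    // Assumes the default configuration points at the HDFS cluster in question.
    FileSystem fs = FileSystem.get(new Configuration());
    Path dir = new Path(args[0]);
    ContentSummary cs = fs.getContentSummary(dir);
    System.out.println(dir + ": " + cs.getLength() + " bytes, "
        + cs.getFileCount() + " files, " + cs.getDirectoryCount() + " directories");
  }
}

Running it over each top-level directory (for example with hadoop jar on a small driver like this) gives a per-directory breakdown similar to du on a local filesystem.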
Re: HDFS File Appending URGENT
Hi Jagaran, Short answer: the append feature is not in any release. In this sense, it is not stable. Below are more details on the append feature status.

- 0.20.x (includes release 0.20.2): There are known bugs in append. The bugs may cause data loss.
- 0.20-append: There was an effort to fix the known append bugs, but there are no releases. I heard Facebook was using it (with additional patches?) in production but I do not have the details.
- 0.21: It has a new append design (HDFS-265). However, the 0.21.0 release is only a minor release. It has not undergone testing at scale and should not be considered stable or suitable for production. Also, 0.21 development has been discontinued. Newly discovered bugs may not be fixed.
- 0.22, 0.23: Not yet released.

Regards, Tsz-Wo

From: jagaran das jagaran_...@yahoo.co.in
To: common-user@hadoop.apache.org
Sent: Fri, June 17, 2011 11:15:04 AM
Subject: Fw: HDFS File Appending URGENT

Please help me on this. I need it very urgently. Regards, Jagaran

- Forwarded Message
From: jagaran das jagaran_...@yahoo.co.in
To: common-user@hadoop.apache.org
Sent: Thu, 16 June, 2011 9:51:51 PM
Subject: Re: HDFS File Appending URGENT

Thanks a lot Xiaobo. I have tried with the below code in HDFS version 0.20.20 and it worked. Is it not stable yet?

public class HadoopFileWriter {
  public static void main(String[] args) throws Exception {
    try {
      URI uri = new URI("hdfs://localhost:9000/Users/jagarandas/Work-Assignment/Analytics/analytics-poc/hadoop-0.20.203.0/data/test.dat");
      Path pt = new Path(uri);
      FileSystem fs = FileSystem.get(new Configuration());
      BufferedWriter br;
      if (fs.isFile(pt)) {
        // the file already exists: open it for append
        br = new BufferedWriter(new OutputStreamWriter(fs.append(pt)));
        br.newLine();
      } else {
        // otherwise create it (overwrite = true)
        br = new BufferedWriter(new OutputStreamWriter(fs.create(pt, true)));
      }
      String line = args[0];
      System.out.println(line);
      br.write(line);
      br.close();
    } catch (Exception e) {
      e.printStackTrace();
      System.out.println("File not found");
    }
  }
}

Thanks a lot for your help. Regards, Jagaran

From: Xiaobo Gu guxiaobo1...@gmail.com
To: common-user@hadoop.apache.org
Sent: Thu, 16 June, 2011 8:01:14 PM
Subject: Re: HDFS File Appending URGENT

You can merge multiple files into a new one; there is no means to append to an existing file.

On Fri, Jun 17, 2011 at 10:29 AM, jagaran das jagaran_...@yahoo.co.in wrote: Is the hadoop version Hadoop 0.20.203.0 API? That means the hadoop files in HDFS version 0.20.20 are still immutable? And there is no means we can append to an existing file in HDFS? We need to do this urgently as we have to set up the pipeline accordingly in production. Regards, Jagaran

From: Xiaobo Gu guxiaobo1...@gmail.com
To: common-user@hadoop.apache.org
Sent: Thu, 16 June, 2011 6:26:45 PM
Subject: Re: HDFS File Appending

please refer to FileUtil.CopyMerge

On Fri, Jun 17, 2011 at 8:33 AM, jagaran das jagaran_...@yahoo.co.in wrote: Hi, We have a requirement where there would be a huge number of small files to be pushed to hdfs, and then we use pig to do analysis. To get around the classic Small File Issue we merge the files and push a bigger file into HDFS. But we are losing time in this merging step of our pipeline. If we could directly append to an existing file in HDFS, we could save this merging time. Can you please suggest if there is a newer stable version of Hadoop where we can go for appending? Thanks and Regards, Jagaran
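Since the thread points at FileUtil.copyMerge for the merge-then-upload approach, here is a hedged sketch of what that call looked like in the 0.20-era API (the exact signature may differ slightly between releases, and the paths used here are only examples):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergeSmallFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem localFs = FileSystem.getLocal(conf);   // source: local small files
    FileSystem hdfs = FileSystem.get(conf);           // destination: HDFS
    // Concatenate every file under the source directory into one HDFS file,
    // separating the pieces with a newline; do not delete the sources.
    FileUtil.copyMerge(localFs, new Path("/tmp/small-files"),
        hdfs, new Path("/data/merged.dat"),
        false /* deleteSource */, conf, "\n" /* addString */);
  }
}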
Re: Developing, Testing, Distributing
(Resent with -hadoopuser. Apologies if you receive multiple copies.)

From: Tsz Wo (Nicholas), Sze s29752-hadoopgene...@yahoo.com
To: common-user@hadoop.apache.org
Sent: Fri, April 8, 2011 11:08:22 AM
Subject: Re: Developing, Testing, Distributing

First of all, I am a Hadoop contributor and I am familiar with the Hadoop code base and build mechanism. Here is what I do:

Q1: What IDE are you using? Eclipse.
Q2: What plugins to the IDE are you using? No plugins.
Q3: How do you test your code, which unit test libraries are you using, and how do you run your automatic tests after you have finished the development? I use JUnit. The tests are executed using ant, the same way as in Hadoop development.
Q4: Do you have test/qa/staging environments beside the dev and the production? How do you keep them similar to production? We, Yahoo!, have test clusters with settings similar to the production clusters.
Q5: Code reuse - how do you build components that can be used in other jobs; do you build generic map or reduce classes? I do have my own framework for running generic computations or generic jobs.

Some more details (a sketch of the workflow is shown below):
1) svn checkout MapReduce trunk (or common/branches/branch-0.20 for 0.20)
2) compile everything using ant
3) set up eclipse
4) remove existing files under ./src/examples
5) develop my code under ./src/examples
6) add unit tests under ./src/test/mapred

I find it very convenient since (i) the build scripts can compile the examples code, run unit tests, create the jar, etc., and (ii) Hadoop contributors maintain it.

Hope it helps. Nicholas Sze
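As an illustration, the checkout-and-build steps above might look like the following shell session. The repository path and ant target names are assumptions based on the era's layout (the test target with -Dtestcase appears elsewhere in this archive), and TestMyExample is a placeholder test name:

svn checkout http://svn.apache.org/repos/asf/hadoop/mapreduce/trunk mapreduce-trunk
cd mapreduce-trunk
ant                                  # compile everything, including src/examples
ant test -Dtestcase=TestMyExample    # run a single unit test from src/test/mapred
ant jar                              # package the compiled classes into a jar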
Re: Are there any Hadoop books in print that use the new API?
Not sure if you already know: the MapReduce examples are using the new API. You may want to take a look. http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/examples/org/apache/hadoop/examples/ Regards, Nicholas On Wed, Apr 6, 2011 at 3:31 PM, W.P. McNeill bill...@gmail.com wrote: I've been working from the 2nd Edition of Tom White's *Hadoop: The Definitive Guide*, but that's still old API (0.20). Are there any books in print that use the new API? Separating old-API vs. new-API examples that you find on the internet can be tricky.
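For anyone comparing the two APIs, a minimal new-API (org.apache.hadoop.mapreduce) mapper looks roughly like the sketch below; the class and type choices here are illustrative, not taken from the book or from the examples linked above:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// New-API mappers extend org.apache.hadoop.mapreduce.Mapper and write
// through a Context, instead of implementing the old org.apache.hadoop.mapred
// interfaces and using an OutputCollector/Reporter pair.
public class WordLengthMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    for (String word : value.toString().split("\\s+")) {
      if (!word.isEmpty()) {
        context.write(new Text(word), new IntWritable(word.length()));
      }
    }
  }
}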
Re: Hadoop for Bioinformatics
Hi Franco, I recall that there are some Hadoop-Blast research projects. For examples, see
- http://www.cs.umd.edu/Grad/scholarlypapers/papers/MichaelSchatz.pdf
- http://salsahpc.indiana.edu/tutorial/hadoopblast.html
Nicholas

From: Franco Nazareno franco.nazar...@gmail.com
To: common-user@hadoop.apache.org
Sent: Sun, March 27, 2011 7:51:14 PM
Subject: Hadoop for Bioinformatics

Good day everyone! First, I want to congratulate the group for this wonderful project. It did open up new ideas and solutions in computing and technology. I'm excited to learn more about it and discover possibilities using Hadoop and its components. Well, I just want to ask this with regards to my study. Currently I'm studying for my PhD in Bioinformatics, and my question is: can you give me a (rough) idea of whether it's possible to use a Hadoop cluster to achieve DNA sequence alignment? My basic idea for this goes something like a string search out of huge data files stored in HDFS, where the application uses MapReduce for searching and computing. As the Hadoop paradigm implies, it doesn't serve well in interactive applications, and I think this kind of searching is a write-once, read-many application. I hope you don't mind my question. And it'll be great hearing your comments or suggestions about this. Thanks and more power! Franco
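As a rough illustration of the string-search idea in the question (not an alignment algorithm such as BLAST, just an exact-substring scan), a mapper along these lines could emit match positions; the configuration key "search.query" is an arbitrary example:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (file offset of the line, match position within the line) for every
// occurrence of the query string; a reducer could then aggregate the hit lists.
public class SubsequenceSearchMapper
    extends Mapper<LongWritable, Text, LongWritable, LongWritable> {
  private String query;

  @Override
  protected void setup(Context context) {
    // "search.query" is just an example configuration key set by the driver.
    query = context.getConfiguration().get("search.query");
  }

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String s = line.toString();
    for (int i = s.indexOf(query); i >= 0; i = s.indexOf(query, i + 1)) {
      context.write(offset, new LongWritable(i));
    }
  }
}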
Re: Zero file size after hsync
Hi Viliam, Which version of Hadoop are you using?

First of all, hsync is the same as hflush in 0.21 and above. hflush/hsync won't update the file length on the NameNode. So the answer to your question is yes: we have to call DFSDataInputStream.getVisibleLength() to get the visible length of the file.

When is the SequenceFile opened? Before or after hflush/hsync? Note that only a new reader can see the new data. So if the file, a normal file or a SequenceFile, is opened before hflush/hsync, we have to re-open the file in order to see the new data.

Anyway, please feel free to file a JIRA if you feel it is a bug or you would like to have a feature request. Hope it helps. Nicholas

From: Viliam Holub viliam.ho...@ucd.ie
To: hdfs-user@hadoop.apache.org
Sent: Fri, March 18, 2011 9:29:32 AM
Subject: Zero file size after hsync

Hi all, the size of a newly created file is reported to be zero even though I've written some data and hsync-ed them. Is that the correct and expected effect? hadoop fs -cat will retrieve the data correctly. As a consequence, SequenceFile fails to seek in the file since it tests the position against the file size. And the data are there... Thanks! Viliam
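A minimal sketch of the write / hflush / re-open / check-visible-length pattern described above. This assumes the 0.21-era API; in that era the concrete input stream was the nested class DFSClient.DFSDataInputStream, which exposed getVisibleLength(), though the exact class location may differ between releases:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSClient;

public class HflushVisibleLength {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/hflush-demo.dat");   // example path

    FSDataOutputStream out = fs.create(p, true);
    out.writeBytes("some data\n");
    out.hflush();   // make the data visible to new readers; the file stays open

    // The NameNode-reported length may still be 0 here; only a *new* reader
    // sees the flushed bytes, and getVisibleLength() reports how many are readable.
    DFSClient.DFSDataInputStream in = (DFSClient.DFSDataInputStream) fs.open(p);
    System.out.println("visible length = " + in.getVisibleLength());
    in.close();
    out.close();
  }
}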
Re: PiEstimator error - Type mismatch in key from map
Hi Pedro, This is interesting. Which version of Hadoop are you using? And where did you get the example class files? Also, are you able to reproduce it deterministically? Nicholas

From: Pedro Costa psdc1...@gmail.com
To: mapreduce-user@hadoop.apache.org
Sent: Wed, January 26, 2011 5:47:01 AM
Subject: PiEstimator error - Type mismatch in key from map

Hi, I ran the PI example of hadoop, and I've got the following error:

[code]
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.BooleanWritable, recieved org.apache.hadoop.io.LongWritable
  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:885)
  at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:551)
  at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:81)
  at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:637)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
  at org.apache.hadoop.mapred.Child.main(Child.java:190)
[/code]

I've looked at the map function of the class PiEstimator.class and it seems ok:

[code]
public void map(LongWritable offset, LongWritable size,
    OutputCollector<BooleanWritable, LongWritable> out, Reporter reporter) throws IOException {}
[/code]

What's wrong with this example? Thanks, -- Pedro
Re: PiEstimator error - Type mismatch in key from map
Thanks for the info. I ran PiEstimator many many times and never have observed such problem. Nicholas From: Pedro Costa psdc1...@gmail.com To: mapreduce-user@hadoop.apache.org Sent: Wed, January 26, 2011 10:09:36 AM Subject: Re: PiEstimator error - Type mismatch in key from map Yes, I can reproduce it deterministically. But, I also did some changes to the Hadoop MR code. Most definitely this is the reason. I'm looking throughly through the code. I'll say something after I find the problem. I was just wondering if this error has happened to someone before. Maybe I could get a hint and try to see what's my problem easily. Thanks, On Wed, Jan 26, 2011 at 6:02 PM, Tsz Wo (Nicholas), Sze s29752-hadoopu...@yahoo.com wrote: Hi Pedro, This is interesting. Which version of Hadoop are you using? And where did you get the example class files? Also, are you able to reproduce it deterministically? Nicholas From: Pedro Costa psdc1...@gmail.com To: mapreduce-user@hadoop.apache.org Sent: Wed, January 26, 2011 5:47:01 AM Subject: PiEstimator error - Type mismatch in key from map Hi, I run the PI example of hadoop, and I've got the following error: [code] java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.BooleanWritable, recieved org.apache.hadoop.io.LongWritable at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:885) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:551) at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:81) ) at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:637) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.Child.main(Child.java:190) [/code] I've look at the map function of the class PiEstimator.class and it seems ok. [code] public void map(LongWritable offset, LongWritable size, OutputCollectorBooleanWritable, LongWritable out, Reporter reporter) throws IOException {} [/code] What's wrong with this examples? Thanks, -- Pedro -- Pedro
Re: PiEstimator error - Type mismatch in key from map
Hi Srihari, Same questions to you: Which version of Hadoop are you using? And where did you get the examples? I guess you were able to reproduce it. I suspect the examples and the Hadoop are in different versions. Nicholas From: Srihari Anantha Padmanabhan sriha...@yahoo-inc.com To: mapreduce-user@hadoop.apache.org mapreduce-user@hadoop.apache.org Sent: Wed, January 26, 2011 10:15:08 AM Subject: Re: PiEstimator error - Type mismatch in key from map I got a similar error before in one of my projects. I had to set the values for mapred.output.key.class and mapred.output.value.class. That resolved the issue for me. Srihari On Jan 26, 2011, at 10:09 AM, Pedro Costa wrote: Yes, I can reproduce it deterministically. But, I also did some changes to the Hadoop MR code. Most definitely this is the reason. I'm looking throughly through the code. I'll say something after I find the problem. I was just wondering if this error has happened to someone before. Maybe I could get a hint and try to see what's my problem easily. Thanks, On Wed, Jan 26, 2011 at 6:02 PM, Tsz Wo (Nicholas), Sze s29752-hadoopu...@yahoo.com wrote: Hi Pedro, This is interesting. Which version of Hadoop are you using? And where did you get the example class files? Also, are you able to reproduce it deterministically? Nicholas From: Pedro Costa psdc1...@gmail.com To: mapreduce-user@hadoop.apache.org Sent: Wed, January 26, 2011 5:47:01 AM Subject: PiEstimator error - Type mismatch in key from map Hi, I run the PI example of hadoop, and I've got the following error: [code] java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.BooleanWritable, recieved org.apache.hadoop.io.LongWritable at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:885) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:551) at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:81) at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:637) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.Child.main(Child.java:190) [/code] I've look at the map function of the class PiEstimator.class and it seems ok. [code] public void map(LongWritable offset, LongWritable size, OutputCollectorBooleanWritable, LongWritable out, Reporter reporter) throws IOException {} [/code] What's wrong with this examples? Thanks, -- Pedro -- Pedro
Re: PiEstimator error - Type mismatch in key from map
Okay, I got it now. You were talking about your own program, not the PiEstimator example that comes with Hadoop. Then you have to set mapred.output.key.class and mapred.output.value.class, as Srihari mentioned. Below are the APIs.

//new API
final Job job = ...
job.setMapOutputKeyClass(BooleanWritable.class);
job.setMapOutputValueClass(LongWritable.class);

//old API
final JobConf jobconf = ...
jobconf.setOutputKeyClass(BooleanWritable.class);
jobconf.setOutputValueClass(LongWritable.class);

Nicholas

From: Srihari Anantha Padmanabhan sriha...@yahoo-inc.com
To: mapreduce-user@hadoop.apache.org mapreduce-user@hadoop.apache.org
Sent: Wed, January 26, 2011 10:36:09 AM
Subject: Re: PiEstimator error - Type mismatch in key from map

I am using Hadoop 0.20.2. I just wrote my own map-reduce program based on the map-reduce tutorial at http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html

On Jan 26, 2011, at 10:27 AM, Pedro Costa wrote: Hadoop 20.1

On Wed, Jan 26, 2011 at 6:26 PM, Tsz Wo (Nicholas), Sze s29752-hadoopu...@yahoo.com wrote: Hi Srihari, Same questions to you: Which version of Hadoop are you using? And where did you get the examples? I guess you were able to reproduce it. I suspect the examples and the Hadoop are in different versions. Nicholas

From: Srihari Anantha Padmanabhan sriha...@yahoo-inc.com
To: mapreduce-user@hadoop.apache.org mapreduce-user@hadoop.apache.org
Sent: Wed, January 26, 2011 10:15:08 AM
Subject: Re: PiEstimator error - Type mismatch in key from map

I got a similar error before in one of my projects. I had to set the values for mapred.output.key.class and mapred.output.value.class. That resolved the issue for me. Srihari

On Jan 26, 2011, at 10:09 AM, Pedro Costa wrote: Yes, I can reproduce it deterministically. But I also did some changes to the Hadoop MR code. Most definitely this is the reason. I'm looking thoroughly through the code. I'll say something after I find the problem. I was just wondering if this error has happened to someone before. Maybe I could get a hint and try to see what's my problem easily. Thanks,

On Wed, Jan 26, 2011 at 6:02 PM, Tsz Wo (Nicholas), Sze s29752-hadoopu...@yahoo.com wrote: Hi Pedro, This is interesting. Which version of Hadoop are you using? And where did you get the example class files? Also, are you able to reproduce it deterministically? Nicholas

From: Pedro Costa psdc1...@gmail.com
To: mapreduce-user@hadoop.apache.org
Sent: Wed, January 26, 2011 5:47:01 AM
Subject: PiEstimator error - Type mismatch in key from map

Hi, I ran the PI example of hadoop, and I've got the following error:

[code]
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.BooleanWritable, recieved org.apache.hadoop.io.LongWritable
  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:885)
  at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:551)
  at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:81)
  at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:637)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
  at org.apache.hadoop.mapred.Child.main(Child.java:190)
[/code]

I've looked at the map function of the class PiEstimator.class and it seems ok:

[code]
public void map(LongWritable offset, LongWritable size,
    OutputCollector<BooleanWritable, LongWritable> out, Reporter reporter) throws IOException {}
[/code]

What's wrong with this example? Thanks, -- Pedro -- Pedro -- Pedro
Re: About hadoop-..-examples.jar
The examples package is in the MapReduce trunk. Note that it is under a different src directory, src/examples, not src/java. See also http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/examples/org/apache/hadoop/examples/terasort/ Nicholas

From: Bo Sang sampl...@gmail.com
To: Hadoop user mail list common-user@hadoop.apache.org
Sent: Thu, January 13, 2011 11:23:44 AM
Subject: About hadoop-..-examples.jar

Hi, guys: Does anyone know where I can get the package hadoop-..-examples.jar? I want to use TeraSort in it. It seems this package is not included in the hadoop source code, and I also fail to find download links on its homepage. -- Best Regards! Sincerely, Bo Sang
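Once you have the examples jar (built from src/examples or taken from a release), running TeraSort typically looks like the commands below. The jar name and paths are illustrative and vary between releases; teragen's first argument is the number of 100-byte rows to generate:

hadoop jar hadoop-*-examples.jar teragen 1000000 /user/bo/terasort-input
hadoop jar hadoop-*-examples.jar terasort /user/bo/terasort-input /user/bo/terasort-output
hadoop jar hadoop-*-examples.jar teravalidate /user/bo/terasort-output /user/bo/terasort-report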
Re: Prime number of reduces vs. linear hash function
You may also see Knuth's The Art of Computer Programming. I remember that there is a discussion about prime number and hash function. (It should be in Volume 3 Chapter 6. There is a section about hashing. Sorry that I don't have the book with me and can't give you the page numbers.) Nicholas From: aniket ray aniket@gmail.com To: common-user@hadoop.apache.org Sent: Mon, October 25, 2010 12:12:16 PM Subject: Re: Prime number of reduces vs. linear hash function http://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/ http://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/discusses the theory in detail. On Sun, Oct 24, 2010 at 7:30 AM, Shi Yu sh...@uchicago.edu wrote: There is a suggestion to set the number of reducers to a prime number closest to the number of nodes and number of mappers a prime number closest to several times the number of nodes in the cluster. But there is also saying that There is no need for the number of reduces to be prime. The only thing it helps is if you are using the HashPartitioner and your key's hash function is too linear. In practice, you usually want to use 99% of your reduce capacity of the cluster. Could anyone explain what is the theory behind the prime number and the hash function here? Shi
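For context, the default HashPartitioner simply maps a key to a reduce by key.hashCode() modulo the number of reduces, roughly as sketched below. If the key hashes all share a common factor with the reduce count, some reduces never receive data, which is the case where a prime reduce count helps; with a well-mixed hash function it makes no difference.

import org.apache.hadoop.io.IntWritable;

// Sketch of the default partitioning rule (the same arithmetic as Hadoop's
// HashPartitioner): clear the sign bit, then take the remainder.
public class PartitionDemo {
  static int partitionFor(int hash, int numReduceTasks) {
    return (hash & Integer.MAX_VALUE) % numReduceTasks;
  }

  public static void main(String[] args) {
    // Keys whose hash is always a multiple of 4: with 8 reduces only
    // partitions 0 and 4 ever receive data; with 7 (prime) all are used.
    for (int numReduces : new int[] {8, 7}) {
      System.out.print(numReduces + " reduces:");
      for (int k = 0; k < 10; k++) {
        System.out.print(" " + partitionFor(new IntWritable(4 * k).hashCode(), numReduces));
      }
      System.out.println();
    }
  }
}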
Re: Namenode warnings
Hi Runping, This is a known issue. See https://issues.apache.org/jira/browse/HDFS-625. Nicholas Sze - Original Message From: Runping Qi runping...@gmail.com To: common-user@hadoop.apache.org Sent: Wed, May 12, 2010 12:53:13 AM Subject: Namenode warnings Hi, I saw a lot of warnings like the following in namenode log: 2010-05-11 06:45:07,186 WARN /: /listPaths/s: java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.ListPathsServlet.doGet(ListPathsServlet.java:153) at javax.servlet.http.HttpServlet.service(HttpServlet.java:596) at javax.servlet.http.HttpServlet.service(HttpServlet.java:689) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427) at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567) at org.mortbay.http.HttpContext.handle(HttpContext.java:1565) at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635) at org.mortbay.http.HttpContext.handle(HttpContext.java:1517) at org.mortbay.http.HttpServer.service(HttpServer.java:954) at org.mortbay.http.HttpConnection.service(HttpConnection.java:814) at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981) at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831) at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244) at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357) I am using Hadoop 0.19. Anybody knows what might be the problem? Thanks, Runping at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
Re: JavaDocs for DistCp (or similar)
Oops, DistCp.main(..) calls System.exit(..) at the end, so it would also terminate your Java program, which is probably not desirable. You may still use code similar to DistCp.main(..), as shown below. However, these are not stable APIs.

//DistCp.main
public static void main(String[] args) throws Exception {
  JobConf job = new JobConf(DistCp.class);
  DistCp distcp = new DistCp(job);
  int res = ToolRunner.run(distcp, args);
  System.exit(res);
}

Nicholas

- Original Message
From: Tsz Wo (Nicholas), Sze s29752-hadoopu...@yahoo.com
To: common-user@hadoop.apache.org
Sent: Wed, February 17, 2010 10:58:58 PM
Subject: Re: JavaDocs for DistCp (or similar)

Hi Balu, Unfortunately, DistCp does not have a public Java API. One simple way is to invoke DistCp.main(args) in your java program, where args is an array of the string arguments you would pass in the command line. Hope this helps. Nicholas Sze

- Original Message
From: Balu Vellanki
To: common-user@hadoop.apache.org
Sent: Wed, February 17, 2010 5:43:11 PM
Subject: JavaDocs for DistCp (or similar)

Hi Folks, Currently we use distCp to transfer files between two hadoop clusters. I have a perl script which calls a system command “hadoop distcp” to achieve this. Is there a Java API to do distCp, so that we can avoid system calls from our java code? Thanks Balu
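A hedged sketch of driving DistCp from Java without the System.exit(..) call, along the lines described above (this relies on the internal, unstable API quoted from DistCp.main, so the constructor and package may differ between releases):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.tools.DistCp;
import org.apache.hadoop.util.ToolRunner;

public class DistCpDriver {
  public static void main(String[] args) throws Exception {
    // Same steps as DistCp.main(..), but inspect the status instead of exiting the JVM.
    JobConf job = new JobConf(DistCp.class);
    int res = ToolRunner.run(new DistCp(job), args);
    if (res != 0) {
      System.err.println("distcp failed with status " + res);
    }
  }
}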
Re: hflush not working for me?
Soft lease is for another writer to obtain the file lease if the original writer appears to abandon the file. In the current TestReadWhileWriting (not counting parts (c) and (d)), there is only one writer, so soft lease is not related. Will check your test. Nicholas

From: stack st...@duboce.net
To: hdfs-user@hadoop.apache.org
Sent: Fri, October 9, 2009 2:32:02 PM
Subject: Re: hflush not working for me?

On Fri, Oct 9, 2009 at 1:27 PM, Tsz Wo (Nicholas), Sze s29752-hadoopu...@yahoo.com wrote: Hi St.Ack, ... soft lease to 1 second ... You are right that you don't have to change soft lease. It is for append but not related to hflush.

I should not have to set it then? I can remove this 70 second pause in the middle of my test?

Do I have to do open as another user? This should not be necessary.

Could you send me/post your test? Sure, as long as you don't hold this ugly code against me ever after. I checked in the code so you could try it: http://svn.apache.org/repos/asf/hadoop/hbase/trunk/src/test/org/apache/hadoop/hbase/regionserver/TestHLog.java It's the first test, testSync. It starts out by copying what's down in the hdfs testReadWhileWriting. That bit works fine. Then comes the ugly stuff. HLog is our write-ahead log wrapper. Internally it writes out to a SequenceFile.Writer. The SequenceFile.Writer has been doctored using reflection so the out data member is non-private. A call to HLog.sync runs the SequenceFile.Writer.sync -- which DOES NOT call sync on the backing output stream -- and then it calls sync on the now-accessible out stream (sorry it's so ugly -- I'm trying to hack stuff up fast so all of hbase gets access to this new facility). If I trace in the debugger, I can see that the sync on the out data member goes down into hflush. Queued-up edits are flushed. It seems like it should be working. Do I have to do some doctoring of the reader? (It doesn't seem so, given that the code at the head of this test works). Thanks for taking a look Nicholas. To run the test, you can do ant clean jar test -Dtestcase=TestHLog. (Let me know if you want an eclipse .project + .classpath so you can get it up in an IDE to run the debugger). St.Ack

Nicholas Sze

From: stack st...@duboce.net
To: hdfs-user@hadoop.apache.org
Sent: Fri, October 9, 2009 1:13:37 PM
Subject: hflush not working for me?

I'm putting together some unit tests up in our application that exercise hflush. I'm using minidfscluster and a jar made by building head of the 0.21 branch of hdfs (from about a minute ago). Code opens a file, writes a bunch of edits, invokes hflush (by calling sync on the DFSDataOutputStream instance) and then, without closing the Writer, opens a Reader on the same file. This Reader does not see any edits, not to mind edits up to the sync invocation. I can trace the code and see how on hflush it sends the queued packets of edits. I studied TestReadWhileWriting. I've set setBoolean(dfs.support.append, true) before minidfscluster spins up. I can't set soft lease to 1 second because I'm not in the same package, so I just wait out the default minute. It doesn't seem to make a difference. Do I have to do open as another user? Thanks for any pointers, St.Ack
Re: how to compile HDFS-265 branch together with MAPREDUCE trunk?
Hi Zheng, I have created a script to compile everything and posted it on HDFS-265. See also https://issues.apache.org/jira/browse/HDFS-265?focusedCommentId=12760809page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12760809 Hope this helps. Nicholas Sze From: Zheng Shao zs...@facebook.com To: common-user@hadoop.apache.org common-user@hadoop.apache.org; hdfs-u...@hadoop.apache.org hdfs-u...@hadoop.apache.org Sent: Monday, October 5, 2009 3:21:55 PM Subject: how to compile HDFS-265 branch together with MAPREDUCE trunk? I got the HDFS-265 branch from hdfs and compiled it successfully, and generated hadoop-hdfs-*.jar. But I also need mapreduce. Is there an easy to compile hdfs and mapreduce together? I need HDFS-265 branch, instead of the default one when I check out and build “common”. Thanks, Zheng
Re: how to compile HDFS-265 branch together with MAPREDUCE trunk?
Hi Zheng, I have created a script to compile everything and posted it on HDFS-265. See also https://issues.apache.org/jira/browse/HDFS-265?focusedCommentId=12760809page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12760809 Hope this helps. Nicholas Sze From: Zheng Shao zs...@facebook.com To: common-u...@hadoop.apache.org common-u...@hadoop.apache.org; hdfs-user@hadoop.apache.org hdfs-user@hadoop.apache.org Sent: Monday, October 5, 2009 3:21:55 PM Subject: how to compile HDFS-265 branch together with MAPREDUCE trunk? I got the HDFS-265 branch from hdfs and compiled it successfully, and generated hadoop-hdfs-*.jar. But I also need mapreduce. Is there an easy to compile hdfs and mapreduce together? I need HDFS-265 branch, instead of the default one when I check out and build “common”. Thanks, Zheng
Re: distcp between 0.17 and 0.18.3 issues
Hi tp, distcp definitely supports copying file from a 0.17 cluster to a 0.18 cluster. The error message is saying that the delete operation is not supported in HftpFileSystem. Would you mind to show me the actual command used? Nicholas Sze - Original Message From: charles du taiping...@gmail.com To: core-u...@hadoop.apache.org Sent: Wednesday, August 5, 2009 12:36:49 PM Subject: distcp between 0.17 and 0.18.3 issues Hi: I tried to use distcp to copy files from one cluster running hadoop 0.17.0 to another cluster running hadoop 0.18.3, and got the following errors. With failures, global counters are inaccurate; consider running with -i Copy failed: java.io.IOException: Not supported at org.apache.hadoop.dfs.HftpFileSystem.delete(HftpFileSystem.java:263) at org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:119) at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:843) at org.apache.hadoop.tools.DistCp.copy(DistCp.java:623) at org.apache.hadoop.tools.DistCp.run(DistCp.java:768) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.tools.DistCp.main(DistCp.java:788) I ran the distcp from the 0.18.3 cluster. does the error message mean that distcp does not support 0.17.0 as the copy source? Regards -- tp
Re: distcp between 0.17 and 0.18.3 issues
hadoop distcp -i hftp://nn1:50070/src hftp://nn2:50070/dest

The problem in the command above is hftp://nn2:50070/dest, since hftp (i.e. HftpFileSystem) is a read-only file system. You may change it to hdfs://nn2:<port>/dest, where <port> is a different port. You may find the port number on the NN's web page. Nicholas

- Original Message
From: charles du taiping...@gmail.com
To: common-user@hadoop.apache.org
Sent: Wednesday, August 5, 2009 1:54:55 PM
Subject: Re: distcp between 0.17 and 0.18.3 issues

Hi Nicholas: The command I used is hadoop distcp -i hftp://nn1:50070/src hftp://nn2:50070/dest. I ran hadoop ls on both src and destination, and it lists files just fine. nn1 is 0.17.0, and nn2 is 0.18.3. Thanks. tp

On Wed, Aug 5, 2009 at 1:49 PM, Tsz Wo (Nicholas), Sze s29752-hadoopu...@yahoo.com wrote: Hi tp, distcp definitely supports copying files from a 0.17 cluster to a 0.18 cluster. The error message is saying that the delete operation is not supported in HftpFileSystem. Would you mind to show me the actual command used? Nicholas Sze

- Original Message
From: charles du
To: core-u...@hadoop.apache.org
Sent: Wednesday, August 5, 2009 12:36:49 PM
Subject: distcp between 0.17 and 0.18.3 issues

Hi: I tried to use distcp to copy files from one cluster running hadoop 0.17.0 to another cluster running hadoop 0.18.3, and got the following errors.

With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Not supported
  at org.apache.hadoop.dfs.HftpFileSystem.delete(HftpFileSystem.java:263)
  at org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:119)
  at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:843)
  at org.apache.hadoop.tools.DistCp.copy(DistCp.java:623)
  at org.apache.hadoop.tools.DistCp.run(DistCp.java:768)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
  at org.apache.hadoop.tools.DistCp.main(DistCp.java:788)

I ran the distcp from the 0.18.3 cluster. Does the error message mean that distcp does not support 0.17.0 as the copy source? Regards -- tp
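Put together, a corrected invocation might look like the line below; the destination port 8020 is only an assumed example of a NameNode RPC port, so substitute whatever nn2's web page actually reports:

hadoop distcp -i hftp://nn1:50070/src hdfs://nn2:8020/dest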
Re: Does distcp support copying data from local directories of different nodes?
bin/hadoop distcp A://a B://b C://c hdfs://namenode/ Yes or no. In the command above, A, B and C are supposed to be schemes, e.g. hdfs, hftp, file, etc. but not host names. So the command won't work. If nodes A, B and C support some schemes (they are not necessary the same), say ftp, then you can do the following with distcp. bin/hadoop distcp ftp://A/a ftp://B/b ftp://C/c hdfs://namenode/ Hope this help. Nicholas Sze - Original Message From: Martin Mituzas xietao1...@hotmail.com To: core-u...@hadoop.apache.org Sent: Wednesday, July 15, 2009 2:25:52 AM Subject: Does distcp support copying data from local directories of different nodes? I mean if I have different directories on node A, B, C, can I put them together as source directory arguments to copy them into HDFS? like: bin/hadoop distcp A://a B://b C://c hdfs://namenode/ Thanks! -- View this message in context: http://www.nabble.com/Does-distcp-support-copying-data-from-local-directories-of-different-nodes--tp24494574p24494574.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: A brief report of Second Hadoop in China Salon
Congratulations! Nicholas Sze - Original Message From: He Yongqiang heyongqi...@software.ict.ac.cn To: core-...@hadoop.apache.org core-...@hadoop.apache.org; core-user@hadoop.apache.org core-user@hadoop.apache.org Sent: Friday, May 15, 2009 6:09:50 PM Subject: A brief report of Second Hadoop in China Salon Hi, all In May 9, we held the second Hadoop In China salon. About 150 people attended, 46% of them are engineers/managers from industry companies, and 38% of them are students/professors from universities and institutes. This salon was successfully held with great technical support from Yahoo! Beijing RD, Zheng Shao from Facebook Inc., Wang Shouyan from Baidu Inc. and many other high technology companies in China. We got over one hundred feedbacks from attendees, and most of them are interested in details and wants more discussions. And 1/3 of them want we to include more topics or more sessions for hadoop subprojects. And most students/professors want to be more familiar with hadoop and try to find new research topic on top of hadoop. Most students want to involve themselves and contribute to hadoop, but do not know how or find it is a little difficulty because of language/zone problems. Thank you all the attendees again. Without you, it would never success. We already put the slides on site: www.hadooper.cn, and the videos are coming soon. BTW, I insist on letting this event to be nonprofit. In the past two meetings, we did not charge anyone for anything.
Re: Doubt regarding permissions
Hi Amar, I just have tried. Everything worked as expected. I guess user A in your experiment was a superuser so that he could read anything. Nicholas Sze /// permission testing // drwx-wx-wx - nicholas supergroup 0 2009-04-13 10:55 /temp drwx-w--w- - tsz supergroup 0 2009-04-13 10:58 /temp/test -rw-r--r-- 3 tsz supergroup 1366 2009-04-13 10:58 /temp/test/r.txt //login as nicholas (non-superuser) $ whoami nicholas $ ./bin/hadoop fs -lsr /temp drwx-w--w- - tsz supergroup 0 2009-04-13 10:58 /temp/test lsr: could not get get listing for 'hdfs://:9000/temp/test' : org.apache.hadoop.security.AccessControlException: Permission denied: user=nicholas, access=READ_EXECUTE, inode=test:tsz:supergroup:rwx-w--w- $ ./bin/hadoop fs -cat /temp/test/r.txt cat: org.apache.hadoop.security.AccessControlException: Permission denied: user=nicholas, access=EXECUTE, inode=test:tsz:supergroup:rwx-w--w- - Original Message From: Amar Kamat ama...@yahoo-inc.com To: core-user@hadoop.apache.org Sent: Monday, April 13, 2009 2:02:24 AM Subject: Doubt regarding permissions Hey, I tried the following : - created a dir temp for user A and permission 733 - created a dir temp/test for user B and permission 722 - - created a file temp/test/test.txt for user B and permission722 Now in HDFS, user A can list as well as read the contents of file temp/test/test.txt while on my RHEL box I cant. Is it a feature or a bug. Can someone please try this out and confirm? Thanks Amar
Re: using distcp for http source files
Hi Derek, The http in http://core:7274/logs/log.20090121; should be hftp. hftp is the scheme name of HftpFileSystem which uses http for accessing hdfs. Hope this helps. Nicholas Sze - Original Message From: Derek Young dyo...@kayak.com To: core-user@hadoop.apache.org Sent: Wednesday, January 21, 2009 1:23:56 PM Subject: using distcp for http source files I plan to use hadoop to do some log processing and I'm working on a method to load the files (probably nightly) into hdfs. My plan is to have a web server on each machine with logs that serves up the log directories. Then I would give distcp a list of http URLs of the log files and have it copy the files in. Reading http://issues.apache.org/jira/browse/HADOOP-341 it sounds like this should be supported, but the http URLs are not working for me. Are http source URLs still supported? I tried a simple test with an http source URL (using Hadoop 0.19): hadoop distcp -f http://core:7274/logs/log.20090121 /user/dyoung/mylogs This fails: With failures, global counters are inaccurate; consider running with -i Copy failed: java.io.IOException: No FileSystem for scheme: http at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1364) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:56) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1379) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175) at org.apache.hadoop.tools.DistCp.fetchFileList(DistCp.java:578) at org.apache.hadoop.tools.DistCp.access$300(DistCp.java:74) at org.apache.hadoop.tools.DistCp$Arguments.valueOf(DistCp.java:775) at org.apache.hadoop.tools.DistCp.run(DistCp.java:844) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.tools.DistCp.main(DistCp.java:871)
Re: Block not found during commitBlockSynchronization
Which version are you using? Calling commitBlockSynchronization(...) with newgenerationstamp=0, newlength=0, newtargets=[] does not look normal. You may check the namenode log and the client log about the block blk_-4236881263392665762. Nicholas Sze - Original Message From: Brian Bockelman [EMAIL PROTECTED] To: core-user@hadoop.apache.org Sent: Friday, December 5, 2008 5:22:03 PM Subject: Block not found during commitBlockSynchronization Hey, I'm seeing this message repeated over and over in my logs: 2008-12-05 19:20:00,534 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=blk_-4236881263392665762_88597, newgenerationstamp=0, newlength=0, newtargets=[]) 2008-12-05 19:20:00,534 INFO org.apache.hadoop.ipc.Server: IPC Server handler 29 on 9000, call commitBlockSynchronization(blk_-4236881263392665762_88597, 0, 0, false, true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@67537412) from 172.16.1.184:57586: error: java.io.IOException: Block (=blk_-4236881263392665762_88597) not found java.io.IOException: Block (=blk_-4236881263392665762_88597) not found at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:1898) at org.apache.hadoop.hdfs.server.namenode.NameNode.commitBlockSynchronization(NameNode.java:410) at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892) What can I do to debug? Brian
Re: ls command output format
Hi Alex, Yes, the doc about ls is out-dated. Thanks for pointing this out. Would you mind to file a JIRA? Nicholas Sze - Original Message From: Alexander Aristov [EMAIL PROTECTED] To: core-user@hadoop.apache.org Sent: Friday, November 21, 2008 6:08:08 AM Subject: Re: ls command output format Found out that output has been changed in 0.18 see HADOOP-2865 Docs should be also then updated. Alex 2008/11/21 Alexander Aristov Hello I wonder if hadoop shell command ls has changed output format Trying hadoop-0.18.2 I got next output [root]# hadoop fs -ls / Found 2 items drwxr-xr-x - root supergroup 0 2008-11-21 08:08 /mnt drwxr-xr-x - root supergroup 0 2008-11-21 08:19 /repos Though according to docs it should be that file name goes first. http://hadoop.apache.org/core/docs/r0.18.2/hdfs_shell.html#ls Usage: hadoop fs -ls For a file returns stat on the file with the following format: filename filesize modification_date modification_time permissions userid groupid For a directory it returns list of its direct children as in unix. A directory is listed as: dirname modification_time modification_time permissions userid groupid Example: hadoop fs -ls /user/hadoop/file1 /user/hadoop/file2 hdfs:// nn.example.com/user/hadoop/dir1 /nonexistentfile Exit Code: Returns 0 on success and -1 on error. I wouldn't notice the issue if I haven't had scripts which rely on the formatting. -- Best Regards Alexander Aristov -- Best Regards Alexander Aristov
Re: Anything like RandomAccessFile in Hadoop FS ?
Append is going to be available in 0.19 (not yet released). There are new FileSystem APIs for append, e.g. //FileSysetm.java public abstract FSDataOutputStream append(Path f, int bufferSize, Progressable progress) throws IOException; Nicholas Sze - Original Message From: Bryan Duxbury [EMAIL PROTECTED] To: Wasim Bari [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Thursday, November 13, 2008 1:11:57 PM Subject: Re: Anything like RandomAccessFile in Hadoop FS ? I'm not sure off hand. Maybe someone else can point you in the right direction? On Nov 13, 2008, at 1:09 PM, Wasim Bari wrote: Hi, Thanks for reply. HDFS supports append file. How can I do this ? I tried to look API under fileSystem create method but couldn't find. Thanks for ur help. Wasim -- From: Bryan Duxbury Sent: Thursday, November 13, 2008 9:48 PM To: Subject: Re: Anything like RandomAccessFile in Hadoop FS ? If you mean a file where you can write anywhere, then no. HDFS is streaming only. If you want to read from anywhere, then no problem - just use seek() and then read. On Nov 13, 2008, at 11:40 AM, Wasim Bari wrote: Hi, Is there any Utility for Hadoop files which can work same as RandomAccessFile in Java ? Thanks, Wasim
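A hedged usage sketch of that 0.19-era API, matching the signature quoted above (availability depends on the release and, in some releases, on dfs.support.append being enabled; the path is just an example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/append-demo.log");   // example path
    // Reopen the existing file for append and add one more record at the end.
    FSDataOutputStream out = fs.append(p, 4096, null /* no progress callback */);
    out.writeBytes("one more line\n");
    out.close();
  }
}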
Re: DistCp 0.18 Vs DistCp 0.17
There was a code refactoring in 0.18. So the codes have been moved around. distcp is implemented by org.apache.hadoop.util.CopyFiles in 0.17 while it is implemented by org.apache.hadoop.tools.DistCp in 0.18. There were improvements and bug fixes for distcp in 0.18 compared to 0.17. Try bin/hadoop distcp to see help messages. Nicholas Sze - Original Message From: Wasim Bari [EMAIL PROTECTED] To: core-user core-user@hadoop.apache.org Sent: Tuesday, November 11, 2008 9:09:22 AM Subject: DistCp 0.18 Vs DistCp 0.17 Hi, The package for DistCp in 0.18 is: org.Apache.Hadoop.tools. Is it same in 0.17 or different one ? is there any difference among these two versions for DistCp ? Thanks, Wasim
Re: rsync on 2 HDFS
Hi Deepika, We have a utility called distcp - distributed copy. Note that distcp itself is different from rsync. However, distcp -delete is similar to rsync --delete. distcp -delete is a new feature in 0.19. See HADOOP-3939. For more details about distcp, see http://hadoop.apache.org/core/docs/r0.18.0/distcp.html (the doc is for 0.18, so it won't mention distcp -delete. The 0.19 doc will be updated in HADOOP-3942.) Nicholas Sze - Original Message From: Deepika Khera [EMAIL PROTECTED] To: core-user@hadoop.apache.org Sent: Friday, September 5, 2008 2:42:09 PM Subject: rsync on 2 HDFS Hi, I wanted to do an rsync --delete between data in 2 HDFS system directories. Do we have a utility that could do this? I am aware that HDFS does not allow partial writes. An alternative would be to write a program to generate the list of differences in paths and then use distcp to copy the files and delete the appropriate files. Any pointers to implementations (or partial implementations)? Thanks, Deepika
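As a rough illustration of the rsync-like usage described above (0.19+; the hosts, ports and paths are only examples, and the exact flag combination varies by release, with some releases requiring -delete to be combined with -update or -overwrite):

hadoop distcp -update -delete hdfs://nn1:8020/logs hdfs://nn2:8020/logs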
Re: Please help me: is there a way to chown in Hadoop?
Yes, there is a chown command. % hadoop fs -chown ... For more help, try below % hadoop fs -help chown Nicholas Sze - Original Message From: Gopal Gandhi [EMAIL PROTECTED] To: core-user@hadoop.apache.org; [EMAIL PROTECTED] Sent: Tuesday, August 26, 2008 11:35:58 AM Subject: Please help me: is there a way to chown in Hadoop? I need to change a file's owner from userA to userB. Is there such a command? Thanks lot! % hadoop dfs -ls file /user/userA/file2008-08-25 20:00 rwxr-xr-x userAsupergroup