Re: Hadoop calculates PI

2012-05-08 Thread Tsz Wo (Nicholas), Sze
Hi,

There are actually three MapReduce example programs for computing pi.

pi - uses a quasi-Monte Carlo (qMC) method (a powerful technique that can evaluate 
arbitrary integrals, but it is not particularly good at computing pi),
bbp - uses a BBP formula; each task computes a few digits of pi at a specific 
position (e.g. task 1 computes the 1st - 4th digits, task 2 the 5th - 8th 
digits, etc.),
distbbp - also uses a BBP formula but evaluates the formula in a distributed 
manner.

pi is only able to compute ~10 digits even with a large number of samples.  I 
got the following result in HADOOP-4437.

1000 maps and 1000 samples per map.
Job Finished in 67.337 seconds
Estimated value of PI is 3.141592645200
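
For intuition, the quasi-Monte Carlo idea behind the pi example is to sample points in 
the unit square and count how many fall inside the inscribed quarter circle; that 
fraction approaches pi/4.  Below is a minimal standalone sketch of that idea, using 
plain pseudo-random points rather than a low-discrepancy sequence; it is not the Hadoop 
implementation itself:

import java.util.Random;

public class PiSketch {
  public static void main(String[] args) {
    // The fraction of random points in the unit square that land inside the
    // inscribed quarter circle approaches pi/4.
    final long samples = 1000 * 1000;
    long inside = 0;
    final Random r = new Random();
    for (long i = 0; i < samples; i++) {
      final double x = r.nextDouble();
      final double y = r.nextDouble();
      if (x * x + y * y <= 1.0) {
        inside++;
      }
    }
    System.out.println("Estimated value of Pi is " + 4.0 * inside / samples);
  }
}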

bbp is able to compute millions of digits (I forget whether it scales to billions, but 
it definitely won't work well for trillions).  See HADOOP-5052.

distbbp is able to compute digits of pi up to the quadrillionth (10^15th) position 
using a large cluster.  Note that it skips to a particular position and computes the 
digits starting at that position.  See MAPREDUCE-637 and MAPREDUCE-1923, and also the 
articles at the end.

Note that bbp and distbbp are available in 2.0.0 and above (also 0.21 and above), 
but in neither 1.x.x nor 0.20.x.
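
(To see which examples your build ships, you can run the examples jar with no arguments 
and it will print the list of valid program names; the jar name varies by release, 
e.g. hadoop-examples-*.jar in 1.x and hadoop-mapreduce-examples-*.jar in 2.x.)

% hadoop jar hadoop-mapreduce-examples-*.jar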

Thanks for being interested in it!
Tsz-Wo
--
- The Two Quadrillionth Bit of Pi is 0! Distributed Computation of Pi with 
Apache Hadoop
http://arxiv.org/abs/1008.3171

- BBC News: Pi record smashed as team finds two-quadrillionth digit
http://www.bbc.co.uk/news/technology-11313194

- New Scientist: New pi record exploits Yahoo's computers
http://www.newscientist.com/article/dn19465-new-pi-record-exploits-yahoos-computers.html

- CNN Money Tech: Yahoo exec finds two-quadrillionth digit of pi
http://cnnmoneytech.tumblr.com/post/1137357695/yahoo-exec-finds-two-quadrillionth-digit-of-pi

- David Bailey (mathematician)
Yahoo! researcher computes binary digits of pi beginning at two quadrillionth 
digit
http://experimentalmath.info/blog/2010/09/yahoo-researcher-computes-binary-digits-of-pi-beginning-at-two-quadrillionth-digit/

- Communications of the ACM: New Pi Record Exploits Yahoo's Computers
http://cacm.acm.org/news/99207-new-pi-record-exploits-yahoos-computers

- Communications of the ACM: Math at Web Speed
http://mags.acm.org/communications/201011?pg=20#pg20

- computing now (IEEE): Yahoo Sets Record for Pi Bit Calculation
http://www.computer.org/portal/web/news/home/-/blogs/3147549

- The Register: Yahoo! boffin scores pi's two quadrillionth bit
http://www.theregister.co.uk/2010/09/16/pi_record_at_yahoo/

- ReadWriteCloud
A Cloud Computing Milestone: Yahoo! Reaches the 2 Quadrillionth Bit of Pi
http://www.readwriteweb.com/cloud/2010/09/a-cloud-computing-milestone-ya.php

- ZDNet: Hadoop used to calculate Pi's two quadrillionth bit
http://www.zdnet.co.uk/blogs/mapping-babel-10017967/hadoop-used-to-calculate-pis-two-quadrillionth-bit-10018670/




From: Alex Paransky ap...@standardset.com
To: common-user@hadoop.apache.org 
Sent: Tuesday, May 8, 2012 5:35 PM
Subject: Hadoop calculates PI

So, I installed Hadoop on my iMac via "port install hadoop" and, after working
through a few configuration issues, tried to test the setup by calculating
PI.  Unfortunately, I got this answer:

Estimated value of Pi is *3.1480*

which is not what I expected.  Is there something that I missed?

Thanks for any help you can offer.

Here is the job output:
hadoop-1.0.2 $ hadoop-bin hadoop jar $HADOOP_HOME/hadoop-examples-*.jar pi 10 100
Warning: $HADOOP_HOME is deprecated.

Number of Maps  = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
12/05/08 16:15:12 INFO mapred.FileInputFormat: Total input paths to process : 10
12/05/08 16:15:13 INFO mapred.JobClient: Running job: job_201205081614_0001
12/05/08 16:15:14 INFO mapred.JobClient:  map 0% reduce 0%
12/05/08 16:15:28 INFO mapred.JobClient:  map 20% reduce 0%
12/05/08 16:15:34 INFO mapred.JobClient:  map 40% reduce 0%
12/05/08 16:15:37 INFO mapred.JobClient:  map 40% reduce 6%
12/05/08 16:15:40 INFO mapred.JobClient:  map 60% reduce 6%
12/05/08 16:15:46 INFO mapred.JobClient:  map 80% reduce 13%
12/05/08 16:15:52 INFO mapred.JobClient:  map 100% reduce 26%
12/05/08 16:16:01 INFO mapred.JobClient:  map 100% reduce 100%
12/05/08 16:16:06 INFO mapred.JobClient: Job complete: job_201205081614_0001
12/05/08 16:16:06 INFO mapred.JobClient: Counters: 27
12/05/08 16:16:06 INFO mapred.JobClient:   Job Counters
12/05/08 16:16:06 INFO mapred.JobClient:     Launched reduce tasks=1
12/05/08 16:16:06 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=49813
12/05/08 16:16:06 INFO mapred.JobClient:     Total time spent by all
reduces waiting after reserving slots (ms)=0
12/05/08 16:16:06 INFO 

Re: Has anyone written a program to show total use on hdfs by directory

2011-10-25 Thread Tsz Wo (Nicholas), Sze
Hi Steve,

You may use the shell command "hadoop fs -count" or call 
FileSystem.getContentSummary(Path f) in Java.
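
For example, a minimal sketch of the Java route (the class name and path below are 
just placeholders, and it assumes the default FileSystem is your HDFS):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DirUsage {
  public static void main(String[] args) throws Exception {
    final FileSystem fs = FileSystem.get(new Configuration());
    // Print the space used by each immediate subdirectory of the given path.
    for (FileStatus stat : fs.listStatus(new Path(args.length > 0 ? args[0] : "/user/steve"))) {
      if (stat.isDir()) {
        final ContentSummary cs = fs.getContentSummary(stat.getPath());
        System.out.println(stat.getPath() + "\t" + cs.getLength() + " bytes in "
            + cs.getFileCount() + " files");
      }
    }
  }
}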


Hope it helps.

Tsz-Wo



From: Steve Lewis lordjoe2...@gmail.com
To: mapreduce-user mapreduce-user@hadoop.apache.org; 
hdfs-u...@hadoop.apache.org
Sent: Tuesday, October 25, 2011 5:51 PM
Subject: Has anyone written a program to show total use on hdfs by directory


While I can see file sizes with the web interface, it is very difficult to tell 
which directories are taking up space especially
when nested by several levels

-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com


Re: HDFS File Appending URGENT

2011-06-17 Thread Tsz Wo (Nicholas), Sze
Hi Jagaran,

Short answer: the append feature is not in any release.  In this sense, it is 
not stable.  Below are more details on the Append feature status.

- 0.20.x (includes release 0.20.2)
There are known bugs in append.  The bugs may cause data loss.

- 0.20-append
There was an effort to fix the known append bugs, but there have been no releases.  I 
heard Facebook was using it (with additional patches?) in production, but I don't 
have the details.

- 0.21
It has a new append design (HDFS-265).  However, the 0.21.0 release is only a 
minor release.  It has not undergone testing at scale and should not be 
considered stable or suitable for production.  Also, 0.21 development has been 
discontinued.  Newly discovered bugs may not be fixed.

- 0.22, 0.23
Not yet released.


Regards,
Tsz-Wo





From: jagaran das jagaran_...@yahoo.co.in
To: common-user@hadoop.apache.org
Sent: Fri, June 17, 2011 11:15:04 AM
Subject: Fw: HDFS File Appending URGENT

Please help me on this.
I need it very urgently

Regards,
Jagaran 


- Forwarded Message 
From: jagaran das jagaran_...@yahoo.co.in
To: common-user@hadoop.apache.org
Sent: Thu, 16 June, 2011 9:51:51 PM
Subject: Re: HDFS File Appending URGENT

Thanks a lot Xiabo.

I have tried the code below with HDFS version 0.20.20 and it worked.
Is it not stable yet?

import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HadoopFileWriter {
  public static void main(String[] args) throws Exception {
    try {
      URI uri = new URI("hdfs://localhost:9000/Users/jagarandas/Work-Assignment/Analytics/analytics-poc/hadoop-0.20.203.0/data/test.dat");
      Path pt = new Path(uri);
      FileSystem fs = FileSystem.get(new Configuration());
      BufferedWriter br;
      if (fs.isFile(pt)) {
        // The file already exists: append a new line to it.
        br = new BufferedWriter(new OutputStreamWriter(fs.append(pt)));
        br.newLine();
      } else {
        // The file does not exist yet: create it (overwrite = true).
        br = new BufferedWriter(new OutputStreamWriter(fs.create(pt, true)));
      }
      String line = args[0];
      System.out.println(line);
      br.write(line);
      br.close();
    } catch (Exception e) {
      e.printStackTrace();
      System.out.println("File not found");
    }
  }
}

Thanks a lot for your help.

Regards,
Jagaran 





From: Xiaobo Gu guxiaobo1...@gmail.com
To: common-user@hadoop.apache.org
Sent: Thu, 16 June, 2011 8:01:14 PM
Subject: Re: HDFS File Appending URGENT

You can merge multiple files into a new one; there is no way to
append to an existing file.
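
For reference, a merge along those lines could look like the following sketch.  It 
assumes the FileUtil.copyMerge API from this Hadoop line, and the paths are just 
placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergeSmallFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Concatenate every file under /in/small into the single file /in/merged.dat,
    // keeping the source files (deleteSource = false), with no separator string.
    FileUtil.copyMerge(fs, new Path("/in/small"),
                       fs, new Path("/in/merged.dat"),
                       false, conf, null);
  }
}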

On Fri, Jun 17, 2011 at 10:29 AM, jagaran das jagaran_...@yahoo.co.in wrote:
 Is the hadoop version Hadoop 0.20.203.0 API

 That means the files in HDFS version 0.20.20 are still immutable?
 And there is no way to append to an existing file in HDFS?

 We need to do this urgently, as we have to set up the pipeline accordingly in
 production.

 Regards,
 Jagaran



 
 From: Xiaobo Gu guxiaobo1...@gmail.com
 To: common-user@hadoop.apache.org
 Sent: Thu, 16 June, 2011 6:26:45 PM
 Subject: Re: HDFS File Appending

 please refer to FileUtil.copyMerge

 On Fri, Jun 17, 2011 at 8:33 AM, jagaran das jagaran_...@yahoo.co.in wrote:
 Hi,

 We have a requirement where

  There would be a huge number of small files to be pushed to HDFS, and then we use
 pig
  to do the analysis.
  To get around the classic Small File Issue we merge the files and push a
  bigger file into HDFS.
  But we are losing time in this merging step of our pipeline.

 But if we can directly append to an existing file in HDFS, we can save this
 merging time.

 Can you please suggest whether there is a newer stable version of Hadoop where we can
  go for appending?

 Thanks and Regards,
 Jagaran



Re: Developing, Testing, Distributing

2011-04-08 Thread Tsz Wo (Nicholas), Sze
(Resent with -hadoopuser.  Apologies if you receive multiple copies.)






From: Tsz Wo (Nicholas), Sze s29752-hadoopgene...@yahoo.com
To: common-user@hadoop.apache.org
Sent: Fri, April 8, 2011 11:08:22 AM
Subject: Re: Developing, Testing, Distributing


First of all, I am a Hadoop contributor and I am familiar with the Hadoop code 
base/build mechanism.  Here is what I do:


Q1: What IDE are you using?
Eclipse.

Q2: What plugins for the IDE are you using?
No plugins.

Q3: How do you test your code, which unit test libraries are you using, and how do 
you run your automated tests after you have finished development?
I use JUnit.  The tests are executed using ant, the same way we do in 
Hadoop development.

Q4: Do you have test/qa/staging environments besides dev and production? 
How do you keep them similar to production?
We, Yahoo!, have test clusters with settings similar to the production 
clusters.

Q5: Code reuse - how do you build components that can be used in other jobs? Do 
you build generic map or reduce classes?
I do have my own framework for running generic computations or generic jobs.

Some more details:
1) svn checkout MapReduce trunk (or common/branches/branch-0.20 for 0.20)
2) compile everything using ant
3) setup eclipse
4) remove existing files under ./src/examples 
5) develop my codes under ./src/examples
6) add unit tests under ./src/test/mapred

I find it very convenient since (i) the build scripts can compile the 
examples code, run the unit tests, create the jar, etc., and (ii) Hadoop contributors 
maintain them.

Hope it helps.
Nicholas Sze



Re: Are there any Hadoop books in print that use the new API?

2011-04-06 Thread Tsz Wo (Nicholas), Sze
Not sure if you already know: the MapReduce examples are using the new API.  
You 
may want to take a look.

http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/examples/org/apache/hadoop/examples/



Regards,
Nicholas



On Wed, Apr 6, 2011 at 3:31 PM, W.P. McNeill bill...@gmail.com wrote:

 I've been working from the 2nd Edition of Tom White's *Hadoop: The
 Definitive Guide*, but that's still old API (0.20).  Are there any books in
 print that use the new API?  Separating old-API vs. new-API examples that
 you find on the internet can be tricky.



Re: Hadoop for Bioinformatics

2011-03-29 Thread Tsz Wo (Nicholas), Sze
Hi Franco,

I recall that there are some Hadoop-Blast research projects.  For example, 
see


- http://www.cs.umd.edu/Grad/scholarlypapers/papers/MichaelSchatz.pdf
- http://salsahpc.indiana.edu/tutorial/hadoopblast.html

Nicholas




From: Franco Nazareno franco.nazar...@gmail.com
To: common-user@hadoop.apache.org
Sent: Sun, March 27, 2011 7:51:14 PM
Subject: Hadoop for Bioinformatics

Good day everyone!



First, I want to congratulate the group for this wonderful project. It did
open up new ideas and solutions in computing and technology-wise. I'm
excited to learn more about it and discover possibilities using Hadoop and
its components. 



Well, I just want to ask this with regard to my study. Currently I'm
doing my PhD in Bioinformatics, and my question is: can you
give me a (rough) idea of whether it's possible to use a Hadoop cluster to do
DNA sequence alignment? My basic idea is something like a string
search over huge data files stored in HDFS, with the application using
MapReduce for the searching and computing. As the Hadoop paradigm implies, it
doesn't serve interactive applications well, and I think this kind of
searching is a write-once, read-many application.



I hope you don't mind my question. And it'll be great hearing your comments
or suggestions about this.



Thanks and more power!

Franco

Re: Zero file size after hsync

2011-03-18 Thread Tsz Wo (Nicholas), Sze
Hi Viliam,

Which version of Hadoop are you using?

First of all, hsync is the same as hflush in 0.21 and above.  hflush/hsync won't 
update the file length on the NameNode.  So the answer to your question is yes.
We have to call DFSDataInputStream.getVisibleLength() to get the visible length 
of the file.
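
For example, a rough sketch (the cast assumes the stream really comes from HDFS; in 
the 0.21/0.22 line DFSDataInputStream is a nested class of DFSClient):

// Re-open the file after hflush/hsync; only a new reader sees the new data.
FSDataInputStream in = fs.open(path);
long visible = ((DFSClient.DFSDataInputStream)in).getVisibleLength();
System.out.println("visible length = " + visible);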

When is the SequenceFile opened?  Before or after hflush/hsync?  Note that only a 
new reader can see the new data.  So if the file, whether a normal file or a 
SequenceFile, is opened before hflush/hsync, we have to re-open it in order to see the 
new data.

Anyway, please feel free to file a JIRA if you feel it is a bug or if you would like to 
make a feature request.

Hope it helps.
Nicholas





From: Viliam Holub viliam.ho...@ucd.ie
To: hdfs-user@hadoop.apache.org
Sent: Fri, March 18, 2011 9:29:32 AM
Subject: Zero file size after hsync


Hi all,

The size of a newly created file is reported to be zero even though I've written
some data and hsync-ed them. Is that the correct and expected effect?
hadoop fs -cat will retrieve the data correctly.

As a consequence, SequenceFile fails to seek in the file, since it tests the
position against the file size. And the data are there...

Thanks!
Viliam

Re: PiEstimator error - Type mismatch in key from map

2011-01-26 Thread Tsz Wo (Nicholas), Sze
Hi Pedro,
This is interesting.  Which version of Hadoop are you using?  And where did you 
get the example class files?  Also, are you able to reproduce it 
deterministically?
Nicholas





From: Pedro Costa psdc1...@gmail.com
To: mapreduce-user@hadoop.apache.org
Sent: Wed, January 26, 2011 5:47:01 AM
Subject: PiEstimator error - Type mismatch in key from map

Hi,

I run the PI example of hadoop, and I've got the following error:

[code]
java.io.IOException: Type mismatch in key from map: expected
org.apache.hadoop.io.BooleanWritable, recieved
org.apache.hadoop.io.LongWritable
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:885)
at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:551)
at 
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:81)

at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:637)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.Child.main(Child.java:190)
[/code]

I've looked at the map function of the PiEstimator class and it seems ok.

[code]
public void map(LongWritable offset,
                LongWritable size,
                OutputCollector<BooleanWritable, LongWritable> out,
                Reporter reporter) throws IOException {}
[/code]


What's wrong with this example?

Thanks,
-- 
Pedro


Re: PiEstimator error - Type mismatch in key from map

2011-01-26 Thread Tsz Wo (Nicholas), Sze
Thanks for the info.  I have run PiEstimator many, many times and have never observed 
such a problem.
Nicholas





From: Pedro Costa psdc1...@gmail.com
To: mapreduce-user@hadoop.apache.org
Sent: Wed, January 26, 2011 10:09:36 AM
Subject: Re: PiEstimator error - Type mismatch in key from map

Yes, I can reproduce it deterministically. But I also made some
changes to the Hadoop MR code. Most definitely this is the reason. I'm
looking thoroughly through the code.

I'll say something after I find the problem.

I was just wondering if this error has happened to someone before.
Maybe I could get a hint and try to see what's my problem easily.

Thanks,

On Wed, Jan 26, 2011 at 6:02 PM, Tsz Wo (Nicholas), Sze
s29752-hadoopu...@yahoo.com wrote:
 Hi Pedro,
 This is interesting.  Which version of Hadoop are you using?  And where did
 you get the example class files?  Also, are you able to reproduce it
 deterministically?
 Nicholas

 
 From: Pedro Costa psdc1...@gmail.com
 To: mapreduce-user@hadoop.apache.org
 Sent: Wed, January 26, 2011 5:47:01 AM
 Subject: PiEstimator error - Type mismatch in key from map

 Hi,

 I run the PI example of hadoop, and I've got the following error:

 [code]
 java.io.IOException: Type mismatch in key from map: expected
 org.apache.hadoop.io.BooleanWritable, recieved
 org.apache.hadoop.io.LongWritable
 at
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:885)
 at
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:551)
 at
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:81)
 at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:637)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.Child.main(Child.java:190)
 [/code]

 I've look at the map function of the class PiEstimator.class and it seems
 ok.

 [code]
 public void map(LongWritable offset,
 LongWritable size,
 OutputCollector<BooleanWritable, LongWritable> out,
 Reporter reporter) throws IOException {}
 [/code]


 What's wrong with this examples?

 Thanks,
 --
 Pedro




-- 
Pedro


Re: PiEstimator error - Type mismatch in key from map

2011-01-26 Thread Tsz Wo (Nicholas), Sze
Hi Srihari,

Same questions to you: Which version of Hadoop are you using?  And where did you 
get the examples?  I guess you were able to reproduce it.  I suspect the 
examples and Hadoop are from different versions.

Nicholas






From: Srihari Anantha Padmanabhan sriha...@yahoo-inc.com
To: mapreduce-user@hadoop.apache.org mapreduce-user@hadoop.apache.org
Sent: Wed, January 26, 2011 10:15:08 AM
Subject: Re: PiEstimator error - Type mismatch in key from map

I got a similar error before in one of my projects. I had to set the values for 
mapred.output.key.class and mapred.output.value.class.

That resolved the issue for me. 

Srihari


Re: PiEstimator error - Type mismatch in key from map

2011-01-26 Thread Tsz Wo (Nicholas), Sze
Okay, I got it now.  You were talking about your own program, not the 
PiEstimator example that comes with Hadoop.  Then you have to set 
mapred.output.key.class and mapred.output.value.class, as Srihari mentioned.
Below are the APIs.

//new API
final Job job = ...
job.setMapOutputKeyClass(BooleanWritable.class);
job.setMapOutputValueClass(LongWritable.class);

//old API
final JobConf jobconf = ...
jobconf.setOutputKeyClass(BooleanWritable.class);
jobconf.setOutputValueClass(LongWritable.class);

Nicholas





From: Srihari Anantha Padmanabhan sriha...@yahoo-inc.com
To: mapreduce-user@hadoop.apache.org mapreduce-user@hadoop.apache.org
Sent: Wed, January 26, 2011 10:36:09 AM
Subject: Re: PiEstimator error - Type mismatch in key from map

I am using Hadoop 0.20.2. I just wrote my own map-reduce program based on the 
map-reduce tutorial at 
http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html

On Jan 26, 2011, at 10:27 AM, Pedro Costa wrote:

 Hadoop 20.1
 

Re: About hadoop-..-examples.jar

2011-01-13 Thread Tsz Wo (Nicholas), Sze
The examples package is in the MapReduce trunk.  Note that it is under a 
different source directory, src/examples, not src/java.

See also 
http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/examples/org/apache/hadoop/examples/terasort/


Nicholas






From: Bo Sang sampl...@gmail.com
To: Hadoop user mail list common-user@hadoop.apache.org
Sent: Thu, January 13, 2011 11:23:44 AM
Subject: About hadoop-..-examples.jar

Hi, guys:

Does anyone know where I can get the package hadoop-..-examples.jar? I want to
use TeraSort in it. It seems this package is not included in the hadoop source
code. And I also failed to find a download link on its homepage.
-- 
Best Regards!

Sincerely
Bo Sang


Re: Prime number of reduces vs. linear hash function

2010-10-24 Thread Tsz Wo (Nicholas), Sze
You may also see Knuth's The Art of Computer Programming.  I remember that 
there is a discussion about prime numbers and hash functions. (It should be in 
Volume 3, Chapter 6, in the section about hashing.  Sorry that I don't have 
the book with me and can't give you the page numbers.)


Nicholas




From: aniket ray aniket@gmail.com
To: common-user@hadoop.apache.org
Sent: Mon, October 25, 2010 12:12:16 PM
Subject: Re: Prime number of reduces vs. linear hash function

http://computinglife.wordpress.com/2008/11/20/why-do-hash-functions-use-prime-numbers/
discusses the theory in detail.

On Sun, Oct 24, 2010 at 7:30 AM, Shi Yu sh...@uchicago.edu wrote:

 There is a suggestion to set the number of reducers to a prime number
 closest to the number of nodes, and the number of mappers to a prime number closest
 to several times the number of nodes in the cluster. But there is also the
 saying that "There is no need for the number of reduces to be prime. The
 only thing it helps is if you are using the HashPartitioner and your key's
 hash function is too linear. In practice, you usually want to use 99% of
 the reduce capacity of the cluster."

 Could anyone explain what is the theory behind the prime number and the
 hash function here?

 Shi




Re: Namenode warnings

2010-05-11 Thread Tsz Wo (Nicholas), Sze
Hi Runping,
This is a known issue.  See https://issues.apache.org/jira/browse/HDFS-625.
Nicholas Sze




- Original Message 
 From: Runping Qi runping...@gmail.com
 To: common-user@hadoop.apache.org
 Sent: Wed, May 12, 2010 12:53:13 AM
 Subject: Namenode warnings
 
 Hi,

 I saw a lot of warnings like the following in the namenode log:

 2010-05-11 06:45:07,186 WARN /: /listPaths/s:
 java.lang.NullPointerException
         at org.apache.hadoop.hdfs.server.namenode.ListPathsServlet.doGet(ListPathsServlet.java:153)
         at javax.servlet.http.HttpServlet.service(HttpServlet.java:596)
         at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
         at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
         at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
         at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
         at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
         at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
         at org.mortbay.http.HttpServer.service(HttpServer.java:954)
         at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
         at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
         at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
         at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
         at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
         at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

 I am using Hadoop 0.19.

 Anybody knows what might be the problem?

 Thanks,

 Runping



Re: JavaDocs for DistCp (or similar)

2010-02-17 Thread Tsz Wo (Nicholas), Sze
Oops, DistCp.main(..) calls System.exit(..) at the end, so it would also 
terminate your Java program, which is probably not desirable.  You may still use 
code similar to DistCp.main(..), shown below.  However, these are 
not stable APIs.


//DistCp.main
  public static void main(String[] args) throws Exception {
JobConf job = new JobConf(DistCp.class);
DistCp distcp = new DistCp(job);
int res = ToolRunner.run(distcp, args);
System.exit(res);
  }
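
For example, a sketch that reuses those lines but keeps control of the exit path (the 
source/destination URIs below are placeholders, and this still relies on non-stable 
internals):

JobConf job = new JobConf(DistCp.class);
DistCp distcp = new DistCp(job);
String[] args = {"hdfs://srcNN:8020/src", "hdfs://dstNN:8020/dest"};
int res = ToolRunner.run(distcp, args);  // 0 on success
if (res != 0) {
  // handle the failure here instead of letting System.exit(..) kill the JVM
}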

Nicholas



- Original Message 
 From: Tsz Wo (Nicholas), Sze s29752-hadoopu...@yahoo.com
 To: common-user@hadoop.apache.org
 Sent: Wed, February 17, 2010 10:58:58 PM
 Subject: Re: JavaDocs for DistCp (or similar)
 
 Hi Balu,
 
 Unfortunately, DistCp does not have a public Java API.  One simple way is to 
 invoke DistCp.main(args) in your java program, where args is an array of the 
 string arguments you would pass in the command line.
 
 Hope this helps.
 Nicholas Sze
 
 
 
 
 - Original Message 
  From: Balu Vellanki 
  To: common-user@hadoop.apache.org 
  Sent: Wed, February 17, 2010 5:43:11 PM
  Subject: JavaDocs for DistCp (or similar)
  
  Hi Folks
  
  Currently we use distCp to transfer files between two hadoop clusters. I
  have a perl script which calls a system command “hadoop distcp” to
  achieve this.
  
  Is there a Java Api to do distCp, so that we can avoid system calls from our
  java code?
  
  Thanks
  Balu




Re: hflush not working for me?

2009-10-09 Thread Tsz Wo (Nicholas), Sze
Soft lease is for another writer to obtain the file lease if the original 
writer appears to abandon the file.  In the current TestReadWhileWriting (not 
counting part (c) and (d)), there is only one writer.  So soft lease is not 
related.

Will check your test.

Nicholas



From: stack st...@duboce.net
To: hdfs-user@hadoop.apache.org
Sent: Fri, October 9, 2009 2:32:02 PM
Subject: Re: hflush not working for me?

On Fri, Oct 9, 2009 at 1:27 PM, Tsz Wo (Nicholas), Sze 
s29752-hadoopu...@yahoo.com wrote:

Hi St.Ack,

 ... soft lease to 1 second ...
You are right that you don't have to change the soft lease.  It is for append but 
not related to hflush.



I should not have to set it then?  I can remove this 70-second pause in the middle 
of my test?

 

 Do I have to do open as another user?
This should not be necessary.

Could you send me/post your test?



Sure, as long as you don't hold this ugly code against me ever after.

I checked in the code so you could try it:  
http://svn.apache.org/repos/asf/hadoop/hbase/trunk/src/test/org/apache/hadoop/hbase/regionserver/TestHLog.java

Its the first test, testSync.

It starts out by copying what's done in the hdfs TestReadWhileWriting.  That 
bit works fine.

Then comes the ugly stuff.

HLog is our write-ahead log wrapper.  Internally it writes out to a 
SequenceFile.Writer.  The SequenceFile.Writer has been doctored using 
reflection so the out datamember is non-private.  A call to HLog.sync runs the 
SequenceFile.Writer.sync -- which DOES NOT call sync on the backing output 
stream -- and then it calls sync on the now accessible out stream (Sorry its 
so ugly -- I'm trying to hack stuff up fast so all of hbase gets access to 
this new facility).  If I trace in the debugger, I can see that the sync on 
the out data member goes down into hflush.  Queued up edits are flushed.  It 
seems like it should be working.

Do I have to do some doctoring of the reader? (It doesn't seem so given that 
the code at the head of this test works).

Thanks for taking a look Nicholas.

To run the test, you can do ant clean jar test -Dtestcase=TestHLog.

(Let me know if you want an eclipse .project + .classpath so you can get it up 
in an ide to run debugger).

St.Ack



 
Nicholas Sze



From: stack st...@duboce.net
To: hdfs-user@hadoop.apache.org
Sent: Fri, October 9, 2009 1:13:37 PM
Subject: hflush not working for me?


I'm putting together some unit tests up in our application that exercise 
hflush.  I'm using minidfscluster and a jar made by building head of the 
0.21 branch of hdfs (from about a minute ago).

Code opens a file, writes a bunch of edits, invokes hflush (by calling sync 
on DFSDataOutputStream instance) and then, without closing the Writer, opens 
a Reader on same file.  This Reader does not see any edits not to mind edits 
up to the sync invocation.

I can trace the code and see how on hflush it sends the queued packets of 
edits.

I studied TestReadWhileWriting.  I've set setBoolean("dfs.support.append", 
true) before the minidfscluster spins up.  I can't set the soft lease to 1 second 
because I'm not in the same package, so I just wait out the default minute.  It 
doesn't seem to make a difference.

Do I have to do open as another user?

Thanks for any pointers,
St.Ack



Re: how to compile HDFS-265 branch together with MAPREDUCE trunk?

2009-10-05 Thread Tsz Wo (Nicholas), Sze
Hi Zheng,

I have created a script to compile everything and posted it on HDFS-265.  See 
also 
https://issues.apache.org/jira/browse/HDFS-265?focusedCommentId=12760809page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12760809

Hope this helps.
Nicholas Sze




From: Zheng Shao zs...@facebook.com
To: common-user@hadoop.apache.org common-user@hadoop.apache.org; 
hdfs-u...@hadoop.apache.org hdfs-u...@hadoop.apache.org
Sent: Monday, October 5, 2009 3:21:55 PM
Subject: how to compile HDFS-265 branch together with MAPREDUCE trunk?

 



I got the HDFS-265 branch from hdfs and compiled it
successfully, and generated hadoop-hdfs-*.jar.
But I also need mapreduce.
 
Is there an easy way to compile hdfs and mapreduce together? I
need the HDFS-265 branch, instead of the default one when I check out and build
“common”.
 
Thanks,
Zheng
 


Re: distcp between 0.17 and 0.18.3 issues

2009-08-05 Thread Tsz Wo (Nicholas), Sze
Hi tp,

distcp definitely supports copying files from a 0.17 cluster to a 0.18 cluster.  
The error message is saying that the delete operation is not supported in 
HftpFileSystem.  Would you mind showing me the actual command you used?

Nicholas Sze




- Original Message 
 From: charles du taiping...@gmail.com
 To: core-u...@hadoop.apache.org
 Sent: Wednesday, August 5, 2009 12:36:49 PM
 Subject: distcp between 0.17 and 0.18.3 issues
 
 Hi:
 
 I tried to use distcp to copy files from one cluster running hadoop 0.17.0
 to another cluster running hadoop 0.18.3, and got the following errors.
 
 With failures, global counters are inaccurate; consider running with -i
 Copy failed: java.io.IOException: Not supported
 at
 org.apache.hadoop.dfs.HftpFileSystem.delete(HftpFileSystem.java:263)
 at org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:119)
 at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:843)
 at org.apache.hadoop.tools.DistCp.copy(DistCp.java:623)
 at org.apache.hadoop.tools.DistCp.run(DistCp.java:768)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at org.apache.hadoop.tools.DistCp.main(DistCp.java:788)
 
 
 I ran the distcp from the 0.18.3 cluster. does the error message mean that
 distcp does not support 0.17.0 as the copy source?
 
 Regards
 
 -- 
 tp



Re: distcp between 0.17 and 0.18.3 issues

2009-08-05 Thread Tsz Wo (Nicholas), Sze
hadoop distcp -i hftp://nn1:50070/src hftp://nn2:50070/dest
The problem in the command above is hftp://nn2:50070/dest, since hftp (i.e. 
HftpFileSystem) is a read-only file system.  You may change it to 
hdfs://nn2:<port>/dest, where <port> is a different port.  You can find the port 
number on the NN's web page.
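
For example (8020 below is just a placeholder port; use whatever the destination NN 
actually reports):

hadoop distcp -i hftp://nn1:50070/src hdfs://nn2:8020/dest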


Nicholas



- Original Message 
 From: charles du taiping...@gmail.com
 To: common-user@hadoop.apache.org
 Sent: Wednesday, August 5, 2009 1:54:55 PM
 Subject: Re: distcp between 0.17 and 0.18.3 issues
 
 Hi Nicholas:
 
 The command I used is
 
hadoop distcp -i hftp://nn1:50070/src hftp://nn2:50070/dest
 
 I ran hadoop ls on both src and destination, and it lists files just fine.
 nn1 is 0.17.0, and nn2 is 0.18.3
 
 Thanks.
 
 tp
 
 On Wed, Aug 5, 2009 at 1:49 PM, Tsz Wo (Nicholas), Sze 
 s29752-hadoopu...@yahoo.com wrote:
 
  Hi tp,
 
  distcp definitely supports copying file from a 0.17 cluster to a 0.18
  cluster.  The error message is saying that the delete operation is not
  supported in HftpFileSystem.  Would you mind to show me the actual command
  used?
 
  Nicholas Sze
 
 
 
 
  - Original Message 
   From: charles du 
   To: core-u...@hadoop.apache.org
   Sent: Wednesday, August 5, 2009 12:36:49 PM
   Subject: distcp between 0.17 and 0.18.3 issues
  
   Hi:
  
   I tried to use distcp to copy files from one cluster running hadoop
  0.17.0
   to another cluster running hadoop 0.18.3, and got the following errors.
  
   With failures, global counters are inaccurate; consider running with -i
   Copy failed: java.io.IOException: Not supported
   at
   org.apache.hadoop.dfs.HftpFileSystem.delete(HftpFileSystem.java:263)
   at org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:119)
   at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:843)
   at org.apache.hadoop.tools.DistCp.copy(DistCp.java:623)
   at org.apache.hadoop.tools.DistCp.run(DistCp.java:768)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
   at org.apache.hadoop.tools.DistCp.main(DistCp.java:788)
  
  
   I ran the distcp from the 0.18.3 cluster. does the error message mean
  that
   distcp does not support 0.17.0 as the copy source?
  
   Regards
  
   --
   tp
 
 
 
 
 -- 
 tp



Re: Does distcp support copying data from local directories of different nodes?

2009-07-15 Thread Tsz Wo (Nicholas), Sze

 bin/hadoop distcp A://a B://b C://c hdfs://namenode/

Yes and no.  In the command above, A, B and C are supposed to be schemes, e.g. 
hdfs, hftp, file, etc., not host names.  So the command won't work.  If 
nodes A, B and C support some schemes (not necessarily the same one), say 
ftp, then you can do the following with distcp.


   bin/hadoop distcp ftp://A/a ftp://B/b ftp://C/c hdfs://namenode/


Hope this helps.
Nicholas Sze


- Original Message 
 From: Martin Mituzas xietao1...@hotmail.com
 To: core-u...@hadoop.apache.org
 Sent: Wednesday, July 15, 2009 2:25:52 AM
 Subject: Does distcp support copying data from local directories of different 
 nodes?
 
 
 I mean if I have different directories on node A, B, C, can I put them
 together as source directory arguments to copy them into HDFS?
 
 like:
 bin/hadoop distcp A://a B://b C://c hdfs://namenode/
 
 Thanks!
 -- 
 View this message in context: 
 http://www.nabble.com/Does-distcp-support-copying-data-from-local-directories-of-different-nodes--tp24494574p24494574.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: A brief report of Second Hadoop in China Salon

2009-05-16 Thread Tsz Wo (Nicholas), Sze

Congratulations!

Nicholas Sze




- Original Message 
 From: He Yongqiang heyongqi...@software.ict.ac.cn
 To: core-...@hadoop.apache.org core-...@hadoop.apache.org; 
 core-user@hadoop.apache.org core-user@hadoop.apache.org
 Sent: Friday, May 15, 2009 6:09:50 PM
 Subject: A brief report of Second Hadoop in China Salon
 
 Hi, all
 On May 9, we held the second Hadoop In China salon.  About 150 people
 attended; 46% of them were engineers/managers from industry companies, and
 38% were students/professors from universities and institutes.  The
 salon was successfully held with great technical support from Yahoo! Beijing
 R&D, Zheng Shao from Facebook Inc., Wang Shouyan from Baidu Inc. and many
 other high-technology companies in China. We got over one hundred pieces of feedback
 from attendees; most of them are interested in details and want more
 discussions, and 1/3 of them want us to include more topics or more sessions
 for hadoop subprojects. Most students/professors want to become more
 familiar with hadoop and try to find new research topics on top of hadoop.
 Most students want to get involved and contribute to hadoop, but do
 not know how, or find it a little difficult because of language/time-zone
 problems.
Thank you to all the attendees again. Without you, it would never have succeeded.

   We have already put the slides on the site www.hadooper.cn, and the videos are
 coming soon.

   BTW, I insist on keeping this event nonprofit. In the past two
 meetings, we did not charge anyone for anything.



Re: Doubt regarding permissions

2009-04-13 Thread Tsz Wo (Nicholas), Sze

Hi Amar,

I just have tried.  Everything worked as expected.  I guess user A in your 
experiment was a superuser so that he could read anything.

Nicholas Sze

/// permission testing //
drwx-wx-wx   - nicholas supergroup  0 2009-04-13 10:55 /temp
drwx-w--w-   - tsz supergroup  0 2009-04-13 10:58 /temp/test
-rw-r--r--   3 tsz supergroup   1366 2009-04-13 10:58 /temp/test/r.txt

//login as nicholas (non-superuser)

$ whoami
nicholas

$ ./bin/hadoop fs -lsr /temp
drwx-w--w-   - tsz supergroup  0 2009-04-13 10:58 /temp/test
lsr: could not get get listing for 'hdfs://:9000/temp/test' : 
org.apache.hadoop.security.AccessControlException: Permission denied: 
user=nicholas, access=READ_EXECUTE, inode=test:tsz:supergroup:rwx-w--w-

$ ./bin/hadoop fs -cat /temp/test/r.txt
cat: org.apache.hadoop.security.AccessControlException: Permission denied: 
user=nicholas, access=EXECUTE, inode=test:tsz:supergroup:rwx-w--w-



- Original Message 
 From: Amar Kamat ama...@yahoo-inc.com
 To: core-user@hadoop.apache.org
 Sent: Monday, April 13, 2009 2:02:24 AM
 Subject: Doubt regarding permissions
 
 Hey, I tried the following:
 
 -  created a dir temp for user A with permission 733
 
 -  created a dir temp/test for user B with permission 722
 
 -  created a file temp/test/test.txt for user B with permission 722
 
 Now in HDFS, user A can list as well as read the contents of the file
 temp/test/test.txt, while on my RHEL box I can't. Is it a feature or a
 bug? Can someone please try this out and confirm?
 
 
 
 Thanks
 
 Amar



Re: using distcp for http source files

2009-01-21 Thread Tsz Wo (Nicholas), Sze
Hi Derek,

The "http" in "http://core:7274/logs/log.20090121" should be "hftp".  hftp is 
the scheme name of HftpFileSystem, which uses http for accessing hdfs.

Hope this helps.

Nicholas Sze




- Original Message 
 From: Derek Young dyo...@kayak.com
 To: core-user@hadoop.apache.org
 Sent: Wednesday, January 21, 2009 1:23:56 PM
 Subject: using distcp for http source files
 
 I plan to use hadoop to do some log processing and I'm working on a method to 
 load the files (probably nightly) into hdfs.  My plan is to have a web server 
 on 
 each machine with logs that serves up the log directories.  Then I would give 
 distcp a list of http URLs of the log files and have it copy the files in.
 
 Reading http://issues.apache.org/jira/browse/HADOOP-341 it sounds like this 
 should be supported, but the http URLs are not working for me.  Are http 
 source 
 URLs still supported?
 
 I tried a simple test with an http source URL (using Hadoop 0.19):
 
 hadoop distcp -f http://core:7274/logs/log.20090121 /user/dyoung/mylogs
 
 This fails:
 
 With failures, global counters are inaccurate; consider running with -i
 Copy failed: java.io.IOException: No FileSystem for scheme: http
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1364)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:56)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1379)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at org.apache.hadoop.tools.DistCp.fetchFileList(DistCp.java:578)
at org.apache.hadoop.tools.DistCp.access$300(DistCp.java:74)
at org.apache.hadoop.tools.DistCp$Arguments.valueOf(DistCp.java:775)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:844)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:871)



Re: Block not found during commitBlockSynchronization

2008-12-05 Thread Tsz Wo (Nicholas), Sze
Which version are you using?

Calling commitBlockSynchronization(...) with newgenerationstamp=0, newlength=0, 
newtargets=[] does not look normal.  You may check the namenode log and the 
client log for the block blk_-4236881263392665762.

Nicholas Sze




- Original Message 
 From: Brian Bockelman [EMAIL PROTECTED]
 To: core-user@hadoop.apache.org
 Sent: Friday, December 5, 2008 5:22:03 PM
 Subject: Block not found during commitBlockSynchronization
 
 Hey,
 
 I'm seeing this message repeated over and over in my logs:
 
 2008-12-05 19:20:00,534 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
 commitBlockSynchronization(lastblock=blk_-4236881263392665762_88597, 
 newgenerationstamp=0, newlength=0, newtargets=[])
 2008-12-05 19:20:00,534 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 29 
 on 9000, call commitBlockSynchronization(blk_-4236881263392665762_88597, 0, 
 0, 
 false, true, [Lorg.apache.hadoop.hdfs.protocol.DatanodeID;@67537412) from 
 172.16.1.184:57586: error: java.io.IOException: Block 
 (=blk_-4236881263392665762_88597) not found
 java.io.IOException: Block (=blk_-4236881263392665762_88597) not found
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:1898)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.commitBlockSynchronization(NameNode.java:410)
 at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
 
 What can I do to debug?
 
 Brian



Re: ls command output format

2008-11-21 Thread Tsz Wo (Nicholas), Sze
Hi Alex,

Yes, the doc about ls is out-dated.  Thanks for pointing this out.  Would you 
mind filing a JIRA?

Nicholas Sze



- Original Message 
 From: Alexander Aristov [EMAIL PROTECTED]
 To: core-user@hadoop.apache.org
 Sent: Friday, November 21, 2008 6:08:08 AM
 Subject: Re: ls command output format
 
 Found out that output has been changed in 0.18
 
 see HADOOP-2865 
 
 Docs should be also then updated.
 
 Alex
 
 2008/11/21 Alexander Aristov 
 
  Hello
 
  I wonder if hadoop shell command ls has changed output format
 
  Trying hadoop-0.18.2 I got next output
 
  [root]# hadoop fs -ls /
  Found 2 items
  drwxr-xr-x   - root supergroup  0 2008-11-21 08:08 /mnt
  drwxr-xr-x   - root supergroup  0 2008-11-21 08:19 /repos
 
 
  Though according to docs it should be that file name goes first.
  http://hadoop.apache.org/core/docs/r0.18.2/hdfs_shell.html#ls
 
   Usage: hadoop fs -ls <args>
   For a file returns stat on the file with the following format:
   filename filesize modification_date modification_time permissions userid groupid
   For a directory it returns list of its direct children as in unix. A
   directory is listed as:
   dirname <dir> modification_time modification_time permissions userid groupid
   Example:
   hadoop fs -ls /user/hadoop/file1 /user/hadoop/file2 hdfs://nn.example.com/user/hadoop/dir1 /nonexistentfile
   Exit Code:
    Returns 0 on success and -1 on error.
 
 
  I wouldn't have noticed the issue if I didn't have scripts which rely on the
  formatting.
 
  --
  Best Regards
  Alexander Aristov
 
 
 
 
 -- 
 Best Regards
 Alexander Aristov



Re: Anything like RandomAccessFile in Hadoop FS ?

2008-11-13 Thread Tsz Wo (Nicholas), Sze
Append is going to be available in 0.19 (not yet released).  There are new 
FileSystem APIs for append, e.g.

//FileSystem.java
  public abstract FSDataOutputStream append(Path f, int bufferSize,
  Progressable progress) throws IOException;
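
A call would look roughly like this (a sketch only; the path is a placeholder, conf 
stands for your Configuration, and the cluster must have append enabled):

FileSystem fs = FileSystem.get(conf);
FSDataOutputStream out = fs.append(new Path("/logs/app.log"));
out.write(data);  // data is whatever bytes you want to add
out.close();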

Nicholas Sze




- Original Message 
 From: Bryan Duxbury [EMAIL PROTECTED]
 To: Wasim Bari [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Sent: Thursday, November 13, 2008 1:11:57 PM
 Subject: Re: Anything like RandomAccessFile in Hadoop FS ?
 
 I'm not sure off hand. Maybe someone else can point you in the right 
 direction?
 
 On Nov 13, 2008, at 1:09 PM, Wasim Bari wrote:
 
  Hi,
 Thanks for reply.
  
  HDFS supports appending to a file. How can I do this?
  I tried to look for the API under the FileSystem create methods but couldn't find it.
  
  Thanks for ur help.
  
  Wasim
  
  --
  From: Bryan Duxbury 
  Sent: Thursday, November 13, 2008 9:48 PM
  To: 
  Subject: Re: Anything like RandomAccessFile in Hadoop FS ?
  
  If you mean a file where you can write anywhere, then no. HDFS is  
  streaming 
 only. If you want to read from anywhere, then no problem -  just use seek() 
 and 
 then read.
  On Nov 13, 2008, at 11:40 AM, Wasim Bari wrote:
  Hi,
   Is there any Utility for Hadoop files which can work same as  
 RandomAccessFile in Java ?
  Thanks,
  
  Wasim



Re: DistCp 0.18 Vs DistCp 0.17

2008-11-11 Thread Tsz Wo (Nicholas), Sze
There was a code refactoring in 0.18, so the code has been moved around.  
distcp is implemented by org.apache.hadoop.util.CopyFiles in 0.17, while it is 
implemented by org.apache.hadoop.tools.DistCp in 0.18.

There were improvements and bug fixes for distcp in 0.18 compared to 0.17.  Try 
"bin/hadoop distcp" to see the help messages.

Nicholas Sze



- Original Message 
 From: Wasim Bari [EMAIL PROTECTED]
 To: core-user core-user@hadoop.apache.org
 Sent: Tuesday, November 11, 2008 9:09:22 AM
 Subject: DistCp 0.18 Vs DistCp 0.17
 
 Hi,
 The package for DistCp in 0.18 is:  org.Apache.Hadoop.tools. Is it same 
 in 
 0.17 or different one ?
 is there any difference among these two versions for DistCp ?
 Thanks,
 
 Wasim



Re: rsync on 2 HDFS

2008-09-05 Thread Tsz Wo (Nicholas), Sze
Hi Deepika,

We have a utility called distcp - distributed copy.  Note that distcp itself is 
different from rsync.  However, distcp -delete is similar to rsync --delete.

distcp -delete is a new feature in 0.19.  See HADOOP-3939.  For more details 
about distcp, see http://hadoop.apache.org/core/docs/r0.18.0/distcp.html
(the doc is for 0.18, so it won't mention distcp -delete.  The 0.19 doc will 
be updated in HADOOP-3942.)
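
For example (the hosts and paths below are placeholders; -delete takes effect together 
with -update or -overwrite):

hadoop distcp -update -delete hdfs://srcNN:8020/dir hdfs://dstNN:8020/dir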

Nicholas Sze




- Original Message 
 From: Deepika Khera [EMAIL PROTECTED]
 To: core-user@hadoop.apache.org
 Sent: Friday, September 5, 2008 2:42:09 PM
 Subject: rsync on 2 HDFS
 
 Hi,
 
 
 
 I wanted to do an rsync --delete between data in 2 HDFS system
 directories. Do we have a utility that could do this?
 
 
 
 I am aware that HDFS does not allow partial writes. An alternative would
 be to write a program to generate the list of differences in paths and
 then use distcp to copy the files and delete the appropriate files.
 
 
 
 Any pointers to implementations (or partial implementations)?
 
 
 
 Thanks,
 
 Deepika



Re: Please help me: is there a way to chown in Hadoop?

2008-08-26 Thread Tsz Wo (Nicholas), Sze
Yes, there is a chown command.
% hadoop fs -chown ...

For more help, try below
% hadoop fs -help chown 
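
For example, to hand the file over to userB (note that only the superuser may change 
a file's owner):
% hadoop fs -chown userB /user/userA/file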

Nicholas Sze



- Original Message 
 From: Gopal Gandhi [EMAIL PROTECTED]
 To: core-user@hadoop.apache.org; [EMAIL PROTECTED]
 Sent: Tuesday, August 26, 2008 11:35:58 AM
 Subject: Please help me: is there a way to chown in Hadoop?
 
I need to change a file's owner from userA to userB. Is there such a command? 
Thanks a lot!
 
 % hadoop dfs -ls file
/user/userA/file   2008-08-25 20:00   rwxr-xr-x   userA   supergroup