Question about writing to local file system in reduce job

2008-05-21 Thread Cedric Ho
Hello,

I am using FileSystem.startLocalOutput() and
FileSystem.completeLocalOutput() in my reduce tasks (more than 1) to
produce some output. I have two questions:

1. If everything runs correctly, after the call to
completeLocalOutput(), the output is copied to HDFS and the local file
is deleted. However, if the reduce tasks are killed in the middle for
whatever reason, the local files are not deleted. How can I delete them
when a reduce task fails?

2. If speculative execution is on, how can I force the two speculative
tasks that are working on the same output to write to different local
paths, in case they happen to run on the same machine? Also, will
there be any problem if they both succeed and copy the output back to
the same path in HDFS?

I'd appreciate any help.

Cheers,
Cedric
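
A minimal sketch of one way to approach both questions above, written in
plain Python rather than against the Hadoop API. It assumes each task
attempt has its own attempt ID (speculative attempts of the same task
would then get different local directories) and removes the local copy in
a finally block. The names local_scratch_path and run_attempt, the
"part.out" file name, and the produce/publish callbacks are hypothetical.
A process that is killed outright never reaches finally, so a periodic
sweep of stale attempt directories would still be needed for that case.

```python
import os
import shutil
import tempfile


def local_scratch_path(attempt_id, name, base_dir=None):
    """Return a per-attempt local path so two attempts never share a file."""
    base_dir = base_dir or tempfile.gettempdir()
    attempt_dir = os.path.join(base_dir, attempt_id)
    os.makedirs(attempt_dir, exist_ok=True)
    return os.path.join(attempt_dir, name)


def run_attempt(attempt_id, produce, publish, base_dir=None):
    """Write output locally, publish it, and always remove the local copy."""
    path = local_scratch_path(attempt_id, "part.out", base_dir)
    try:
        with open(path, "w") as out:
            produce(out)      # hypothetical callback: writes the reduce output
        publish(path)         # hypothetical callback: copies the file elsewhere
    finally:
        # Runs on success and on ordinary exceptions, not on a hard kill.
        shutil.rmtree(os.path.dirname(path), ignore_errors=True)
```

Publishing each attempt's output under an attempt-specific name and
renaming only the winner into place is one common way to avoid two
attempts colliding on the final path; whether that is needed here depends
on how the framework commits output.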


Re: Problem with start-all on 0.16.4

2008-05-21 Thread Jean-Adrien

Hi

Same problem for me. I tried to rm -rf the datastore as well (prior to
reformatting), but nothing changed. Any clue is welcome.

Regards



Adam Wynne wrote:
 
 Hi,
 
 I have a working 0.15.3 install and am trying to upgrade to 0.16.4.  I
 want to start clean with an empty filesystem, so I just reformatted
 the filesystem instead of using the upgrade option.  When I run
 start-all.sh, I get a null pointer exception originating from the
 NetUtils.getServerAddress() method.  This cluster is on a private
 network, could there be a bug with the way hadoop is looking up the
 address?  Other ideas?
 
 Here is the full error and stack trace from the namenode log:
 
 2008-05-14 08:03:37,252 INFO org.apache.hadoop.fs.FSNamesystem:
 fsOwner=qeadmin,qeadmin,wheel
 2008-05-14 08:03:37,253 INFO org.apache.hadoop.fs.FSNamesystem:
 supergroup=supergroup
 2008-05-14 08:03:37,253 INFO org.apache.hadoop.fs.FSNamesystem:
 isPermissionEnabled=true
 2008-05-14 08:03:37,358 INFO org.apache.hadoop.fs.FSNamesystem:
 Finished loading FSImage in 137 msecs
 2008-05-14 08:03:37,362 INFO org.apache.hadoop.fs.FSNamesystem:
 Leaving safemode after 142 msecs
 2008-05-14 08:03:37,362 INFO org.apache.hadoop.dfs.StateChange: STATE*
 Network topology has 0 racks and 0 datanodes
 2008-05-14 08:03:37,363 INFO org.apache.hadoop.dfs.StateChange: STATE*
 UnderReplicatedBlocks has 0 blocks
 2008-05-14 08:03:37,377 INFO org.apache.hadoop.fs.FSNamesystem:
 Registered FSNamesystemStatusMBean
 2008-05-14 08:03:37,398 ERROR org.apache.hadoop.dfs.NameNode:
 java.lang.NullPointerException
at org.apache.hadoop.net.NetUtils.getServerAddress(NetUtils.java:148)
at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:279)
at org.apache.hadoop.dfs.FSNamesystem.init(FSNamesystem.java:235)
at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:131)
at org.apache.hadoop.dfs.NameNode.init(NameNode.java:176)
at org.apache.hadoop.dfs.NameNode.init(NameNode.java:162)
at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:846)
at org.apache.hadoop.dfs.NameNode.main(NameNode.java:855)
 
 2008-05-14 08:03:37,399 INFO org.apache.hadoop.dfs.NameNode: SHUTDOWN_MSG:
 /
 SHUTDOWN_MSG: Shutting down NameNode at compute-0-0.local/192.168.1.254
 /
 
 
 Thanks
 
 

-- 
View this message in context: 
http://www.nabble.com/Problem-with-start-all-on-0.16.4-tp17233437p17364262.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: Hadoop 0.17 AMI?

2008-05-21 Thread Jeff Eastman
Any word on 0.17? I was able to build an AMI from a trunk checkout and 
deploy a single node cluster but the create-hadoop-image-remote script 
really wants a tarball in the archive. I'd rather not waste time munging 
the scripts if a release is near.


Jeff

Nigel Daley wrote:
Hadoop 0.17 hasn't been released yet.  I (or Mukund) am hoping to call 
a vote this afternoon or tomorrow.


Nige

On May 14, 2008, at 12:36 PM, Jeff Eastman wrote:

I'm trying to bring up a cluster on EC2 using
(http://wiki.apache.org/hadoop/AmazonEC2) and it seems that 0.17 is the
version to use because of the DNS improvements, etc. Unfortunately, I
cannot find a public AMI with this build. Is there one that I'm not
finding or do I need to create one?

Jeff









Hadoop Streaming - revised

2008-05-21 Thread Tanton Gibbs
Ok, I turned on verbose output.  It looks as though it is adding
everything in my /tmp directory to the jar file it builds.  Where do I
tell it not to do that?

Thanks!
Tanton


Hadoop Streaming - final

2008-05-21 Thread Tanton Gibbs
Ok, I figured it out.  Hadoop Streaming adds the entire
stream.shipped.hadoopstreaming directory to the jar file.  For me, I
wasn't setting it and it was defaulting to /tmp.  That means my entire
/tmp directory was getting added to the jar.

I set that directory to the location of my hadoop streaming jar
directory and it seemed to work fine.

Sorry for the noise.


RE: Monthly Hadoop user group meetings

2008-05-21 Thread Ajay Anand
Reminder: the user group meeting is today at 6 pm at Yahoo! Mission
College.

Ajay

 



From: Ajay Anand 
Sent: Wednesday, May 14, 2008 9:53 AM
To: '[EMAIL PROTECTED]'; '[EMAIL PROTECTED]';
'[EMAIL PROTECTED]'
Cc: 'Chad Walters'; 'Jeff Hammerbacher'; Owen O'Malley
Subject: RE: Monthly Hadoop user group meetings

 

Agenda for the Hadoop user group meeting on Wednesday 5/21 6:00-7:30 pm
at Yahoo! Mission College:

-  Hadoop .17 release - Sameer Paranjpye

-  Mahout update - Jeff Eastman

-   And plenty of opportunity for networking, discussions
and beer...

 

Look forward to seeing you there. (Registration is at
http://upcoming.yahoo.com/event/591971/. )

Ajay

 



From: Ajay Anand 
Sent: Tuesday, May 06, 2008 9:53 AM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Cc: Chad Walters; Jeff Hammerbacher; Owen O'Malley
Subject: Monthly Hadoop user group meetings

 

One of the things we had discussed at the Hadoop summit was to set up
monthly user group meetings to discuss topics of interest to the hadoop
community. We have scheduled the first of these meetings for May 21st
from 6 to 7:30 pm at the Yahoo! Mission College campus. You can register
for this at http://upcoming.yahoo.com/event/591971/.

 

The core group organizing these includes Chad Walters from Powerset,
Jeff Hammerbacher from Facebook and Owen O'Malley from Yahoo. Please
send us any suggestions for topics or things you would like to share
with the group. Topics related to pig, hbase, zookeeper and mahout are
welcome as well.

 

Look forward to seeing you there!

 

Ajay



Re: joins in map reduce

2008-05-21 Thread Owen O'Malley


On May 21, 2008, at 11:16 AM, Shirley Cohen wrote:

How does one do a join operation in map reduce? Is there more than  
one way to do a join? Which way works better and why?


There are a couple of ways, depending on what you need to do. If your  
input data is sorted and partitioned equivalently on the same key,  
you can do a join before the map (aka map-side join). The  
documentation is at:  http://tinyurl.com/5v4rot


If your data is not sorted and partitioned consistently, you need to  
do the join in the reduce. There is a library to help at:
http://tinyurl.com/5cz669


-- Owen
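
To make the reduce-side case above concrete, here is a small sketch in
plain Python. It is not the library linked above: the map step tags each
record with the input it came from, and the reduce step groups by key and
pairs every value from one input with every value from the other (an
inner join). The function names map_tagged and reduce_join and the
"left"/"right" tags are made up for illustration.

```python
from collections import defaultdict
from itertools import product


def map_tagged(source, records):
    """Map step: emit (key, (source, value)) so the reducer can tell inputs apart."""
    for key, value in records:
        yield key, (source, value)


def reduce_join(pairs):
    """Reduce step: group by key, then pair each left value with each right value."""
    groups = defaultdict(lambda: {"left": [], "right": []})
    for key, (source, value) in pairs:
        groups[key][source].append(value)
    for key in sorted(groups):
        for left_value, right_value in product(groups[key]["left"], groups[key]["right"]):
            yield key, left_value, right_value


users = [(1, "ann"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "ink")]
pairs = list(map_tagged("left", users)) + list(map_tagged("right", orders))
joined = list(reduce_join(pairs))
```

Keys that appear in only one input (2 and 3 here) produce no output; in a
real job the grouping by key would be done by the shuffle rather than by
an in-memory dictionary.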




Re: Monthly Hadoop user group meetings

2008-05-21 Thread Ted Dunning
And anybody who wants to be early can meet some of us at Bennigan's.


-- 
ted


[ANNOUNCE] Hadoop release 0.17.0 available

2008-05-21 Thread Mukund Madhugiri
Release 0.17.0 contains many improvements, new features, bug fixes and 
optimizations.


For release details and downloads, visit:

  http://hadoop.apache.org/core/releases.html

Hadoop 0.17.0 Release Notes are at

  http://hadoop.apache.org/core/docs/r0.17.0/releasenotes.html

Thanks to all who contributed to this release!

Mukund


Re: Hadoop experts wanted

2008-05-21 Thread Akshar
Interesting!!

BTW, Where do you work?

On Thu, May 15, 2008 at 2:23 PM, Jim R. Wilson [EMAIL PROTECTED]
wrote:

 Hi all,

 Hadoop is a great project and a growing niche.  As it becomes even
 more popular, there will be increasing demand for experts in the
 field.

 I am compiling a contact list of Hadoop experts who may be interested
 in opportunities under the right circumstances.  I am not a recruiter
 - I'm a regular developer who sometimes gets asked for referrals when
 I'm not personally available.

 If you'd like to be on my shortlist of go-to experts, please contact
 me off-list at: [EMAIL PROTECTED]

 Please be prepared to show your expertise by any of the following:
  * Committer status or patches accepted
  * Commit access to another open source project which uses Hadoop
  * Bugs reported which were either resolved or are still open (real bugs)
  * Articles / blog entries written about Hadoop concepts or development
  * Speaking engagements or user groups at which you've presented
  * Significant contributions to documentation
  * Other? (I'm sure I didn't think of everything)

 I'll be happy to answer any questions, and I look forward to hearing from
 you!

 -- Jim R. Wilson (jimbojw)



Re: Hadoop experts wanted

2008-05-21 Thread Edward J. Yoon
Oh, yes, I guessed wrong.
Thanks :)

Edward

On Thu, May 22, 2008 at 7:42 AM, Jeff Eastman
[EMAIL PROTECTED] wrote:
 Hi Edward,

 Check out this link
 (http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable)
 before you panic over the similar postings. Jim's a little vague about what
 he's actually going to do with this data or when, but I found it useful.

 Jeff


 Edward J. Yoon wrote:

 Hey Akshar!
 Just FYI, See http://www.nabble.com/Django-experts-wanted-td17322054.html

 -Edward



-- 
Best regards,
Edward J. Yoon,
http://blog.udanax.org


missing CompressionLevel for ZLibCompressor.

2008-05-21 Thread steph

All,

The class ZLibCompressor contains an enum for the CompressionLevel, and
only a few compression levels have been implemented. Is there a reason
for that?

I'd like to add all the levels (0-9). How do I go about checking in
that change?


Thanks,

S.



Avoiding Newline Problems in Hadoop Streaming + StreamXMLRecordReader

2008-05-21 Thread Bradford Stephens
Greetings,

I have an interesting problem I'm trying to solve. I currently store a bunch
of webpages in a large XML file in Hadoop. I'm trying to parse information
out of these webpages using a complex C# program that I have running on Mono
(I'm in a Linux environment). Therefore, I'm using Hadoop Streaming and the
StreamXMLRecordReader in order to get the information to my C# parser. The
problem is that even when wrapped in XML, Hadoop Streaming ends the records
at newlines! This makes the map input data pretty useless. Does anyone have
any hints on how to get around this?

Here's the XML structure I'm trying to use:

<ContentRecord><RecordURL>http://www.blah</RecordURL><PageContent><![CDATA[page
text would be here including newlines ]]></PageContent></ContentRecord>

Any ideas?

Cheers,
Bradford
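
One possible workaround for the problem described above, sketched in
Python for brevity (the same loop could be written in C#): let the mapper
itself reassemble records by buffering input lines from the opening tag
to the closing tag. This assumes the record text still reaches the mapper
in order and merely split across lines, and that records are not nested.
The helper name iter_records is hypothetical; the tag names are taken
from the structure in the message.

```python
def iter_records(lines, begin="<ContentRecord>", end="</ContentRecord>"):
    """Yield whole records, joining the lines between a begin tag and its end tag."""
    buffer = []
    inside = False
    for line in lines:
        if not inside and begin in line:
            inside = True
            line = line[line.index(begin):]    # drop anything before the record
        if inside:
            buffer.append(line)
            if end in line:
                cut = line.index(end) + len(end)
                buffer[-1] = line[:cut]        # drop anything after the record
                yield "".join(buffer)
                buffer = []
                inside = False


sample = [
    "<ContentRecord><RecordURL>http://www.blah</RecordURL><PageContent><![CDATA[line one\n",
    "line two\n",
    "]]></PageContent></ContentRecord>\n",
    "<ContentRecord>x</ContentRecord>\n",
]
records = list(iter_records(sample))
```

In a streaming mapper the same generator could be fed from standard input,
for example `for record in iter_records(sys.stdin): ...`.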


Re: Hadoop experts wanted

2008-05-21 Thread Jim R. Wilson
Thanks Jeff - glad for the support :)

I appreciate your concern Edward.  My background is primarily in
MediaWiki, and I'm a relative newcomer to Hadoop/Hbase - writing
MapReduce Python jobs using Hadoop streaming and connecting PHP to
HBase through Thrift.  It's all been a very interesting journey which
I plan to write more articles about as time permits.  I'm also
preparing a patch for HBase to support generating EC2 AMIs with
Hadoop+HBase since all the latest public AMIs have only Hadoop.

Regarding the feelers, I've posted feeler messages only in communities
where I feel I could intelligently contribute to a conversation on the
subject.  I wouldn't, for example, post such a feeler on a Linux
kernel development list as I have no experience or knowledge about it.
 Based on my recent experience with Hadoop/HBase, I felt I'd be able
to vet any potentially interested experts by evaluating code samples,
asking pointed questions, reading published articles etc.

Being a wiki guy, the system I eventually create to present the expert
list will almost certainly have a wiki component, giving experts the
opportunity to elaborate on their experience or knowledge without
restriction, but also have an uneditable (moderator only) section
where I'd list the affirmed credentials (such as significant patches,
enhancements, articles on the subject etc).

I'm still not sure yet what the whole thing will look like, but I've
gotten a fairly positive response to my query mails so far, so I'll
begin cooking something up soon.

Sorry for taking this so far off-topic, it wasn't my intent to do so.
I appreciate your concern, and if you have suggestions on how I could
make my emails seem less spammy, I'd be happy to alter them. :)

-- Jim




Re: Hadoop 0.17 AMI?

2008-05-21 Thread Otis Gospodnetic
Hi Jeff,

0.17.0 was released yesterday, from what I can tell.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





Questions on how to use DistributedCache

2008-05-21 Thread Taeho Kang
Dear all,

I am trying to use DistributedCache class for distributing files required
for running my jobs.

While the API documentation provides good guidelines,
are there any tips or usage examples (e.g. sample code)?

If you could share your experience with me, I would really appreciate it.

Thank you in advance,

/Taeho


Confused about the Client.Connection

2008-05-21 Thread heyongqiang
Hi all,
I took a look at the source code of org.apache.hadoop.ipc.Client, and I wonder:
if two client threads invoke getConnection() specifying the same
arguments, they will get the same Connection object, so how can they
distinguish their results from each other?
I noticed that the results streamed back from the server are collected by the
Connection's thread, not the callers' threads,
and the Connection's thread expects results as: callId_XX, resultBody_XX.
Is there a situation in which the Connection's result thread collects
callId_by_threadA, resultBody_by_threadB, callId_by_threadB, resultBody_by_threadA?
I think this situation is plausible; how does the current code handle
it?







heyongqiang
[EMAIL PROTECTED]
2008-05-22
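
The general technique the question above is circling is call-ID
demultiplexing: every request gets a unique call ID, each response frame
carries that ID together with its body, and a single reader thread hands
each body to whichever caller registered the ID. The sketch below is
plain Python, not the Hadoop source; the class and method names
(Connection, register, deliver, wait) are hypothetical. It assumes each
frame (call ID plus its body) is written and read as one unit, which is
exactly what rules out the mixed-up ordering described in the question.

```python
import threading


class Connection:
    """One shared connection; a single reader routes responses by call ID."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = 0
        self._pending = {}  # call ID -> [event, result]

    def register(self):
        """Allocate a unique call ID and a slot where its result will land."""
        with self._lock:
            call_id = self._next_id
            self._next_id += 1
            self._pending[call_id] = [threading.Event(), None]
            return call_id

    def deliver(self, call_id, body):
        """Called by the reader thread for each (call ID, body) frame it reads."""
        with self._lock:
            slot = self._pending[call_id]
        slot[1] = body
        slot[0].set()

    def wait(self, call_id, timeout=5.0):
        """Block the caller until its own result arrives, then return it."""
        with self._lock:
            slot = self._pending[call_id]
        if not slot[0].wait(timeout):
            raise TimeoutError(call_id)
        with self._lock:
            del self._pending[call_id]
        return slot[1]
```

Two callers sharing one Connection each wait on their own ID, so responses
may arrive in any order without being confused, provided the sender never
interleaves the ID of one frame with the body of another.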