Re: Questions on how to use DistributedCache

2008-05-22 Thread Arun C Murthy
On May 21, 2008, at 10:45 PM, Taeho Kang wrote: Dear all, I am trying to use the DistributedCache class to distribute files required for running my jobs. While the API documentation provides good guidelines, are there any tips or usage examples (e.g. sample code)?

Re: Confuse about the Client.Connection

2008-05-22 Thread heyongqiang
Well, I guess I got the answer. First, Hadoop uses TCP, so situations like resultBody_by_threadB, callId_by_threadB will not occur. Second, the server synchronizes on the response queue when the responder sends response messages, and handles one call at a time. Since the clients use the same

Re: Problem with start-all on 0.16.4

2008-05-22 Thread Jean-Adrien
Yep, that's it. I mixed up my config dirs and updated hadoop-site rather than hadoop-default... Thanks a lot. Erwan Arzur-2 wrote: Hey, did you update the config directory(ies) with defaults from the new release? config/hadoop-default.xml may have been modified with new settings

Re: Questions on how to use DistributedCache

2008-05-22 Thread Edward J. Yoon
Long time no see, T If you do this on your own, please contribute it back to Hadoop! *smile* Edward On Thu, May 22, 2008 at 4:20 PM, Arun C Murthy [EMAIL PROTECTED] wrote: On May 21, 2008, at 10:45 PM, Taeho Kang wrote: Dear all, I am trying to use DistributedCache class for distributing

RE: Questions on how to use DistributedCache

2008-05-22 Thread Devaraj Das
-Original Message- From: Taeho Kang [mailto:[EMAIL PROTECTED] Sent: Thursday, May 22, 2008 3:41 PM To: core-user@hadoop.apache.org Subject: Re: Questions on how to use DistributedCache Thanks for your reply. Just one more thing to ask.. From what I see from the source

Re: Hadoop 0.17 AMI?

2008-05-22 Thread Tom White
Hi Jeff, I've built two public 0.17.0 AMIs (32-bit and 64-bit), so you should be able to use the 0.17 scripts to launch them now. Cheers, Tom On Thu, May 22, 2008 at 6:37 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote: Hi Jeff, 0.17.0 was released yesterday, from what I can tell. Otis --

Re: Counters problem...

2008-05-22 Thread Ion Badita
Ion Badita wrote: Hi, I have a problem with counters being updated after I upgraded Hadoop from 0.15.1 to 0.16.4 (I tried 0.17.0 too). The counters are first updated only after the first map task completes. The counters worked well in the older version. Any ideas why? Thanks. Ion Hi, I

Re: Counters problem...

2008-05-22 Thread Ted Dziuba
I can confirm this behavior on Hadoop 0.16.4. I miss the counters. ted Ion Badita wrote: Ion Badita wrote: Hi, I have a problem with counters being updated after I upgraded Hadoop from 0.15.1 to 0.16.4 (I tried 0.17.0 too). The counters are first updated only after the first map task

Re: Hadoop 0.17 AMI?

2008-05-22 Thread Jeff Eastman
Thanks Tom, I'll try them out today Jeff Tom White wrote: Hi Jeff, I've built two public 0.17.0 AMIs (32-bit and 64-bit), so you should be able to use the 0.17 scripts to launch them now. Cheers, Tom On Thu, May 22, 2008 at 6:37 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote: Hi Jeff,

Users Group Meeting Slides

2008-05-22 Thread Jeff Eastman
I uploaded the slides from my Mahout overview to our wiki (http://cwiki.apache.org/confluence/display/MAHOUT/FAQ) along with another recent talk by Isabel Drost. Both are similar in content but their differences reflect the rapid evolution of the project in the month that separates them in

Re: joins in map reduce

2008-05-22 Thread Ted Dunning
Also, if one source of the join is small enough to fit in memory, you can build an in-memory table and do the map-side join on unsorted data. On 5/21/08 11:43 AM, Owen O'Malley [EMAIL PROTECTED] wrote: On May 21, 2008, at 11:16 AM, Shirley Cohen wrote: How does one do a join operation
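The in-memory approach described in this message can be sketched in a few lines. This is a minimal illustration in plain Python, not Hadoop code: the function names and the sample data are hypothetical, and it assumes the small side of the join really does fit in memory on every mapper.

```python
def build_lookup(small_records):
    """Index the small side of the join by key (assumed to fit in memory)."""
    table = {}
    for key, value in small_records:
        table.setdefault(key, []).append(value)
    return table


def map_side_join(large_records, lookup):
    """Stream over the large side and emit one joined row per match.

    The large side is read once, in whatever order it arrives, so it
    does not need to be sorted or partitioned by the join key.
    """
    for key, value in large_records:
        for small_value in lookup.get(key, ()):
            yield key, value, small_value


# Hypothetical sample data: a small "users" table and a larger, unsorted "clicks" stream.
users = [(1, "alice"), (2, "bob")]
clicks = [(2, "/home"), (1, "/docs"), (3, "/faq"), (2, "/about")]

joined = list(map_side_join(clicks, build_lookup(users)))
```

Records whose key is missing from the small table (key 3 here) are simply dropped, which makes this an inner join; an outer join would need a default value in place of the empty tuple.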

Follow-up to sorting question from user meeting

2008-05-22 Thread Nathan Fiedler
In yesterday's user group meeting, Ted asked if the new sort function in the MapReduce implementation was stable, since efficient quicksort implementations are typically not stable. Assuming Hadoop is using one of the sort() methods in java.util.Arrays based on quicksort, then there is a very
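For readers unfamiliar with the term, "stable" means that records with equal keys keep their original relative order after sorting. The sketch below illustrates the property in plain Python and says nothing about what Hadoop itself does. A short selection sort stands in for an unstable algorithm (it is not quicksort, just easier to trace), and Python's built-in sorted(), which is documented as stable, serves as the reference. All names are hypothetical.

```python
def selection_sort(records, key):
    """A simple unstable sort: swapping can reorder records with equal keys."""
    items = list(records)
    for i in range(len(items)):
        smallest = i
        for j in range(i + 1, len(items)):
            if key(items[j]) < key(items[smallest]):
                smallest = j
        items[i], items[smallest] = items[smallest], items[i]
    return items


def keeps_equal_keys_in_order(original, result, key):
    """True if `result` matches the (unique) stable sort of `original` by `key`."""
    return result == sorted(original, key=key)


def by_number(record):
    return record[0]


# Two records share key 1; a stable sort must keep "a" before "b".
records = [(1, "a"), (1, "b"), (0, "c")]

stable = sorted(records, key=by_number)
unstable = selection_sort(records, by_number)
```

Here the first swap moves (1, "a") behind (1, "b"), so the selection sort returns the records in sorted key order but with the two key-1 records reversed.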

Re: Users Group Meeting Slides

2008-05-22 Thread Lukas Vlcek
http://svn.apache.org/repos/asf/lucene/mahout/trunk On Thu, May 22, 2008 at 8:10 PM, Tanton Gibbs [EMAIL PROTECTED] wrote: I checked out the Wiki. I am in need of a canopy clustering algorithm for hadoop. I'm about to embark on writing one, but if you have one already, that would be

Re: Users Group Meeting Slides

2008-05-22 Thread Ted Dunning
See here: http://cwiki.apache.org/MAHOUT/howtocontribute.html Particularly the "getting the source" section. On 5/22/08 11:10 AM, Tanton Gibbs [EMAIL PROTECTED] wrote: I checked out the Wiki. I am in need of a canopy clustering algorithm for hadoop. I'm about to embark on writing one, but

Re: Users Group Meeting Slides

2008-05-22 Thread Jeff Eastman
Hi Tanton, We have canopy and kmeans in trunk that you are welcome to use (https://svn.apache.org/repos/asf/lucene/mahout/trunk). Please post any usability questions you may have or suggestions for improvements to the Mahout user list ([EMAIL PROTECTED]). There is some documentation on the

Re: Users Group Meeting Slides

2008-05-22 Thread Tanton Gibbs
Excellent! Thanks! On Thu, May 22, 2008 at 1:13 PM, Lukas Vlcek [EMAIL PROTECTED] wrote: http://svn.apache.org/repos/asf/lucene/mahout/trunk On Thu, May 22, 2008 at 8:10 PM, Tanton Gibbs [EMAIL PROTECTED] wrote: I checked out the Wiki. I am in need of a canopy clustering algorithm for

Can you run multiple simultaneous hadoop jobs?

2008-05-22 Thread Kayla Jay
Hello. I'm trying to figure out why I need to use HOD vs. trying to run multiple jobs at the same time on the same set of resources. Is it possible to run multiple hadoop jobs at the same time on the same set of input data? I tried to run different jobs on the same set of data at the same

Re: Serialization format for structured data

2008-05-22 Thread Ted Dunning
What is it that makes you not think JSON has to be inefficient? - Repeated value parsing? - Redundant redundant data labels? - Generic parsing must be slow prejudice? On 5/22/08 1:54 PM, Stuart Sierra [EMAIL PROTECTED] wrote: Hello, I'm still getting my head around how Hadoop works. A

Hadoop fsck displays open files as corrupt.

2008-05-22 Thread Martin Schaaf
Hi, we wrote a program that uses a Writer to append keys and values to a file. If we run fsck during these writes, the open files are reported as corrupt and the file size is zero until they are closed. On the other hand, if we copy a file from the local fs to the hadoop fs, the size constantly

Import path for hadoop streaming with python

2008-05-22 Thread Martin Blom
Hello all, I'm trying to stream a little python script on my small hadoop cluster, and it doesn't work like I thought it would. The script looks something like

    #!/usr/bin/env python
    import mylib
    dostuff

where mylib is a small python library that I want included, and I launch the whole thing
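The message is cut off before the actual failure is shown, so the following is only a guess at one common cause: the library file is present on the task node but its directory is not on Python's import path. A minimal sketch, assuming the library has been shipped so that it sits in a known directory (by default the task's current working directory); the helper name import_from_dir is hypothetical.

```python
import importlib
import os
import sys


def import_from_dir(module_name, directory=None):
    """Import `module_name`, first making sure `directory` is on sys.path.

    `directory` defaults to the current working directory, on the
    assumption that shipped files end up next to the running script.
    """
    directory = os.path.abspath(directory or os.getcwd())
    if directory not in sys.path:
        sys.path.insert(0, directory)
    # Drop cached directory listings so a freshly written file is seen.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

In the script from the message, this would replace the plain "import mylib" with something like mylib = import_from_dir("mylib"). If the import still fails, printing sys.path and os.listdir(os.getcwd()) to stderr from inside the task is a quick way to see where the file actually landed.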

Re: Hadoop fsck displays open files as corrupt.

2008-05-22 Thread stack
The first case sounds like HADOOP-2703. St.Ack Martin Schaaf wrote: Hi, we wrote a program that uses a Writer to append keys and values to a file. If we run fsck during these writes, the open files are reported as corrupt and the file size is zero until they are closed. On the other hand

Re: Hadoop fsck displays open files as corrupt.

2008-05-22 Thread Martin Schaaf
On Thu, 22 May 2008 16:21:31 -0700 (PDT) lohit lohit wrote: What you are seeing is expected behavior in earlier versions of Hadoop. hadoop-0.18 has a fix for this. Ok. Thanks for your answer. bye, martin.

Re: Import path for hadoop streaming with python

2008-05-22 Thread Saptarshi Guha
I haven't done this using hadoop but before i 16.4 i had written my own distributed batch processor using HDFS as a common file storage and remote execution of python scripts. They all required a custom module which was copied to the remote temp folders (a primitive implementation of