Assuming N is not too large in the sense that your reducers can keep a tree
map of N elements, then you can have your reducer maintain the top N
elements in a tree-map (or a priority queue, or a heap, whatever), with
counts as keys in the tree-map. As the reducers progress, you throw away
the item
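The eviction step James describes can be sketched in plain Java (a sketch under his stated assumption that N fits in memory; `TopN` and `offer` are illustrative names, not from the thread):

```java
import java.util.Map;
import java.util.TreeMap;

// Keep only the N largest counts seen so far in a TreeMap keyed by
// count, evicting the smallest entry whenever the map overflows.
public class TopN {

    // Offer one (count, item) pair; the map never grows beyond n entries.
    static void offer(TreeMap<Long, String> top, long count, String item, int n) {
        top.put(count, item);             // entries stay sorted by count, the key
        if (top.size() > n) {
            top.remove(top.firstKey());   // firstKey() is the smallest count
        }
        // Caveat: items with equal counts overwrite each other here; real
        // code would break ties, e.g. by keying on a (count, item) pair.
    }

    public static void main(String[] args) {
        TreeMap<Long, String> top = new TreeMap<>();
        offer(top, 7, "apple", 2);
        offer(top, 3, "pear", 2);
        offer(top, 9, "plum", 2);         // evicts pear, the smallest
        for (Map.Entry<Long, String> e : top.descendingMap().entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
        // prints plum 9, then apple 7
    }
}
```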
Are the libs exploded inside the main jar? If not, then no, it probably won't
work.
James
Sent from my mobile. Please excuse the typos.
On 2010-09-10, at 7:43 PM, "Mark" wrote:
> If I deploy 1 jar (that contains a lib directory with all the required
> dependencies) shouldn't that jar be inh
Is the footer on this email a little rough for content that will be passed
around and made indexable on the internets?
Just saying :)
Cheers
James
Sent from my mobile. Please excuse the typos.
On 2010-09-10, at 8:01 PM, "Kaluskar, Sanjay" wrote:
> Have you considered using something higher-l
Have you considered using something higher-level like Pig or Hive? Are
there reasons why you need to process at this low level?
-----Original Message-----
From: Aaron Baff [mailto:aaron.b...@telescope.tv]
Sent: Friday, September 10, 2010 11:50 PM
To: common-user@hadoop.apache.org
Subject: Custom
If I deploy 1 jar (that contains a lib directory with all the required
dependencies), shouldn't that jar inherently be distributed to all the
nodes?
On 9/10/10 2:49 PM, Mark wrote:
I don't know. I'm running in a fully distributed environment, i.e. not
local or pseudo.
On 9/10/10 12:03 PM,
Hi Alex ,
Thanks so much for the reply. As of now I don't have any issue with 2
jobs. I was just making sure that I am not missing any obvious way of writing
the program in one job. I will get back if I need to optimize performance
based on specific patterns of input.
Thank you so much you all f
Hi Neil,
Uniques and Top N, as well as percentiles, are inherently difficult to
distribute/parallelize since you have to have a global view of the dataset.
You can optimize the computations given some assumptions about the input
(the # of unique values, prevalence of the most frequent value larger
Thanks Aaron. I employed two jobs and solved the problem.
I was just wondering whether there is any way it can be done in a single job,
so that disk/network I/O is less and no temporary storage is required between
the first and second jobs.
Neil
On Sat, Sep 11, 2010 at 4:37 AM, Aaron Baff wrote:
> I'm still f
I'm still fairly new at MapReduce, but here are my thoughts on a solution.
Use the Item as the Key and the Count as the Value; in the Reducer, sum up all of
the Counts and output Item,sum(Count). To make it more efficient, use the
same Reducer as the Combiner.
Then do a 2nd Job where you map the
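The summing step Aaron describes can be sketched in plain Java, with no Hadoop classes, so it stands alone (the class and method names are illustrative). In the real job the reducer receives every count for one item and emits the total, and the same class doubles as the combiner for partial sums on the map side:

```java
import java.util.HashMap;
import java.util.Map;

// What the first job's reducer (and combiner) computes: group
// (item, count) pairs by item and sum the counts per item.
public class SumCounts {
    static Map<String, Long> sum(String[][] pairs) {
        Map<String, Long> totals = new HashMap<>();
        for (String[] p : pairs) {
            // merge() adds the new count to any total already present
            totals.merge(p[0], Long.parseLong(p[1]), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String[][] input = { {"apple", "3"}, {"pear", "2"}, {"apple", "4"} };
        System.out.println(sum(input)); // apple -> 7, pear -> 2 (order may vary)
    }
}
```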
Thanks James,
This gives me only N results for sure, but not necessarily the top N.
I have used the Item as Key and Count as Value as input to the reducer,
and my reducing logic is to sum the count for a particular item.
Now my output comes out grouped, but not in order.
Do I need to use custom co
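The ordering Neil is after can be sketched in plain Java: (item, total) pairs sorted by total, descending. In a second MapReduce job the usual way to get this is to emit the count as the map output key and supply a decreasing sort comparator (e.g. via `Job.setSortComparatorClass`); the comparison itself, with illustrative names, looks like:

```java
import java.util.Arrays;
import java.util.Comparator;

// Sort (item, total) pairs by total, descending, so the top N are
// simply the first N rows of the output.
public class SortByCount {
    static String[][] sortDescending(String[][] totals) {
        String[][] out = totals.clone();
        Arrays.sort(out, Comparator.comparingLong(
                (String[] p) -> Long.parseLong(p[1])).reversed());
        return out;
    }

    public static void main(String[] args) {
        String[][] totals = { {"apple", "7"}, {"pear", "2"}, {"plum", "9"} };
        for (String[] p : sortDescending(totals)) {
            System.out.println(p[0] + "\t" + p[1]);
        }
        // prints plum 9, then apple 7, then pear 2
    }
}
```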
Hi Sonal,
The 0.21.0 jars are not available in Maven yet, since the process for
publishing them post-split has changed.
See HDFS-1292 and MAPREDUCE-1929.
Cheers,
Tom
On Fri, Sep 10, 2010 at 1:33 PM, Sonal Goyal wrote:
> Hi,
>
> Can someone please point me to the Maven repo for 0.21 release? Tha
I don't know. I'm running in a fully distributed environment, i.e. not
local or pseudo.
On 9/10/10 12:03 PM, Allen Wittenauer wrote:
On Sep 10, 2010, at 11:53 AM, Mark wrote:
If I submit a jar that has a lib directory that contains a bunch of jars,
shouldn't those jars be in the classpath and
Welcome to the land of the fuzzy elephant!
Of course there are many ways to do it. Here is one; it might not be brilliant
or the right way, but I am sure you will get more :)
Use the identity mapper...
job.setMapperClass(Mapper.class);
then have one reducer
job.setNumReduceTasks(1);
Hello ,
I am new to Hadoop. Can anybody suggest any example or procedure for
outputting the top N items having the maximum total count, where the input
file has an (Item, count) pair on each line?
Items can repeat.
Thanks
Neil
http://neilghosh.com
--
Thanks and Regards
Neil
http://neilghosh.com
Yes that seems to have done the trick!
Thanks
Lewis.
On 10 September 2010 20:39, Greg Roelofs wrote:
> Lewis Crawford wrote:
>
>> I am trying unsuccessfully to apply a patch (HADOOP-6835) to hadoop-0.20.2
>> (64bit Ubuntu 10.04)
>
>> using ant on the command line I was able to build the project
Hi,
Can someone please point me to the Maven repo for 0.21 release? Thanks.
Thanks and Regards,
Sonal
www.meghsoft.com
http://in.linkedin.com/in/sonalgoyal
Lewis Crawford wrote:
> I am trying unsuccessfully to apply a patch (HADOOP-6835) to hadoop-0.20.2
> (64bit Ubuntu 10.04)
> using ant on the command line I was able to build the project again
> and generate a new jar hadoop-0.20.3-dev-core.jar which I copied back
> into the $HADOOP_HOME and start
On Sep 10, 2010, at 11:53 AM, Mark wrote:
> If I submit a jar that has a lib directory that contains a bunch of jars,
> shouldn't those jars be in the classpath and available to all nodes?
Are you using distributed cache?
If I submit a jar that has a lib directory that contains a bunch of
jars, shouldn't those jars be in the classpath and available to all nodes?
The reason I ask is that I am trying to submit a jar, myjar.jar,
with the following structure:
--src
\ (My source classes)
-- lib
\
On Fri, Sep 10, 2010 at 1:08 PM, leibnitz wrote:
>
> Is it running in safemode? Hadoop will be in this state for a
How do I find out if it's running in safemode?
After the clue about datanode failures given in earlier replies, I did check
the datanode logs, and they were running fine.
So I'm pretty new to Hadoop, just learning it for work, and starting to play
with some of our data on a VM cluster to see it work and to make sure it can
do what we need. By and large it's very cool, and I think I'm getting the hang
of it, but when I try to make a custom composite key class, it doe
Is it running in safemode? Hadoop will be in this state for a
moment when it starts.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Problem-in-copyFromLocal-tp1446688p1453684.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
Hi,
I am trying unsuccessfully to apply a patch (HADOOP-6835) to hadoop-0.20.2
(64bit Ubuntu 10.04)
I have downloaded the tar.gz and can build the project -
I tried to apply the patch from
https://issues.apache.org/jira/browse/HADOOP-6835
(specifically HADOOP-6835.v9.yahoo-0.20.2xx-branch.patch