Assuming N is not too large in the sense that your reducers can keep a tree
map of N elements, then you can have your reducer maintain the top N
elements in a tree-map (or a priority queue, or a heap, whatever), with
counts as keys in the tree-map. As the reducers progress, you throw away
the item
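The eviction step James describes can be sketched in plain Java (a sketch under his stated assumption that N fits in memory; `TopN` and `offer` are illustrative names, not from the thread):

```java
import java.util.Map;
import java.util.TreeMap;

// Keep only the N largest counts seen so far in a TreeMap keyed by
// count, evicting the smallest entry whenever the map overflows.
public class TopN {

    // Offer one (count, item) pair; the map never grows beyond n entries.
    static void offer(TreeMap<Long, String> top, long count, String item, int n) {
        top.put(count, item);             // entries stay sorted by count, the key
        if (top.size() > n) {
            top.remove(top.firstKey());   // firstKey() is the smallest count
        }
        // Caveat: items with equal counts overwrite each other here; real
        // code would break ties, e.g. by keying on a (count, item) pair.
    }

    public static void main(String[] args) {
        TreeMap<Long, String> top = new TreeMap<>();
        offer(top, 7, "apple", 2);
        offer(top, 3, "pear", 2);
        offer(top, 9, "plum", 2);         // evicts pear, the smallest
        for (Map.Entry<Long, String> e : top.descendingMap().entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
        // prints plum 9, then apple 7
    }
}
```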
Are the libs exploded inside the main jar? If not, then no, it probably won't
work.
James
Sent from my mobile. Please excuse the typos.
On 2010-09-10, at 7:43 PM, "Mark" wrote:
> If I deploy 1 jar (that contains a lib directory with all the required
> dependencies) shouldn't that jar be inh
Is the footer on this email a little rough for content that will be passed
around and made indexable on the internets?
Just saying :)
Cheers
James
Sent from my mobile. Please excuse the typos.
On 2010-09-10, at 8:01 PM, "Kaluskar, Sanjay" wrote:
> Have you considered using something higher-l
Have you considered using something higher-level like Pig or Hive? Are
there reasons why you need to process at this low level?
-----Original Message-----
From: Aaron Baff [mailto:aaron.b...@telescope.tv]
Sent: Friday, September 10, 2010 11:50 PM
To: common-user@hadoop.apache.org
Subject: Custom
If I deploy 1 jar (that contains a lib directory with all the required
dependencies), shouldn't that jar inherently be distributed to all the
nodes?
On 9/10/10 2:49 PM, Mark wrote:
I don't know. I'm running in a fully distributed environment, i.e. not
local or pseudo.
On 9/10/10 12:03 PM,
Hi Alex ,
Thanks so much for the reply. As of now I don't have any issue with 2
jobs. I was just making sure that I am not missing any obvious way of writing
the program in one job. I will get back if I need to optimize performance
based on specific patterns of input.
Thank you so much you all f
Hi Neil,
Uniques and Top N, as well as percentiles, are inherently difficult to
distribute/parallelize since you have to have a global view of the dataset.
You can optimize the computations given some assumptions about the input
(the # of unique values, prevalence of the most frequent value larger
Thanks Aaron. I employed two jobs and solved the problem.
I was just wondering whether there is any way it can be done in a single job,
so that disk/network I/O is less and no temporary storage is required between
the first and second jobs.
Neil
On Sat, Sep 11, 2010 at 4:37 AM, Aaron Baff wrote:
> I'm still f
I'm still fairly new at MapReduce, but here are my thoughts on a solution.
Use the Item as the Key and the Count as the Value; in the Reducer, sum up all of
the Counts and output Item,sum(Count). To make it more efficient, use the
same Reducer as the Combiner.
Then do a 2nd Job where you map the
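The summing step Aaron describes can be sketched in plain Java, with no Hadoop classes, so it stands alone (the class and method names are illustrative). In the real job the reducer receives every count for one item and emits the total, and the same class doubles as the combiner for partial sums on the map side:

```java
import java.util.HashMap;
import java.util.Map;

// What the first job's reducer (and combiner) computes: group
// (item, count) pairs by item and sum the counts per item.
public class SumCounts {
    static Map<String, Long> sum(String[][] pairs) {
        Map<String, Long> totals = new HashMap<>();
        for (String[] p : pairs) {
            // merge() adds the new count to any total already present
            totals.merge(p[0], Long.parseLong(p[1]), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String[][] input = { {"apple", "3"}, {"pear", "2"}, {"apple", "4"} };
        System.out.println(sum(input)); // apple -> 7, pear -> 2 (order may vary)
    }
}
```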
Thanks James,
This gives me only N results for sure, but not necessarily the top N.
I have used the Item as Key and Count as Value as input to the reducer,
and my reducing logic is to sum the count for a particular item.
Now my output comes out grouped, but not in order.
Do I need to use custom co
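The ordering Neil is after can be sketched in plain Java: (item, total) pairs sorted by total, descending. In a second MapReduce job the usual way to get this is to emit the count as the map output key and supply a decreasing sort comparator (e.g. via `Job.setSortComparatorClass`); the comparison itself, with illustrative names, looks like:

```java
import java.util.Arrays;
import java.util.Comparator;

// Sort (item, total) pairs by total, descending, so the top N are
// simply the first N rows of the output.
public class SortByCount {
    static String[][] sortDescending(String[][] totals) {
        String[][] out = totals.clone();
        Arrays.sort(out, Comparator.comparingLong(
                (String[] p) -> Long.parseLong(p[1])).reversed());
        return out;
    }

    public static void main(String[] args) {
        String[][] totals = { {"apple", "7"}, {"pear", "2"}, {"plum", "9"} };
        for (String[] p : sortDescending(totals)) {
            System.out.println(p[0] + "\t" + p[1]);
        }
        // prints plum 9, then apple 7, then pear 2
    }
}
```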
Hi Sonal,
The 0.21.0 jars are not available in Maven yet, since the process for
publishing them post-split has changed.
See HDFS-1292 and MAPREDUCE-1929.
Cheers,
Tom
On Fri, Sep 10, 2010 at 1:33 PM, Sonal Goyal wrote:
> Hi,
>
> Can someone please point me to the Maven repo for 0.21 release? Tha
I don't know. I'm running in a fully distributed environment, i.e. not
local or pseudo.
On 9/10/10 12:03 PM, Allen Wittenauer wrote:
On Sep 10, 2010, at 11:53 AM, Mark wrote:
If I submit a jar that has a lib directory that contains a bunch of jars,
shouldn't those jars be in the classpath and
Welcome to the land of the fuzzy elephant!
Of course there are many ways to do it. Here is one; it might not be brilliant
or the right way, but I am sure you will get more :)
Use the identity mapper...
job.setMapperClass(Mapper.class);
then have one reducer
job.setNumReduceTasks(1);
Hello ,
I am new to Hadoop. Can anybody suggest any example or procedure for
outputting the top N items having the maximum total count, where the input
file has an (Item, count) pair on each line?
Items can repeat.
Thanks
Neil
http://neilghosh.com
--
Thanks and Regards
Neil
http://neilghosh.com
Yes that seems to have done the trick!
Thanks
Lewis.
On 10 September 2010 20:39, Greg Roelofs wrote:
> Lewis Crawford wrote:
>
>> I am trying unsuccessfully to apply a patch (HADOOP-6835) to hadoop-0.20.2
>> (64bit Ubuntu 10.04)
>
>> using ant on the command line I was able to build the project
Hi,
Can someone please point me to the Maven repo for 0.21 release? Thanks.
Thanks and Regards,
Sonal
www.meghsoft.com
http://in.linkedin.com/in/sonalgoyal
Lewis Crawford wrote:
> I am trying unsuccessfully to apply a patch (HADOOP-6835) to hadoop-0.20.2
> (64bit Ubuntu 10.04)
> using ant on the command line I was able to build the project again
> and generate a new jar hadoop-0.20.3-dev-core.jar which I copied back
> into the $HADOOP_HOME and start
On Sep 10, 2010, at 11:53 AM, Mark wrote:
> If I submit a jar that has a lib directory that contains a bunch of jars,
> shouldn't those jars be in the classpath and available to all nodes?
Are you using distributed cache?
If I submit a jar that has a lib directory that contains a bunch of
jars, shouldn't those jars be in the classpath and available to all nodes?
The reason I ask is that I am trying to submit a jar, myjar.jar,
with the following structure:
--src
\ (My source classes)
-- lib
\
On Fri, Sep 10, 2010 at 1:08 PM, leibnitz wrote:
>
> Is it running in safemode? Hadoop will be in this state for a
How do I find out if it's running in safemode?
After the clue about datanode failures given in earlier replies, I did check
the datanode logs, and they were running fine.
So I'm pretty new to Hadoop, just learning it for work, and starting to play
with some of our data on a VM cluster to see it work and to make sure it can
do what we need. By and large it's very cool, and I think I'm getting the hang
of it, but when I try to make a custom composite key class, it doe
Is it running in safemode? Hadoop will be in this state for a
moment when it starts.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Problem-in-copyFromLocal-tp1446688p1453684.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
Hi,
I am trying unsuccessfully to apply a patch (HADOOP-6835) to hadoop-0.20.2
(64bit Ubuntu 10.04)
I have downloaded the tar.gz and can build the project -
I tried to apply the patch from
https://issues.apache.org/jira/browse/HADOOP-6835
(specifically HADOOP-6835.v9.yahoo-0.20.2xx-branch.patch