Re: A brief report of Second Hadoop in China Salon

2009-05-16 Thread Tsz Wo (Nicholas), Sze
Congratulations! Nicholas Sze - Original Message From: He Yongqiang heyongqi...@software.ict.ac.cn To: core-...@hadoop.apache.org core-...@hadoop.apache.org; core-user@hadoop.apache.org core-user@hadoop.apache.org Sent: Friday, May 15, 2009 6:09:50 PM Subject: A brief report

Re: Is there any performance issue with Jrockit JVM for Hadoop

2009-05-16 Thread Grace
To follow up on this question: I also asked for help on the JRockit forum. They kindly offered some useful and detailed suggestions based on the JRA results. After updating the option list, performance did improve to some extent, but it is still not comparable with the Sun JVM. Maybe, it

TASKS KILLED WHEN RUNNING : bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

2009-05-16 Thread ashish pareek
Hello everybody, I am a new Hadoop user. I started running Hadoop using the guide at http://hadoop.apache.org/core/docs/current/quickstart.html but when I run the command bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+' in pseudo-distributed mode, I get an error like: Task

Re: Loading FSEditLog fails

2009-05-16 Thread Andrew
OK, I've just solved the problem with minor data loss. Steps to solve: 1) comment out FSEditLog.java:542 2) compile hadoop-core jar 3) start cluster with new jar. The Namenode will skip bad records in name/current/edits and write a new edits file back into fs. As bad records stand for actual IO operations,

hadoop MapReduce and stop words

2009-05-16 Thread PORTO aLET
Hi, I am trying to include the stop words into hadoop map reduce, and later on, into hive. What is the accepted solution regarding the stop words in hadoop? All I can think is to load all the stop words into an array in the mapper, and then check each token against the stop words..(this would be

Re: hadoop MapReduce and stop words

2009-05-16 Thread tim robertson
Perhaps some kind of in memory index would be better than iterating an array? Binary tree or so. I did similar with polygon indexes and point data. It requires careful memory planning on the nodes if the indexes are large (mine were several GB). Just a thought, Tim On Sat, May 16, 2009 at

Re: hadoop MapReduce and stop words

2009-05-16 Thread PORTO aLET
Can you please elaborate more on the in-memory index? What kind of software did you use to implement this? Regards On Sat, May 16, 2009 at 8:55 PM, tim robertson timrobertson...@gmail.com wrote: Perhaps some kind of in memory index would be better than iterating an array? Binary tree or so. I

Re: hadoop MapReduce and stop words

2009-05-16 Thread tim robertson
Try and google binary tree java and you will get loads of hits... This is a simple implementation but I am sure there are better ones that handle balancing better. Cheers Tim public class BinaryTree { public static void main(String[] args) { BinaryTree bt = new

Re: hadoop MapReduce and stop words

2009-05-16 Thread Stefan Will
Just use a java.util.HashSet for this. There should only be a few dozen stopwords, so load them into a HashSet when the Mapper starts up, and then check your tokens against it while you're processing records. -- Stefan From: tim robertson timrobertson...@gmail.com Reply-To:
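
The HashSet approach suggested above can be sketched roughly as follows. This is a minimal, Hadoop-free illustration, not code from the thread: the class name StopWordFilter, the whitespace tokenization, the lowercasing, and the sample stop-word list are all assumptions made for the example. In a real job the set would presumably be built once per task when the Mapper starts up (for example in configure() with the old API or setup() with the new one), and filter() would be called from map().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: holds the stop words in a HashSet so each
// lookup is a constant-time hash probe rather than an array scan.
public class StopWordFilter {
    private final Set<String> stopWords = new HashSet<>();

    // Build the set once; in a Mapper this would happen at start-up.
    public StopWordFilter(Iterable<String> words) {
        for (String w : words) {
            stopWords.add(w.toLowerCase(Locale.ROOT));
        }
    }

    // Split a line on whitespace and keep only tokens that are not stop words.
    public List<String> filter(String line) {
        List<String> kept = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (token.isEmpty()) {
                continue; // a leading space produces one empty token
            }
            if (!stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample stop words are an assumption for illustration only.
        StopWordFilter f = new StopWordFilter(Arrays.asList("the", "a", "of", "and"));
        System.out.println(f.filter("The history of Hadoop and Hive"));
        // prints [history, Hadoop, Hive]
    }
}
```

With only a few dozen stop words the memory cost is negligible, so the heavier in-memory index discussed earlier in the thread is probably only worth it for much larger lookup structures.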

Re: A brief report of Second Hadoop in China Salon

2009-05-16 Thread Yabo-Arber Xu
Congratulations! Wished I were there. :-) Best, Arber On Sat, May 16, 2009 at 9:09 AM, He Yongqiang heyongqi...@software.ict.ac.cn wrote: Hi, all On May 9, we held the second Hadoop In China salon. About 150 people attended; 46% of them were engineers/managers from industry companies, and

sort example

2009-05-16 Thread David Rio
Hi, I am trying to sort some data with hadoop(streaming mode). The input looks like: $ cat small_numbers.txt 9971681 9686036 2592322 4518219 1467363 To send my job to the cluster I use: hadoop jar /home/drio/hadoop-0.20.0/contrib/streaming/hadoop-0.20.0-streaming.jar \ -D mapred.reduce.tasks=2 \

Re: sort example

2009-05-16 Thread David Rio
BTW, Basically, this is the unix equivalent to what I am trying to do: $ cat input_file.txt | sort -n -drd On Sat, May 16, 2009 at 11:10 PM, David Rio driodei...@gmail.com wrote: Hi, I am trying to sort some data with hadoop(streaming mode). The input looks like: $ cat small_numbers.txt

Re: sort example

2009-05-16 Thread Peter Skomoroch
1) It is doing alphabetical sort by default, you can force Hadoop streaming to sort numerically with: -D mapred.text.key.comparator.options=-k2,2nr\ see the section A Useful Comparator Class in the streaming docs: http://hadoop.apache.org/core/docs/current/streaming.html and

Re: sort example

2009-05-16 Thread Peter Skomoroch
I just copied and pasted that comparator option from the docs; the -n part is what you want in this case. On Sun, May 17, 2009 at 12:40 AM, Peter Skomoroch peter.skomor...@gmail.com wrote: 1) It is doing alphabetical sort by default, you can force Hadoop streaming to sort numerically with: -D
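
To illustrate why the default ordering surprises people here, the following standalone sketch compares sorting keys as text with sorting them as numbers. It is plain Java and does not use or reproduce Hadoop's comparator; the class name NumericSortDemo and the sample keys 42 and 100 are assumptions added so that values of different lengths show the difference (the other two values come from the sample input in the thread).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo: the same keys ordered as text and as numbers.
public class NumericSortDemo {

    // Text ordering compares character by character, so "100" < "42".
    static List<String> sortAsText(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }

    // Numeric ordering parses each key before comparing, like sort -n.
    static List<String> sortAsNumbers(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        Collections.sort(copy, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Long.compare(Long.parseLong(a), Long.parseLong(b));
            }
        });
        return copy;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("9971681", "42", "2592322", "100");
        System.out.println(sortAsText(keys));    // prints [100, 2592322, 42, 9971681]
        System.out.println(sortAsNumbers(keys)); // prints [42, 100, 2592322, 9971681]
    }
}
```

When every key has the same number of digits, as in the five sample values posted in the thread, the two orderings happen to agree, so the problem only shows up once key lengths differ.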