Congratulations!
Nicholas Sze
----- Original Message -----
From: He Yongqiang heyongqi...@software.ict.ac.cn
To: core-...@hadoop.apache.org core-...@hadoop.apache.org;
core-user@hadoop.apache.org core-user@hadoop.apache.org
Sent: Friday, May 15, 2009 6:09:50 PM
Subject: A brief report
To follow up on this question, I also asked for help on the JRockit forum. They
kindly offered some useful and detailed suggestions based on the JRA
results. After updating the option list, the performance did improve
to some extent. But it is still not comparable with the Sun JVM. Maybe, it
Hello everybody,
I am a new Hadoop user. I started running Hadoop by following
http://hadoop.apache.org/core/docs/current/quickstart.html but when I run the
command bin/hadoop jar hadoop-*-examples.jar grep input output
'dfs[a-z.]+' in pseudo-distributed mode, I get an error like:
Task
OK, I've just solved the problem with minor data loss. Steps to solve:
1) comment out FSEditLog.java:542
2) compile hadoop-core jar
3) start cluster with new jar
The NameNode will skip bad records in name/current/edits and write a new edits
file back into fs. As bad records stand for actual IO operations,
Hi,
I am trying to handle stop words in Hadoop MapReduce, and later on,
in Hive.
What is the accepted way to handle stop words in Hadoop?
All I can think of is to load all the stop words into an array in the mapper,
and then check each token against the stop words (this would be
Perhaps some kind of in memory index would be better than iterating an
array? Binary tree or so.
I did similar with polygon indexes and point data. It requires
careful memory planning on the nodes if the indexes are large (mine
were several GB).
Just a thought,
Tim
On Sat, May 16, 2009 at
Can you please elaborate on the in-memory index?
What kind of software did you use to implement this?
Regards
On Sat, May 16, 2009 at 8:55 PM, tim robertson timrobertson...@gmail.com wrote:
Perhaps some kind of in memory index would be better than iterating an
array? Binary tree or so.
I
Try googling "binary tree java" and you will get loads of hits...
This is a simple implementation but I am sure there are better ones
that handle balancing better.
Cheers
Tim
public class BinaryTree {
public static void main(String[] args) {
BinaryTree bt = new
Just use a java.util.HashSet for this. There should only be a few dozen
stopwords, so load them into a HashSet when the Mapper starts up, and then
check your tokens against it while you're processing records.
-- Stefan
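A minimal sketch of the HashSet approach described above, in plain Java so it runs without the Hadoop jars. The class name StopWordFilter, the whitespace tokenizer, and the sample stop words are assumptions for illustration only; in a real job the set would be built once when the Mapper starts up and filter() called per record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Hadoop): drops stop words from a line of text.
class StopWordFilter {
    // Built once; HashSet gives constant-time lookups on average.
    private final Set<String> stopWords = new HashSet<>();

    StopWordFilter(Iterable<String> words) {
        for (String w : words) {
            stopWords.add(w.toLowerCase(Locale.ROOT));
        }
    }

    // Splits on whitespace and keeps only tokens that are not stop words.
    List<String> filter(String line) {
        List<String> kept = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative stop-word list; a real job would load its own.
        StopWordFilter f = new StopWordFilter(Arrays.asList("the", "a", "of", "and"));
        System.out.println(f.filter("The history of the elephant and a mouse"));
    }
}
```

With only a few dozen stop words the memory cost is negligible, which is why a hash set is probably simpler here than a hand-written tree.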
From: tim robertson timrobertson...@gmail.com
Reply-To:
Congratulations! Wish I were there. :-)
Best,
Arber
On Sat, May 16, 2009 at 9:09 AM, He Yongqiang
heyongqi...@software.ict.ac.cn wrote:
Hi, all
On May 9, we held the second Hadoop In China salon. About 150 people
attended; 46% of them were engineers/managers from industry companies, and
Hi,
I am trying to sort some data with Hadoop (streaming mode). The input looks
like:
$ cat small_numbers.txt
9971681
9686036
2592322
4518219
1467363
To send my job to the cluster I use:
hadoop jar
/home/drio/hadoop-0.20.0/contrib/streaming/hadoop-0.20.0-streaming.jar \
-D mapred.reduce.tasks=2 \
BTW,
Basically, this is the unix equivalent to what I am trying to do:
$ cat input_file.txt | sort -n
-drd
On Sat, May 16, 2009 at 11:10 PM, David Rio driodei...@gmail.com wrote:
Hi,
I am trying to sort some data with Hadoop (streaming mode). The input looks
like:
$ cat small_numbers.txt
1) It does an alphabetical sort by default; you can force Hadoop streaming
to sort numerically with:
-D mapred.text.key.comparator.options=-k2,2nr \
see the section A Useful Comparator Class in the streaming docs:
http://hadoop.apache.org/core/docs/current/streaming.html
and
I just copied and pasted that comparator option from the docs; the -n part is
what you want in this case.
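To see why the default ordering surprises people, here is a small plain-Java sketch (no Hadoop classes) that sorts the same keys once as text and once as numbers. The class name NumericSortDemo and the keys are made up for illustration; the keys deliberately have different lengths, since that is where the two orderings diverge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class: compares text ordering with numeric ordering.
class NumericSortDemo {
    // Orders keys character by character, as a plain text sort would.
    static List<String> sortAsText(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }

    // Orders keys by their numeric value, like `sort -n`.
    static List<String> sortAsNumbers(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        copy.sort(Comparator.comparingLong(Long::parseLong));
        return copy;
    }

    public static void main(String[] args) {
        // Made-up keys of varying length.
        List<String> keys = Arrays.asList("9971681", "42", "1467363", "500");
        System.out.println(sortAsText(keys));    // [1467363, 42, 500, 9971681]
        System.out.println(sortAsNumbers(keys)); // [42, 500, 1467363, 9971681]
    }
}
```

When every key has the same number of digits, the two orderings agree, so the difference only shows up once the data mixes lengths.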
On Sun, May 17, 2009 at 12:40 AM, Peter Skomoroch peter.skomor...@gmail.com
wrote:
1) It does an alphabetical sort by default; you can force Hadoop streaming
to sort numerically with:
-D