David Alves wrote:
Hi
I've been testing some different serialization techniques to go
along with a research project.
I know the motivation behind the Hadoop serialization mechanism (e.g.
Writable), and that its enhancement through Record I/O is not
only performance, but also
Hello,
When I use NLineInputFormat and I print:
System.out.println("mapred.map.tasks: " + jobConf.get("mapred.map.tasks"));
I see 51, but on the JobTracker site the number is 18114. Yet with
TextInputFormat it shows 51.
I'm using Hadoop 0.19.
Any ideas why?
Regards
Saptarshi
--
Saptarshi
Sorry, I see - every line is now a map task: one split, one task (in
this case N = 1 line per split).
Is that correct?
Saptarshi
On Jan 20, 2009, at 11:39 AM, Saptarshi Guha wrote:
Hello,
When I use NLineInputFormat and I print:
System
Hi,
I am following the instructions for running the WordCount version 2 example on Hadoop
installed under Cygwin. The quick start example worked fine, but WordCount version 2
gives the following
error:
java.lang.ClassNotFoundException: org.myorg.WordCount
at
Open Source University Meetup:
Hi all,
Please Join Sun Microsystems Open Source University Meetup
It's a place to share your thoughts, express your opinions,
create your blogs, and start discussions on any open source technology.
Thanks and Regards
Vinayak Katkar
Sun
Hi,
I was trying to run the Hadoop WordCount version 2 example under Cygwin. Without a
pattern.txt file it works fine.
When I try with a pattern.txt file to skip some patterns, I get a NullPointerException
as follows:
09/01/20 12:56:16 INFO jvm.JvmMetrics: Initializing JVM Metrics with
Hi.
I'm running streaming on a relatively big (2 TB) dataset, which is split
by Hadoop into 64 MB pieces. One of the problems I have with that is my map
tasks take a very long time to initialize (they need to load a 3 GB database into
RAM) and then they finish these 64 MB in 10 seconds.
So
Hi Dmitry,
Not a direct answer to your question, but I think the right approach
would be not to load your database into memory during config(), but
instead to look up the database from map() via HBase or something similar.
That way you don't have to worry about the split sizes. In fact, using
fewer
Well, the database is specifically designed to fit into memory, and if it doesn't
it will slow things down hundreds of times. One simple hack I came up with is to
replace the map tasks with /bin/cat and then run 150 reducers that keep the
database constantly in memory. Parallelism is also not a problem, since
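For the original too-small-splits problem, one other lever (a sketch, assuming the streaming job reads plain HDFS files through FileInputFormat) is raising mapred.min.split.size: Hadoop's FileInputFormat computes each split size as max(minSize, min(goalSize, blockSize)), so a minimum above the 64 MB block size yields fewer, larger splits and amortizes the 3 GB load. The numbers below are illustrative, not measured:

```java
public class SplitSizeSketch {
    // Mirrors the split-size formula used by Hadoop's FileInputFormat:
    // splitSize = max(minSize, min(goalSize, blockSize)).
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;                      // 64 MB HDFS block
        long goalSize  = 2L * 1024 * 1024 * 1024 * 1024 / 150;   // total size / desired maps
        // Default minSize = 1: splits stay at the 64 MB block size,
        // so a 2 TB input produces tens of thousands of tiny map tasks.
        System.out.println(computeSplitSize(goalSize, 1L, blockSize));
        // Raising mapred.min.split.size to 1 GB forces 1 GB splits instead.
        long oneGB = 1024L * 1024 * 1024;
        System.out.println(computeSplitSize(goalSize, oneGB, blockSize));
    }
}
```

With 1 GB splits, each map task amortizes its 3 GB load over roughly sixteen times as much input.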
Saptarshi Guha wrote:
Sorry, I see - every line is now a map task: one split, one task (in
this case N = 1 line per split).
Is that correct?
Saptarshi
You are right. NLineInputFormat treats N lines of input as one split, and
each split is given to a map task.
By default, N is 1. N can be configured
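The split arithmetic above can be sketched in plain Java (a hypothetical helper, not Hadoop's actual NLineInputFormat implementation), showing why a 51-line input with the default N = 1 yields 51 map tasks:

```java
import java.util.ArrayList;
import java.util.List;

public class NLineSplitSketch {
    // Group the input lines into splits of at most n lines each,
    // mimicking how NLineInputFormat assigns one split per map task.
    static List<List<String>> split(List<String> lines, int n) {
        List<List<String>> splits = new ArrayList<List<String>>();
        for (int i = 0; i < lines.size(); i += n) {
            splits.add(lines.subList(i, Math.min(i + n, lines.size())));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<String>();
        for (int i = 0; i < 51; i++) {
            lines.add("line " + i);
        }
        // Default N = 1: one map task per input line -> 51 splits.
        System.out.println(split(lines, 1).size());
        // N = 10: ceil(51 / 10) -> 6 splits.
        System.out.println(split(lines, 10).size());
    }
}
```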