Re: collecting CPU, mem, iops of hadoop jobs

2011-12-20 Thread He Chen
You may need Ganglia. It is cluster-monitoring software.
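If node-level graphs from Ganglia are not enough and you want per-job numbers, the counters the JobTracker already records can be read programmatically. A minimal sketch (whether CPU or memory counters are present depends on the 0.20 build; the job ID shown in the comment is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

// Dumps every counter group and counter of a running or finished job.
public class DumpJobCounters {
  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf(new Configuration()));
    RunningJob job = client.getJob(JobID.forName(args[0]));  // e.g. job_201112201530_0001 (illustrative)
    Counters counters = job.getCounters();
    for (Counters.Group group : counters) {
      for (Counters.Counter counter : group) {
        System.out.println(group.getDisplayName() + "\t"
            + counter.getDisplayName() + "\t" + counter.getCounter());
      }
    }
  }
}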

On Tue, Dec 20, 2011 at 2:44 PM, Patai Sangbutsarakum <
silvianhad...@gmail.com> wrote:

> Hi Hadoopers,
>
> We're running Hadoop 0.20 CentOS5.5. I am finding the way to collect
> CPU time, memory usage, IOPS of each hadoop Job.
> What would be a good starting point? Documentation? API?
>
> Thanks in advance
> -P
>


Re: More cores Vs More Nodes ?

2011-12-13 Thread He Chen
Hi Brad

This is a really interesting experiment. I am curious why you did not use 32
nodes with 2 cores each. That would make the number of CPU cores in the two
groups equal.

Chen

On Tue, Dec 13, 2011 at 7:15 PM, Brad Sarsfield  wrote:

> Hi Prashant,
>
> In each case I had a single tasktracker per node. I oversubscribed the
> total tasks per tasktracker/node by 1.5 x # of cores.
>
> So for the 64 core allocation comparison.
>In A: 8 cores; Each machine had a single tasktracker with 8 maps /
> 4 reduce slots for 12 task slots total per machine x 8 machines (including
> head node)
>In B: 2 cores; Each machine had a single tasktracker with 2
> maps / 1 reduce slots for 3 slots total per machine x 29 machines
> (including head node which was running 8 cores)
>
> The experiment was done in a cloud hosted environment running set of VMs.
>
> ~Brad
>
> -Original Message-
> From: Prashant Kommireddi [mailto:prash1...@gmail.com]
> Sent: Tuesday, December 13, 2011 9:46 AM
> To: common-user@hadoop.apache.org
> Subject: Re: More cores Vs More Nodes ?
>
> Hi Brad, how many taskstrackers did you have on each node in both cases?
>
> Thanks,
> Prashant
>
> Sent from my iPhone
>
> On Dec 13, 2011, at 9:42 AM, Brad Sarsfield  wrote:
>
> > Praveenesh,
> >
> > Your question is not naïve; in fact, optimal hardware design can
> ultimately be a very difficult question to answer on what would be
> "better". If you made me pick one without much information I'd go for more
> machines.  But...
> >
> > It all depends; and there is no right answer :)
> >
> > More machines
> >+May run your workload faster
> >+Will give you a higher degree of reliability protection from node /
> hardware / hard drive failure.
> >+More aggregate IO capabilities
> >- Capex / opex may be higher than allocating more cores
> > More cores
> >+May run your workload faster
> >+More cores may allow for more tasks to run on the same machine
> >+More cores/tasks may reduce network contention and increase
> task-to-task data flow performance.
> >
> > Notice "May run your workload faster" is in both, as it can be very
> workload dependent.
> >
> > My Experience:
> > I did a recent experiment and found that given the same number of cores
> (64) with the exact same network / machine configuration;
> >A: I had 8 machines with 8 cores
> >B: I had 28 machines with 2 cores (and 1x8 core head node)
> >
> > B was able to outperform A by 2x using teragen and terasort. These
> machines were running in a virtualized environment; where some of the IO
> capabilities behind the scenes were being regulated to 400Mbps per node
> when running in the 2 core configuration vs 1Gbps on the 8 core.  So I
> would expect the non-throttled scenario to work even better.
> >
> > ~Brad
> >
> >
> > -Original Message-
> > From: praveenesh kumar [mailto:praveen...@gmail.com]
> > Sent: Monday, December 12, 2011 8:51 PM
> > To: common-user@hadoop.apache.org
> > Subject: More cores Vs More Nodes ?
> >
> > Hey Guys,
> >
> > So I have a very naive question in my mind regarding Hadoop cluster nodes.
> >
> > More cores or more nodes: shall I spend money on going from 2-core to 4-core
> > machines, or spend it on buying more nodes with fewer cores, e.g. two 2-core
> > machines instead?
> >
> > Thanks,
> > Praveenesh
> >
>
>


Re: Matrix multiplication in Hadoop

2011-11-19 Thread He Chen
Right, I agree with Edward Capriolo, Hadoop + GPGPU is a better choice.



On Sat, Nov 19, 2011 at 10:53 AM, Edward Capriolo wrote:

> Sounds like a job for next gen map reduce native libraries and gpu's. A
> modern day Dr frankenstein for sure.
>
> On Saturday, November 19, 2011, Tim Broberg  wrote:
> > Perhaps this is a good candidate for a native library, then?
> >
> > 
> > From: Mike Davis [xmikeda...@gmail.com]
> > Sent: Friday, November 18, 2011 7:39 PM
> > To: common-user@hadoop.apache.org
> > Subject: Re: Matrix multiplication in Hadoop
> >
> > On Friday, November 18, 2011, Mike Spreitzer 
> wrote:
> >>  Why is matrix multiplication ill-suited for Hadoop?
> >
> > IMHO, a huge issue here is the JVM's inability to fully support cpu
> vendor
> > specific SIMD instructions and, by extension, optimized BLAS routines.
> > Running a large MM task using intel's MKL rather than relying on generic
> > compiler optimization is orders of magnitude faster on a single multicore
> > processor. I see almost no way that Hadoop could win such a CPU intensive
> > task against an mpi cluster with even a tenth of the nodes running with a
> > decently tuned BLAS library. Racing even against a single CPU might be
> > difficult, given the i/o overhead.
> >
> > Still, it's a reasonably common problem and we shouldn't murder the good
> in
> > favor of the best. I'm certain a MM/LinAlg Hadoop library with even
> > mediocre performance, wrt C, would get used.
> >
> > --
> > Mike Davis
> >
> >
>


Re: Matrix multiplication in Hadoop

2011-11-19 Thread He Chen
Did you try Hama?

There are many methods:

1) Use Hadoop MPI, which allows you to run MPI MM code on Hadoop;

2) Hama is designed for MM;

3) Use pure Hadoop Java MapReduce.

I did this before, but it may not be an optimal algorithm. Put your first matrix
in the DistributedCache and take the second matrix's lines as the map input. For
each line, use a mapper to multiply that row array by the first matrix in the
DistributedCache. Use a reducer to collect the result matrix. This algorithm is
limited by your DistributedCache size, so it is suitable for a small matrix
multiplied by a huge matrix.
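A minimal sketch of such a mapper, assuming each input line of the huge matrix carries its row index followed by whitespace-separated values and that the small cached matrix fits in memory (the class name and file layout are illustrative, not from an existing library):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Each call multiplies one row of the huge matrix B by the small cached matrix A,
// producing one row of the product B x A.
public class CachedMatrixMultiplyMapper
    extends Mapper<LongWritable, Text, LongWritable, Text> {

  private double[][] a;  // the small matrix, loaded from the DistributedCache

  @Override
  protected void setup(Context context) throws IOException {
    // Assumes the driver added the small matrix via DistributedCache.addCacheFile().
    Configuration conf = context.getConfiguration();
    Path[] cached = DistributedCache.getLocalCacheFiles(conf);
    List<double[]> rows = new ArrayList<double[]>();
    BufferedReader reader = new BufferedReader(new FileReader(cached[0].toString()));
    String line;
    while ((line = reader.readLine()) != null) {   // one row of A per line
      String[] tokens = line.trim().split("\\s+");
      double[] row = new double[tokens.length];
      for (int j = 0; j < tokens.length; j++) {
        row[j] = Double.parseDouble(tokens[j]);
      }
      rows.add(row);
    }
    reader.close();
    a = rows.toArray(new double[rows.size()][]);
  }

  @Override
  protected void map(LongWritable offset, Text value, Context context)
      throws IOException, InterruptedException {
    // Assumed input line format: <rowIndex> <v1> <v2> ... for one row of B.
    String[] tokens = value.toString().trim().split("\\s+");
    long rowIndex = Long.parseLong(tokens[0]);
    double[] bRow = new double[tokens.length - 1];
    for (int k = 0; k < bRow.length; k++) {
      bRow[k] = Double.parseDouble(tokens[k + 1]);
    }

    // resultRow = bRow * A  (requires bRow.length == number of rows of A)
    StringBuilder result = new StringBuilder();
    for (int j = 0; j < a[0].length; j++) {
      double sum = 0.0;
      for (int k = 0; k < bRow.length; k++) {
        sum += bRow[k] * a[k][j];
      }
      if (j > 0) result.append('\t');
      result.append(sum);
    }
    context.write(new LongWritable(rowIndex), new Text(result.toString()));
  }
}

The driver would add the small matrix with DistributedCache.addCacheFile() before submitting the job, and an identity reducer can then collect the product rows ordered by row index.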

Chen
On Sat, Nov 19, 2011 at 10:34 AM, Tim Broberg  wrote:

> Perhaps this is a good candidate for a native library, then?
>
> 
> From: Mike Davis [xmikeda...@gmail.com]
> Sent: Friday, November 18, 2011 7:39 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Matrix multiplication in Hadoop
>
>  On Friday, November 18, 2011, Mike Spreitzer  wrote:
> >  Why is matrix multiplication ill-suited for Hadoop?
>
> IMHO, a huge issue here is the JVM's inability to fully support cpu vendor
> specific SIMD instructions and, by extension, optimized BLAS routines.
> Running a large MM task using intel's MKL rather than relying on generic
> compiler optimization is orders of magnitude faster on a single multicore
> processor. I see almost no way that Hadoop could win such a CPU intensive
> task against an mpi cluster with even a tenth of the nodes running with a
> decently tuned BLAS library. Racing even against a single CPU might be
> difficult, given the i/o overhead.
>
> Still, it's a reasonably common problem and we shouldn't murder the good in
> favor of the best. I'm certain a MM/LinAlg Hadoop library with even
> mediocre performance, wrt C, would get used.
>
> --
> Mike Davis
>
>


Re: reducing mappers for a job

2011-11-16 Thread He Chen
Hi Jay Vyas

Ke Yuan's method may decrease the number of mappers because, by default,

the number of mappers for a job = the number of blocks in the job's input
file.

Make sure you only change the block size for your specific job's input
file, not the Hadoop cluster's configuration.

If you change the block size in your Hadoop cluster configuration (in the
hdfs-site.xml file), this method may bring some side effects:

1) waste of disk space;
2) difficulty to balance HDFS;
3) low Map stage data locality;
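As a per-job alternative to touching the block size at all, you can raise the minimum split size for just that job, so several blocks are packed into one split and fewer mappers are launched. A minimal sketch using the 0.20-era property name mapred.min.split.size (paths and the 256MB value are illustrative); note that splits spanning several blocks trade away some map-side data locality:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// A pass-through job (identity mapper and reducer) that only illustrates the
// per-job split-size setting; fewer, larger splits mean fewer map tasks.
public class FewerMappersJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Ask the framework for splits of at least 256 MB for this job only.
    conf.setLong("mapred.min.split.size", 256L * 1024 * 1024);

    Job job = new Job(conf, "fewer-mappers");
    job.setJarByClass(FewerMappersJob.class);
    job.setOutputKeyClass(LongWritable.class);  // matches the identity mapper's output
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}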

Bests!

Chen

On Wed, Nov 16, 2011 at 9:42 PM, ke yuan  wrote:

> Just set the block size to 128M or 256M; it may reduce the number of mappers per job.
>
> 2011/11/17 Jay Vyas 
>
> > Hi guys: In a shared cluster environment, what's the best way to reduce the
> > number of mappers per job? Should you do it with input splits, or simply
> > toggle the values in the JobConf (i.e. increase the number of bytes in an
> > input split)?
> >
> >
> >
> >
> >
> > --
> > Jay Vyas
> > MMSB/UCHC
> >
>


Re: HDFS file into Blocks

2011-09-25 Thread He Chen
Hi

It is interesting that a guy from Huawei is also working on the Hadoop project.
:)

Chen

On Sun, Sep 25, 2011 at 11:29 PM, Uma Maheswara Rao G 72686 <
mahesw...@huawei.com> wrote:

>
> Hi,
>
>  You can find the code in DFSOutputStream.java.
>  Here there will be one DataStreamer thread. This thread will pick
> the packets from the dataQueue and write them to the sockets.
>  Before this, when actually writing the chunks, based on the block size
> parameter passed from the client, it will set the last-packet flag in the
> Packet.
>  If the streamer thread finds that it is the last packet of the block, then it
> ends the block. That means it will close the sockets which were used for
> writing the block.
>  The streamer thread repeats the loop. When it finds there are no sockets open,
> it will again create the pipeline for the next block.
>  Go through the flow from writeChunk in DFSOutputStream.java, which is exactly
> where the packets are enqueued in the dataQueue.
>
> Regards,
> Uma
> - Original Message -
> From: kartheek muthyala 
> Date: Sunday, September 25, 2011 11:06 am
> Subject: HDFS file into Blocks
> To: common-user@hadoop.apache.org
>
> > Hi all,
> > I am working around the code to understand where HDFS divides a
> > file into
> > blocks. Can anyone point me to this section of the code?
> > Thanks,
> > Kartheek
> >
>


Re: phases of Hadoop Jobs

2011-09-18 Thread He Chen
Or we can just separate the shuffle from the reduce stage and integrate it into
the map stage. Then we can clearly differentiate the map stage (before the
shuffle finishes) and the reduce stage (after the shuffle finishes).


On Mon, Sep 19, 2011 at 1:20 AM, He Chen  wrote:

> Hi Kai
>
> Thank you  for the reply.
>
>  The reduce() will not start because the shuffle phase does not finish. And
> the shuffle phase will not finish until all mappers end.
>
> I am curious about the design purpose of overlapping the map and reduce
> stages. Was this only to save shuffling time, or are there some other
> reasons?
>
> Best wishes!
>
> Chen
>   On Mon, Sep 19, 2011 at 12:36 AM, Kai Voigt  wrote:
>
>> Hi Chen,
>>
>> the times when nodes running instances of the map and reduce nodes
>> overlap. But map() and reduce() execution will not.
>>
>> reduce nodes will start copying data from map nodes, that's the shuffle
>> phase. And the map nodes are still running during that copy phase. My
>> observation had been that if the map phase progresses from 0 to 100%, it
>> matches with the reduce phase progress from 0-33%. For example, if you map
>> progress shows 60%, reduce might show 20%.
>>
>> But the reduce() will not start until all the map() code has processed the
>> entire input. So you will never see the reduce progress higher than 66% when
>> map progress didn't reach 100%.
>>
>> If you see map phase reaching 100%, but reduce phase not making any higher
>> number than 66%, it means your reduce() code is broken or slow because it
>> doesn't produce any output. An infinite loop is a common mistake.
>>
>> Kai
>>
>> Am 19.09.2011 um 07:29 schrieb He Chen:
>>
>> > Hi Arun
>> >
>> > I have a question. Do you know what is the reason that hadoop allows the
>> map
>> > and the reduce stage overlap? Or anyone knows about it. Thank you in
>> > advance.
>> >
>> > Chen
>> >
>> > On Sun, Sep 18, 2011 at 11:17 PM, Arun C Murthy 
>> wrote:
>> >
>> >> Nan,
>> >>
>> >> The 'phase' is implicitly understood by the 'progress' (value) made by
>> the
>> >> map/reduce tasks (see o.a.h.mapred.TaskStatus.Phase).
>> >>
>> >> For e.g.
>> >> Reduce:
>> >> 0-33% -> Shuffle
>> >> 34-66% -> Sort (actually, just 'merge', there is no sort in the reduce
>> >> since all map-outputs are sorted)
>> >> 67-100% -> Reduce
>> >>
>> >> With 0.23 onwards the Map has phases too:
>> >> 0-90% -> Map
>> >> 91-100% -> Final Sort/merge
>> >>
>> >> Now,about starting reduces early - this is done to ensure shuffle can
>> >> proceed for completed maps while rest of the maps run, there-by
>> pipelining
>> >> shuffle and map completion. There is a 'reduce slowstart' feature to
>> control
>> >> this - by default, reduces aren't started until 5% of maps are
>> complete.
>> >> Users can set this higher.
>> >>
>> >> Arun
>> >>
>> >> On Sep 18, 2011, at 7:24 PM, Nan Zhu wrote:
>> >>
>> >>> Hi, all
>> >>>
>> >>> recently, I was hit by a question, "how is a hadoop job divided into 2
>> >>> phases?",
>> >>>
>> >>> In textbooks, we are told that the mapreduce jobs are divided into 2
>> >> phases,
>> >>> map and reduce, and for reduce, we further divided it into 3 stages,
>> >>> shuffle, sort, and reduce, but in hadoop codes, I never think about
>> >>> this question, I didn't see any variable members in JobInProgress
>> class
>> >>> to indicate this information,
>> >>>
>> >>> and according to my understanding on the source code of hadoop, the
>> >> reduce
>> >>> tasks are unnecessarily started until all mappers are finished, in
>> >>> constract, we can see the reduce tasks are in shuffle stage while
>> there
>> >> are
>> >>> mappers which are still in running,
>> >>> So how can I indicate the phase which the job is belonging to?
>> >>>
>> >>> Thanks
>> >>> --
>> >>> Nan Zhu
>> >>> School of Electronic, Information and Electrical Engineering,229
>> >>> Shanghai Jiao Tong University
>> >>> 800,Dongchuan Road,Shanghai,China
>> >>> E-Mail: zhunans...@gmail.com
>> >>
>> >>
>>
>>  --
>> Kai Voigt
>> k...@123.org
>>
>>
>>
>>
>>
>


Re: phases of Hadoop Jobs

2011-09-18 Thread He Chen
Hi Kai

Thank you  for the reply.

 The reduce() will not start because the shuffle phase does not finish. And
the shuffle phase will not finish until all mappers end.

I am curious about the design purpose of overlapping the map and reduce
stages. Was this only to save shuffling time, or are there some other
reasons?

Best wishes!

Chen
On Mon, Sep 19, 2011 at 12:36 AM, Kai Voigt  wrote:

> Hi Chen,
>
> The times when nodes run instances of the map and reduce tasks overlap,
> but map() and reduce() execution will not.
>
> reduce nodes will start copying data from map nodes, that's the shuffle
> phase. And the map nodes are still running during that copy phase. My
> observation had been that if the map phase progresses from 0 to 100%, it
> matches with the reduce phase progress from 0-33%. For example, if you map
> progress shows 60%, reduce might show 20%.
>
> But the reduce() will not start until all the map() code has processed the
> entire input. So you will never see the reduce progress higher than 66% when
> map progress didn't reach 100%.
>
> If you see map phase reaching 100%, but reduce phase not making any higher
> number than 66%, it means your reduce() code is broken or slow because it
> doesn't produce any output. An infinite loop is a common mistake.
>
> Kai
>
> Am 19.09.2011 um 07:29 schrieb He Chen:
>
> > Hi Arun
> >
> > I have a question. Do you know what is the reason that hadoop allows the
> map
> > and the reduce stage overlap? Or anyone knows about it. Thank you in
> > advance.
> >
> > Chen
> >
> > On Sun, Sep 18, 2011 at 11:17 PM, Arun C Murthy 
> wrote:
> >
> >> Nan,
> >>
> >> The 'phase' is implicitly understood by the 'progress' (value) made by
> the
> >> map/reduce tasks (see o.a.h.mapred.TaskStatus.Phase).
> >>
> >> For e.g.
> >> Reduce:
> >> 0-33% -> Shuffle
> >> 34-66% -> Sort (actually, just 'merge', there is no sort in the reduce
> >> since all map-outputs are sorted)
> >> 67-100% -> Reduce
> >>
> >> With 0.23 onwards the Map has phases too:
> >> 0-90% -> Map
> >> 91-100% -> Final Sort/merge
> >>
> >> Now,about starting reduces early - this is done to ensure shuffle can
> >> proceed for completed maps while rest of the maps run, there-by
> pipelining
> >> shuffle and map completion. There is a 'reduce slowstart' feature to
> control
> >> this - by default, reduces aren't started until 5% of maps are complete.
> >> Users can set this higher.
> >>
> >> Arun
> >>
> >> On Sep 18, 2011, at 7:24 PM, Nan Zhu wrote:
> >>
> >>> Hi, all
> >>>
> >>> recently, I was hit by a question, "how is a hadoop job divided into 2
> >>> phases?",
> >>>
> >>> In textbooks, we are told that the mapreduce jobs are divided into 2
> >> phases,
> >>> map and reduce, and for reduce, we further divided it into 3 stages,
> >>> shuffle, sort, and reduce, but in hadoop codes, I never think about
> >>> this question, I didn't see any variable members in JobInProgress class
> >>> to indicate this information,
> >>>
> >>> and according to my understanding on the source code of hadoop, the
> >> reduce
> >>> tasks are unnecessarily started until all mappers are finished, in
> >>> constract, we can see the reduce tasks are in shuffle stage while there
> >> are
> >>> mappers which are still in running,
> >>> So how can I indicate the phase which the job is belonging to?
> >>>
> >>> Thanks
> >>> --
> >>> Nan Zhu
> >>> School of Electronic, Information and Electrical Engineering,229
> >>> Shanghai Jiao Tong University
> >>> 800,Dongchuan Road,Shanghai,China
> >>> E-Mail: zhunans...@gmail.com
> >>
> >>
>
>  --
> Kai Voigt
> k...@123.org
>
>
>
>
>


Re: phases of Hadoop Jobs

2011-09-18 Thread He Chen
Hi Arun

I have a question. Do you know the reason that Hadoop allows the map
and the reduce stages to overlap? Or does anyone else know? Thank you in
advance.

Chen

On Sun, Sep 18, 2011 at 11:17 PM, Arun C Murthy  wrote:

> Nan,
>
>  The 'phase' is implicitly understood by the 'progress' (value) made by the
> map/reduce tasks (see o.a.h.mapred.TaskStatus.Phase).
>
>  For e.g.
>  Reduce:
>  0-33% -> Shuffle
>  34-66% -> Sort (actually, just 'merge', there is no sort in the reduce
> since all map-outputs are sorted)
>  67-100% -> Reduce
>
>  With 0.23 onwards the Map has phases too:
>  0-90% -> Map
>  91-100% -> Final Sort/merge
>
>  Now,about starting reduces early - this is done to ensure shuffle can
> proceed for completed maps while rest of the maps run, there-by pipelining
> shuffle and map completion. There is a 'reduce slowstart' feature to control
> this - by default, reduces aren't started until 5% of maps are complete.
> Users can set this higher.
>
> Arun
>
> On Sep 18, 2011, at 7:24 PM, Nan Zhu wrote:
>
> > Hi, all
> >
> > recently, I was hit by a question, "how is a hadoop job divided into 2
> > phases?",
> >
> > In textbooks, we are told that the mapreduce jobs are divided into 2
> phases,
> > map and reduce, and for reduce, we further divided it into 3 stages,
> > shuffle, sort, and reduce, but in hadoop codes, I never think about
> > this question, I didn't see any variable members in JobInProgress class
> > to indicate this information,
> >
> > and according to my understanding on the source code of hadoop, the
> reduce
> > tasks are unnecessarily started until all mappers are finished, in
> > constract, we can see the reduce tasks are in shuffle stage while there
> are
> > mappers which are still in running,
> > So how can I indicate the phase which the job is belonging to?
> >
> > Thanks
> > --
> > Nan Zhu
> > School of Electronic, Information and Electrical Engineering,229
> > Shanghai Jiao Tong University
> > 800,Dongchuan Road,Shanghai,China
> > E-Mail: zhunans...@gmail.com
>
>
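For reference, the 'reduce slowstart' feature Arun describes above can be raised per job so that reduces (and thus the shuffle) are not scheduled until most maps are done. A minimal sketch using the 0.20-era property name mapred.reduce.slowstart.completed.maps (the 0.75 value is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Delays reduce-task launch for one job until 75% of the maps have completed
// (the default is 5%).
public class SlowstartExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.75f);
    Job job = new Job(conf, "slowstart-demo");
    // ... set jar, mapper, reducer, input and output paths as usual,
    // then job.waitForCompletion(true) ...
  }
}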


Re: phases of Hadoop Jobs

2011-09-18 Thread He Chen
Hi Nan

I have had the same question for a while. In some research papers, people like
to make the reduce stage slow-start. In this way, the map stage and the
reduce stage are easy to differentiate. You can use the number of remaining
unallocated map tasks to detect which stage your job is in.

Letting the reduce stage overlap with the map stage blurs the boundary
between the two stages. I think it may decrease the execution time of the whole
job (I am not sure whether this is the main reason that people allow "fast
start" or not).

However, "fast start" has its side effects. It is hard to get a global view
of the map stage's output, and then the load balance and data locality of the
reduce stage are not easy to address.

Chen
Research Assistant of Holland Computing Center
PhD student of CSE Department
University of Nebraska-Lincoln


On Sun, Sep 18, 2011 at 9:24 PM, Nan Zhu  wrote:

> Hi, all
>
>  recently, I was hit by a question, "how is a hadoop job divided into 2
> phases?",
>
> In textbooks, we are told that the mapreduce jobs are divided into 2
> phases,
> map and reduce, and for reduce, we further divided it into 3 stages,
> shuffle, sort, and reduce, but in hadoop codes, I never think about
> this question, I didn't see any variable members in JobInProgress class
> to indicate this information,
>
> and according to my understanding of the source code of Hadoop, the reduce
> tasks are not necessarily started only after all mappers are finished; in
> contrast, we can see the reduce tasks in the shuffle stage while there are
> mappers which are still running,
> So how can I indicate the phase which the job is belonging to?
>
> Thanks
> --
> Nan Zhu
> School of Electronic, Information and Electrical Engineering,229
> Shanghai Jiao Tong University
> 800,Dongchuan Road,Shanghai,China
> E-Mail: zhunans...@gmail.com
>


Re: Poor IO performance on a 10 node cluster.

2011-05-30 Thread He Chen
Hi Gyuribácsi

I would suggest you divide the MapReduce program's execution time into 3 parts:

a) Map Stage
In this stage, the wc job splits the input data and generates map tasks. Each
map task processes one block by default (you can change this in
FileInputFormat.java). As Brian said, if you have a larger block size, you may
have fewer map tasks, and then probably less overhead.

b) Reduce Stage
2) shuffle phase
 In this phase, reduce tasks collect intermediate results from every node
that has executed map tasks. Each reduce task can have many concurrent threads
to fetch data (you can configure this in mapred-site.xml; the property is
"mapreduce.reduce.shuffle.parallelcopies"). But be careful about your data
popularity. For example, say you have "Hadoop, Hadoop, Hadoop, hello". The
default Hadoop partitioner will assign the three "Hadoop" key-value pairs to one
node. Thus, if you have two nodes running reduce tasks, one of them will copy 3
times more data than the other. This will cause one node to be slower than the
other. You may rewrite the partitioner (a minimal sketch follows below).

3) sort and reduce phase
I think the Hadoop UI will give you some hints about how long this phase
takes.

By dividing the MapReduce application into these 3 parts, you can easily find
which one is your bottleneck and do some profiling. (And I don't know why my
font changed to this type. :( )
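A minimal sketch of such a rewritten partitioner, which reserves one reducer for a single known hot key and hashes everything else (the class name and the hard-coded hot key are illustrative only):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Sends one known hot key to its own reducer and hashes all other keys over
// the remaining reducers, so a popular word does not overload a reducer that
// also handles other keys.
public class HotKeyAwarePartitioner extends Partitioner<Text, IntWritable> {
  private static final String HOT_KEY = "Hadoop";  // assumed hot key, for illustration

  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    if (numPartitions > 1 && HOT_KEY.equals(key.toString())) {
      return numPartitions - 1;                    // reserve the last reducer
    }
    int rest = (numPartitions > 1) ? numPartitions - 1 : 1;
    return (key.hashCode() & Integer.MAX_VALUE) % rest;
  }
}

It would be registered on the job with job.setPartitionerClass(HotKeyAwarePartitioner.class).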

Hope it will be helpful.
Chen

On Mon, May 30, 2011 at 12:32 PM, Harsh J  wrote:

> Psst. The cats speak in their own language ;-)
>
> On Mon, May 30, 2011 at 10:31 PM, James Seigel  wrote:
> > Not sure that will help ;)
> >
> > Sent from my mobile. Please excuse the typos.
> >
> > On 2011-05-30, at 9:23 AM, Boris Aleksandrovsky 
> wrote:
> >
> >>
> Ljddfjfjfififfifjftjiifjfjjjffkxbznzsjxodiewisshsudddudsjidhddueiweefiuftttoitfiirriifoiffkllddiririiriioerorooiieirrioeekroooeoooirjjfdijdkkduddjudiiehs
> >> On May 30, 2011 5:28 AM, "Gyuribácsi"  wrote:
> >>>
> >>>
> >>> Hi,
> >>>
> >>> I have a 10 node cluster (IBM blade servers, 48GB RAM, 2x500GB Disk, 16 HT
> >>> cores).
> >>>
> >>> I've uploaded 10 files to HDFS. Each file is 10GB. I used the streaming jar
> >>> with 'wc -l' as mapper and 'cat' as reducer.
> >>>
> >>> I use 64MB block size and the default replication (3).
> >>>
> >>> The wc on the 100 GB took about 220 seconds, which translates to about 3.5
> >>> Gbit/sec processing speed. One disk can do sequential reads at 1Gbit/sec, so
> >>> I would expect something around 20 Gbit/sec (minus some overhead), and I'm
> >>> getting only 3.5.
> >>>
> >>> Is my expectation valid?
> >>>
> >>> I checked the jobtracker and it seems all nodes are working, each reading
> >>> the right blocks. I have not played with the number of mappers and reducers
> >>> yet. It seems the number of mappers is the same as the number of blocks and
> >>> the number of reducers is 20 (there are 20 disks). This looks ok to me.
> >>>
> >>> We also did an experiment with TestDFSIO with similar results. Aggregated
> >>> read IO speed is around 3.5Gbit/sec. It is just too far from my
> >>> expectation :(
> >>>
> >>> Please help!
> >>>
> >>> Thank you,
> >>> Gyorgy
> >>> --
> >>> View this message in context:
> >>> http://old.nabble.com/Poor-IO-performance-on-a-10-node-cluster.-tp31732971p31732971.html
> >>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
> >>>
> >
>
>
>
> --
> Harsh J
>


How can I keep a datanode from saving blocks from other datanodes after a MapReduce job

2011-05-25 Thread He Chen
Hi all,

I remember there is a parameter with which we can turn this off. I mean, we do
not allow the tasktracker to keep blocks from other datanodes after a MapReduce
job has finished.

I met a problem when I using hadoop-0.21.0.

First of all, I balanced the cluster according to the number of blocks on every
datanode. That is to say, for example, under "/user/test/" I have 100 blocks of
data. The replication number is 2, so there are 200 blocks in total under
"/user/test". I have 10 datanodes. What I do is let every datanode have 20 of
the total blocks.

However, after about 300 MapReduce jobs finished, I found that the number of
blocks on the datanodes had changed. It is not 20 for every datanode; some got
21 and some got 19. I had turned off the Hadoop balancer.

What is the reason for this problem? Any suggestion will be appreciated!

Best wishes!

Chen


Re: Change block size from 64M to 128M does not work on Hadoop-0.21

2011-05-04 Thread He Chen
Got it. Thank you, Harsh. BTW, it is `hadoop dfs -Ddfs.blocksize=size -put file
file`; there is no dot between "block" and "size".

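For completeness, the block size can also be set per file from a client program through the FileSystem API, which sidesteps the question of which node's hdfs-site.xml is read. A minimal sketch (the path is illustrative; dfs.blocksize is the 0.21-era property name, dfs.block.size the older one):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Writes a file with an explicit 128MB block size, regardless of what the
// cluster-wide hdfs-site.xml says. Setting dfs.blocksize / dfs.block.size in
// the client's Configuration before the upload has the same effect.
public class PutWithBlockSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    long blockSize = 128L * 1024 * 1024;           // 134217728 bytes
    FSDataOutputStream out = fs.create(
        new Path("/user/file1/file"),              // destination (illustrative)
        true,                                      // overwrite if it exists
        conf.getInt("io.file.buffer.size", 4096),  // write buffer size
        fs.getDefaultReplication(),                // keep the configured replication
        blockSize);                                // per-file block size
    // ... write the file contents here ...
    out.close();
  }
}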
On Wed, May 4, 2011 at 3:18 PM, He Chen  wrote:

> Tried second solution. Does not work, still 2 64M blocks. h
>
>
> On Wed, May 4, 2011 at 3:16 PM, He Chen  wrote:
>
>> Hi Harsh
>>
>> Thank you for the reply.
>>
>> Actually, the hadoop directory is on my NFS server, every node reads the
>> same file from NFS server. I think this is not a problem.
>>
>> I like your second solution. But I am not sure, whether the namenode
>> will divide those 128MB
>>
>>  blocks to smaller ones in future or not.
>>
>> Chen
>>
>> On Wed, May 4, 2011 at 3:00 PM, Harsh J  wrote:
>>
>>> Your client (put) machine must have the same block size configuration
>>> during upload as well.
>>>
>>> Alternatively, you may do something explicit like `hadoop dfs
>>> -Ddfs.block.size=size -put file file`
>>>
>>> On Thu, May 5, 2011 at 12:59 AM, He Chen  wrote:
>>> > Hi all
>>> >
>>> > I met a problem about changing block size from 64M to 128M. I am sure I
>>> > modified the correct configuration file hdfs-site.xml. Because I can
>>> change
>>> > the replication number correctly. However, it does not work on block
>>> size
>>> > changing.
>>> >
>>> > For example:
>>> >
>>> > I change the dfs.block.size to 134217728 bytes.
>>> >
>>> > I upload a file which is 128M and use "fsck" to find how many blocks
>>> this
>>> > file has. It shows:
>>> > /user/file1/file 134217726 bytes, 2 blocks(s): OK
>>> > 0. blk_xx len=67108864 repl=2 [192.168.0.3:50010,
>>> 192.168.0.32:50010
>>> > ]
>>> > 1. blk_xx len=67108862 repl=2 [192.168.0.9:50010,
>>> 192.168.0.8:50010]
>>> >
>>> > The hadoop version is 0.21. Any suggestion will be appreciated!
>>> >
>>> > thanks
>>> >
>>> > Chen
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>


Re: Change block size from 64M to 128M does not work on Hadoop-0.21

2011-05-04 Thread He Chen
Tried the second solution. It does not work; still two 64M blocks. Hmm.

On Wed, May 4, 2011 at 3:16 PM, He Chen  wrote:

> Hi Harsh
>
> Thank you for the reply.
>
> Actually, the hadoop directory is on my NFS server, every node reads the
> same file from NFS server. I think this is not a problem.
>
> I like your second solution. But I am not sure, whether the namenode
> will divide those 128MB
>
>  blocks to smaller ones in future or not.
>
> Chen
>
> On Wed, May 4, 2011 at 3:00 PM, Harsh J  wrote:
>
>> Your client (put) machine must have the same block size configuration
>> during upload as well.
>>
>> Alternatively, you may do something explicit like `hadoop dfs
>> -Ddfs.block.size=size -put file file`
>>
>> On Thu, May 5, 2011 at 12:59 AM, He Chen  wrote:
>> > Hi all
>> >
>> > I met a problem about changing block size from 64M to 128M. I am sure I
>> > modified the correct configuration file hdfs-site.xml. Because I can
>> change
>> > the replication number correctly. However, it does not work on block
>> size
>> > changing.
>> >
>> > For example:
>> >
>> > I change the dfs.block.size to 134217728 bytes.
>> >
>> > I upload a file which is 128M and use "fsck" to find how many blocks
>> this
>> > file has. It shows:
>> > /user/file1/file 134217726 bytes, 2 blocks(s): OK
>> > 0. blk_xx len=67108864 repl=2 [192.168.0.3:50010,
>> 192.168.0.32:50010
>> > ]
>> > 1. blk_xx len=67108862 repl=2 [192.168.0.9:50010,
>> 192.168.0.8:50010]
>> >
>> > The hadoop version is 0.21. Any suggestion will be appreciated!
>> >
>> > thanks
>> >
>> > Chen
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>


Re: Change block size from 64M to 128M does not work on Hadoop-0.21

2011-05-04 Thread He Chen
Hi Harsh

Thank you for the reply.

Actually, the Hadoop directory is on my NFS server; every node reads the
same file from the NFS server. I think this is not a problem.

I like your second solution. But I am not sure whether the namenode
will divide those 128MB blocks into smaller ones in the future or not.

Chen

On Wed, May 4, 2011 at 3:00 PM, Harsh J  wrote:

> Your client (put) machine must have the same block size configuration
> during upload as well.
>
> Alternatively, you may do something explicit like `hadoop dfs
> -Ddfs.block.size=size -put file file`
>
> On Thu, May 5, 2011 at 12:59 AM, He Chen  wrote:
> > Hi all
> >
> > I met a problem about changing block size from 64M to 128M. I am sure I
> > modified the correct configuration file hdfs-site.xml. Because I can
> change
> > the replication number correctly. However, it does not work on block size
> > changing.
> >
> > For example:
> >
> > I change the dfs.block.size to 134217728 bytes.
> >
> > I upload a file which is 128M and use "fsck" to find how many blocks this
> > file has. It shows:
> > /user/file1/file 134217726 bytes, 2 blocks(s): OK
> > 0. blk_xx len=67108864 repl=2 [192.168.0.3:50010,
> 192.168.0.32:50010
> > ]
> > 1. blk_xx len=67108862 repl=2 [192.168.0.9:50010,
> 192.168.0.8:50010]
> >
> > The hadoop version is 0.21. Any suggestion will be appreciated!
> >
> > thanks
> >
> > Chen
> >
>
>
>
> --
> Harsh J
>


Change block size from 64M to 128M does not work on Hadoop-0.21

2011-05-04 Thread He Chen
Hi all

I met a problem when changing the block size from 64M to 128M. I am sure I
modified the correct configuration file, hdfs-site.xml, because I can change
the replication number correctly. However, it does not work for changing the
block size.

For example:

I change the dfs.block.size to 134217728 bytes.

I upload a file which is 128M and use "fsck" to find how many blocks this
file has. It shows:
/user/file1/file 134217726 bytes, 2 blocks(s): OK
0. blk_xx len=67108864 repl=2 [192.168.0.3:50010, 192.168.0.32:50010
]
1. blk_xx len=67108862 repl=2 [192.168.0.9:50010, 192.168.0.8:50010]

The hadoop version is 0.21. Any suggestion will be appreciated!

thanks

Chen


Apply HADOOP-4667 to branch-0.20

2011-04-26 Thread He Chen
Hey everyone

I tried to apply the HADOOP-4667 patch to branch-0.20, but it always failed.
My cluster is based on branch-0.20, and I want to test the delay scheduling
method's performance without re-formatting HDFS.

Has anyone applied HADOOP-4667 to branch-0.20 before, or does anyone have any
suggestions? Thank you in advance.

Best wishes!

Chen


Anyone know where to get Hadoop production cluster logs

2011-03-16 Thread He Chen
Hi all

I am working on the Hadoop scheduler, but I do not know where to get logs from
Hadoop production clusters. Any suggestions?

Bests

Chen


Anyone know how to attach a figure to a Hadoop Wiki page?

2011-03-14 Thread He Chen
Hi all

Any suggestions?

Bests

Chen


Re: Not able to Run C++ code in Hadoop Cluster

2011-03-14 Thread He Chen
I agree with Keith Wiley; we use streaming also.

On Mon, Mar 14, 2011 at 11:40 AM, Keith Wiley  wrote:

> Not to speak against pipes because I don't have much experience with it,
> but I eventually abandoned my pipes efforts and went with streaming.  If you
> don't get pipes to work, you might take a look at streaming as an
> alternative.
>
> Cheers!
>
>
> 
> Keith Wiley kwi...@keithwiley.com keithwiley.com
> music.keithwiley.com
>
> "I used to be with it, but then they changed what it was.  Now, what I'm
> with
> isn't it, and what's it seems weird and scary to me."
>   --  Abe (Grandpa) Simpson
>
> 
>
>


Re: Cuda Program in Hadoop Cluster

2011-03-09 Thread He Chen
Hi, Adarsh Sharma

For C code

My friend employs Hadoop streaming to run CUDA C code. You can send an email
to him at p...@cse.unl.edu.

Best wishes!

Chen


On Thu, Mar 3, 2011 at 11:18 PM, Adarsh Sharma wrote:

> Dear all,
>
> I followed a fantastic tutorial and was able to run the Wordcount C++ program
> in a Hadoop cluster.
>
>
> http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_2.2_--_Running_C%2B%2B_Programs_on_Hadoop
>
> But now I want to run a CUDA program in the Hadoop cluster, but it results in
> errors.
> Has anyone done this before who can guide me on how to do it?
>
> I attached the both files. Please find the attachment.
>
>
> Thanks & best Regards,
>
> Adarsh Sharma
>


Re: hadoop balancer

2011-03-04 Thread He Chen
Thank you very much Icebergs.

I rewrote the balancer. Now, given a directory like "/user/foo/", I can
balance the blocks under that directory evenly across every node in the cluster.

Best wishes!

Chen

On Thu, Mar 3, 2011 at 11:14 PM, icebergs  wrote:

> Try this command; maybe it helps:
> hadoop fs -setrep -R -w 2 xx
>
> 2011/3/2 He Chen 
>
> > Hi all
> >
> > I met a problem when I try to balance certain hdfs directory among the
> > clusters. For example, I have a directory "/user/xxx/", and there 100
> > blocks. I want to balance them among my 5 nodes clusters. Each node has
> 40
> > blocks (2 replicas). The problem is about transfer block from one
> datanode
> > to another. Actually, I followed the balancer's method. However, it
> always
> > waits for the response of destination datanode and halt. I attached the
> > code:
> > .
> >
> >  Socket sock = new Socket();
> >
> >  DataOutputStream out = null;
> >
> >  DataInputStream in = null;
> >
> >  try{
> >
> >sock.connect(NetUtils.createSocketAddr(
> >
> >target.getName()), HdfsConstants.READ_TIMEOUT);
> >
> >sock.setKeepAlive(true);
> >
> >System.out.println(sock.isConnected());
> >
> >out = new DataOutputStream( new BufferedOutputStream(
> >
> >sock.getOutputStream(), FSConstants.BUFFER_SIZE));
> >
> >out.writeShort(DataTransferProtocol.DATA_TRANSFER_VERSION);
> >
> >out.writeByte(DataTransferProtocol.OP_REPLACE_BLOCK);
> >
> >out.writeLong(block2move.getBlockId());
> >
> >out.writeLong(block2move.getGenerationStamp());
> >
> >Text.writeString(out, source.getStorageID());
> >
> >System.out.println("Ready to move");
> >
> >source.write(out);
> >
> >System.out.println("Write to output Stream");
> >
> >out.flush();
> >
> >System.out.println("out has been flushed!");
> >
> >in = new DataInputStream( new BufferedInputStream(
> >
> >sock.getInputStream(), FSConstants.BUFFER_SIZE));
> >
> >It stop here and wait for response.
> >
> >short status = in.readShort();
> >
> >System.out.println("Got the response from input stream!"+status);
> >
> >if (status != DataTransferProtocol.OP_STATUS_SUCCESS) {
> >
> >   throw new IOException("block move is failed\t"+status);
> >
> >}
> >
> >
> >
> >  } catch (IOException e) {
> >
> >LOG.warn("Error moving block "+block2move.getBlockId()+
> >
> >" from " + source.getName() + " to " +
> >
> >target.getName() + " through " +
> >
> >source.getName() +
> >
> >": "+e.toString());
> >
> >
> >   } finally {
> >
> >IOUtils.closeStream(out);
> >
> >IOUtils.closeStream(in);
> >
> >IOUtils.closeSocket(sock);
> >   }
> > ..
> >
> > Any reply will be appreciated. Thank you in advance!
> >
> > Chen
> >
>


hadoop balancer

2011-03-01 Thread He Chen
Hi all

I met a problem when I tried to balance a certain HDFS directory across the
cluster. For example, I have a directory "/user/xxx/", and there are 100
blocks. I want to balance them across my 5-node cluster so that each node has 40
blocks (2 replicas). The problem is about transferring a block from one datanode
to another. Actually, I followed the balancer's method. However, it always
waits for the response of the destination datanode and halts. I attached the
code:
.

  Socket sock = new Socket();
  DataOutputStream out = null;
  DataInputStream in = null;
  try {
    sock.connect(NetUtils.createSocketAddr(target.getName()),
        HdfsConstants.READ_TIMEOUT);
    sock.setKeepAlive(true);
    System.out.println(sock.isConnected());
    out = new DataOutputStream(new BufferedOutputStream(
        sock.getOutputStream(), FSConstants.BUFFER_SIZE));
    out.writeShort(DataTransferProtocol.DATA_TRANSFER_VERSION);
    out.writeByte(DataTransferProtocol.OP_REPLACE_BLOCK);
    out.writeLong(block2move.getBlockId());
    out.writeLong(block2move.getGenerationStamp());
    Text.writeString(out, source.getStorageID());
    System.out.println("Ready to move");
    source.write(out);
    System.out.println("Write to output Stream");
    out.flush();
    System.out.println("out has been flushed!");
    in = new DataInputStream(new BufferedInputStream(
        sock.getInputStream(), FSConstants.BUFFER_SIZE));
    // It stops here and waits for the response.
    short status = in.readShort();
    System.out.println("Got the response from input stream!" + status);
    if (status != DataTransferProtocol.OP_STATUS_SUCCESS) {
      throw new IOException("block move is failed\t" + status);
    }
  } catch (IOException e) {
    LOG.warn("Error moving block " + block2move.getBlockId() +
        " from " + source.getName() + " to " +
        target.getName() + " through " +
        source.getName() +
        ": " + e.toString());
  } finally {
    IOUtils.closeStream(out);
    IOUtils.closeStream(in);
    IOUtils.closeSocket(sock);
  }
..

Any reply will be appreciated. Thank you in advance!

Chen


Re: CUDA on Hadoop

2011-02-10 Thread He Chen
Thank you, Steve Loughran. I just created a new page on the Hadoop wiki;
however, how can I create a new document page on the Hadoop wiki?

Best wishes

Chen

On Thu, Feb 10, 2011 at 5:38 AM, Steve Loughran  wrote:

> On 09/02/11 17:31, He Chen wrote:
>
>> Hi sharma
>>
>> I shared our slides about CUDA performance on Hadoop clusters. Feel free
>> to
>> modified it, please mention the copyright!
>>
>
> This is nice. If you stick it up online you should link to it from the
> Hadoop wiki pages -maybe start a hadoop+cuda page and refer to it
>
>


Re: CUDA on Hadoop

2011-02-09 Thread He Chen
Hi  Sharma

I have some experience working with hybrid Hadoop + GPU. Our group has
tested CUDA performance on Hadoop clusters. We obtained a 20x speedup and
saved up to 95% power consumption in some computation-intensive test cases.

You can parallelize your Java code by using JCUDA, which is a kind of API to
help you call CUDA from your Java code.

Chen

On Wed, Feb 9, 2011 at 8:45 AM, Steve Loughran  wrote:

> On 09/02/11 13:58, Harsh J wrote:
>
>> You can check-out this project which did some work for Hama+CUDA:
>> http://code.google.com/p/mrcl/
>>
>
> Amazon let you bring up a Hadoop cluster on machines with GPUs you can code
> against, but I haven't heard of anyone using it. The big issue is bandwidth;
> it just doesn't make sense for a classic "scan through the logs" kind of
> problem as the disk:GPU bandwidth ratio is even worse than disk:CPU.
>
> That said, if you were doing something that involved a lot of compute on a
> block of data (e.g. rendering tiles in a map), this could work.
>


Re: Question about Hadoop Default FCFS Job Scheduler

2011-01-17 Thread He Chen
Hi Nan,

Thank you for the reply. I understand what you mean. What concerns me is that
inside the "obtainNewLocalMapTask(...)" method, it only assigns one task at a
time.

Now I understand why it only assigns one task at a time. It is because of the
outer loop:

for (i = 0; i < MapperCapacity; ++i){

(..)

}

I mean, why does this loop exist here? Why does the scheduler use this type of
loop? It imposes overhead on the task-assigning process if it only assigns one
task at a time. It is obvious that a node could be assigned all the available
local tasks it can afford in one "obtainNewLocalMapTask(..)" method
call.

Bests

Chen

On Mon, Jan 17, 2011 at 8:28 AM, Nan Zhu  wrote:

> Hi, Chen
>
> How is it going recently?
>
> Actually, I think you misunderstand the code in assignTasks() in
> JobQueueTaskScheduler.java; see the following structure of the interesting
> code:
>
> //I'm sorry, I hacked the code so much, the name of the variables may be
> different from the original version
>
> for (i = 0; i < MapperCapacity; ++i){
>   ...
>   for (JobInProgress job:jobQueue){
>   //try to shedule a node-local or rack-local map tasks
>   //here is the interesting place
>   t = job.obtainNewLocalMapTask(...);
>   if (t != null){
>  ...
>  break;//the break statement here will make the control flow back
> to "for (job:jobQueue)" which means that it will restart map tasks
> selection
> procedure from the first job, so , it is actually schedule all of the first
> job's local mappers first until the map slots are full
>   }
>   }
> }
>
> BTW, we can only schedule a reduce task in a single heartbeat
>
>
>
> Best,
> Nan
> On Sat, Jan 15, 2011 at 1:45 PM, He Chen  wrote:
>
> > Hey all
> >
> > Why does the FCFS scheduler only let a node chooses one task at a time in
> > one job? In order to increase the data locality,
> > it is reasonable to let a node to choose all its local tasks (if it can)
> > from a job at a time.
> >
> > Any reply will be appreciated.
> >
> > Thanks
> >
> > Chen
> >
>


Question about Hadoop Default FCFS Job Scheduler

2011-01-14 Thread He Chen
Hey all

Why does the FCFS scheduler only let a node choose one task at a time from
one job? In order to increase data locality,
it is reasonable to let a node choose all of its local tasks (if it can)
from a job at a time.

Any reply will be appreciated.

Thanks

Chen


Re: Why Hadoop uses HTTP for file transmission between Map and Reduce?

2011-01-13 Thread He Chen
Actually, PhEDEx uses GridFTP for its data transfers.

On Thu, Jan 13, 2011 at 5:34 AM, Steve Loughran  wrote:

> On 13/01/11 08:34, li ping wrote:
>
>> That is also my concerns. Is it efficient for data transmission.
>>
>
> It's long lived TCP connections, reasonably efficient for bulk data xfer,
> has all the throttling of TCP built in, and comes with some excellently
> debugged client and server code in the form of jetty and httpclient. In
> maintenance costs alone, those libraries justify HTTP unless you have a
> vastly superior option *and are willing to maintain it forever*
>
> FTPs limits are well known (security), NFS limits well known (security, UDP
> version doesn't throttle), self developed protocols will have whatever
> problems you want.
>
> There are better protocols for long-haul data transfer over fat pipes, such
> as GridFTP , PhedEX ( http://www.gridpp.ac.uk/papers/ah05_phedex.pdf ),
> which use multiple TCP channels in parallel to reduce the impact of a single
> lost packet, but within a datacentre, you shouldn't have to worry about
> this. If you do find lots of packets get lost, raise the issue with the
> networking team.
>
> -Steve
>
>
>
>> On Thu, Jan 13, 2011 at 4:27 PM, Nan Zhu  wrote:
>>
>>  Hi, all
>>>
>>> I have a question about the file transmission between Map and Reduce
>>> stage,
>>> in current implementation, the Reducers get the results generated by
>>> Mappers
>>> through HTTP Get, I don't understand why HTTP is selected, why not FTP,
>>> or
>>> a
>>> self-developed protocal?
>>>
>>> Just for HTTP's simple?
>>>
>>> thanks
>>>
>>> Nan
>>>
>>>
>>
>>
>>
>


FW:FW

2010-12-31 Thread He Chen
I bought some items from a commercial site, because of the unique
channel of purchases,
product prices unexpected, I think you can go to see: elesales.com ,
high-quality products can also attract you.


Re: Jcuda on Hadoop

2010-12-09 Thread He Chen
Thank you, Mathias. It works!
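One way to wire up the second approach Mathias suggests below is to point the task JVMs' java.library.path at wherever the JCuda .so files end up on each node, via the child JVM options. A minimal sketch (the property name is mapred.child.java.opts; the library path and heap size are placeholders, not verified values):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Points the child (task) JVMs at the directory holding the JCuda .so files.
// mapred.child.java.opts replaces the default child options, so the heap size
// must be restated; the library path below is a placeholder that has to match
// wherever the natives actually live (or are unpacked) on each tasktracker.
public class JcudaJobSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("mapred.child.java.opts",
        "-Xmx512m -Djava.library.path=/path/to/jcuda/native/libs");
    Job job = new Job(conf, "jcuda-demo");
    // ... configure the mapper/reducer that call into JCuda ...
  }
}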

On Thu, Dec 9, 2010 at 6:26 PM, Mathias Herberts  wrote:

> Put the native libs (.so) with the other common libs that hadoop
> already has (libhadoop, libcompression, ...). Or put them in a
> specific 'lib/native' in your job jar and set 'java.library.path' to
> 'job.local.dir'/../jars/lib/native in your MR job.
>
> Mathias.
>
> On Thu, Dec 9, 2010 at 23:17, He Chen  wrote:
> > I am still in the test stage. I mean I only start one JobTracker and one
> > TaskTracker.
> > I copied all jcuda.*.jar into  HADOOP_HOME/lib/
> >
> > from the ps aux|grep java
> >
> > I can confirm the JT and TT processes all contain the jcuda.*.jar files
> >
> > On Thu, Dec 9, 2010 at 4:05 PM, Mathias Herberts <
> mathias.herbe...@gmail.com
> >> wrote:
> >
> >> You need to have the native libs on all tasktrackers and have
> >> java.library.path correctly set.
> >> On Dec 9, 2010 11:01 PM, "He Chen"  wrote:
> >> > Hello everyone, I 've got a problem when I write some Jcuda program
> based
> >> on
> >> > Hadoop MapReduce. I use the jcudaUtill. The KernelLauncherSample can
> be
> >> > successfully executed on my worker node. However, When I submit a
> program
> >> > containing jcuda to Hadoop MapReduce. I got following errors. Any
> reply
> >> will
> >> > be appreciated! 10/12/09 15:41:39 INFO mapred.JobClient: Running job:
> >> > job_201012091523_0002 10/12/09 15:41:40 INFO mapred.JobClient: map 0%
> >> reduce
> >> > 0% 10/12/09 15:41:53 INFO mapred.JobClient: Task Id :
> >> > attempt_201012091523_0002_m_00_0, Status : FAILED Error: Could not
> >> load
> >> > native library attempt_201012091523_0002_m_00_0: [GC
> >> > 7402K->1594K(12096K), 0.0045650 secs]
> >> attempt_201012091523_0002_m_00_0:
> >> > [GC 108666K->104610K(116800K), 0.0106860 secs]
> >> > attempt_201012091523_0002_m_00_0: [Full GC
> 104610K->104276K(129856K),
> >> > 0.0482530 secs] attempt_201012091523_0002_m_00_0: Error while
> loading
> >> > native library with base name "JCudaDriver"
> >> > attempt_201012091523_0002_m_00_0: Operating system name: Linux
> >> > attempt_201012091523_0002_m_00_0: Architecture : amd64
> >> > attempt_201012091523_0002_m_00_0: Architecture bit size: 64
> 10/12/09
> >> > 15:42:00 INFO mapred.JobClient: Task Id :
> >> > attempt_201012091523_0002_m_00_1, Status : FAILED Error: Could not
> >> load
> >> > native library attempt_201012091523_0002_m_00_1: [GC
> >> > 7373K->1573K(18368K), 0.0045230 secs]
> >> attempt_201012091523_0002_m_00_1:
> >> > Error while loading native library with base name "JCudaDriver"
> >> > attempt_201012091523_0002_m_00_1: Operating system name: Linux
> >> > attempt_201012091523_0002_m_00_1: Architecture : amd64
> >> > attempt_201012091523_0002_m_00_1: Architecture bit size: 64 It
> looks
> >> > like The jcuda library file can not be sucessfully loaded. Actually, I
> >> tried
> >> > many combinations. 1) I include all the jcuda library files in my
> >> > TaskTracker's classpath, and also include jcuda library file into my
> >> > mapreduce program. 2) TaskTracker's classpath w/o jcuda library, but
> my
> >> > program contains them 3) TaskTracker's classpath w/ jcuda librara, but
> my
> >> > program w/o them All of them report the same error above.
> >>
> >
>


Re: Jcuda on Hadoop

2010-12-09 Thread He Chen
Thank you so much, Mathias Herberts

This really helps

On Thu, Dec 9, 2010 at 4:17 PM, He Chen  wrote:

> I am still in the test stage. I mean I only start one JobTracker and one
> TaskTracker.
> I copied all jcuda.*.jar into  HADOOP_HOME/lib/
>
> from the ps aux|grep java
>
> I can confirm the JT and TT processes all contain the jcuda.*.jar files
>
>
> On Thu, Dec 9, 2010 at 4:05 PM, Mathias Herberts <
> mathias.herbe...@gmail.com> wrote:
>
>> You need to have the native libs on all tasktrackers and have
>> java.library.path correctly set.
>> On Dec 9, 2010 11:01 PM, "He Chen"  wrote:
>> > Hello everyone, I 've got a problem when I write some Jcuda program
>> based
>> on
>> > Hadoop MapReduce. I use the jcudaUtill. The KernelLauncherSample can be
>> > successfully executed on my worker node. However, When I submit a
>> program
>> > containing jcuda to Hadoop MapReduce. I got following errors. Any reply
>> will
>> > be appreciated! 10/12/09 15:41:39 INFO mapred.JobClient: Running job:
>> > job_201012091523_0002 10/12/09 15:41:40 INFO mapred.JobClient: map 0%
>> reduce
>> > 0% 10/12/09 15:41:53 INFO mapred.JobClient: Task Id :
>> > attempt_201012091523_0002_m_00_0, Status : FAILED Error: Could not
>> load
>> > native library attempt_201012091523_0002_m_00_0: [GC
>> > 7402K->1594K(12096K), 0.0045650 secs]
>> attempt_201012091523_0002_m_00_0:
>> > [GC 108666K->104610K(116800K), 0.0106860 secs]
>> > attempt_201012091523_0002_m_00_0: [Full GC
>> 104610K->104276K(129856K),
>> > 0.0482530 secs] attempt_201012091523_0002_m_00_0: Error while
>> loading
>> > native library with base name "JCudaDriver"
>> > attempt_201012091523_0002_m_00_0: Operating system name: Linux
>> > attempt_201012091523_0002_m_00_0: Architecture : amd64
>> > attempt_201012091523_0002_m_00_0: Architecture bit size: 64 10/12/09
>> > 15:42:00 INFO mapred.JobClient: Task Id :
>> > attempt_201012091523_0002_m_00_1, Status : FAILED Error: Could not
>> load
>> > native library attempt_201012091523_0002_m_00_1: [GC
>> > 7373K->1573K(18368K), 0.0045230 secs]
>> attempt_201012091523_0002_m_00_1:
>> > Error while loading native library with base name "JCudaDriver"
>> > attempt_201012091523_0002_m_00_1: Operating system name: Linux
>> > attempt_201012091523_0002_m_00_1: Architecture : amd64
>> > attempt_201012091523_0002_m_00_1: Architecture bit size: 64 It looks
>> > like The jcuda library file can not be sucessfully loaded. Actually, I
>> tried
>> > many combinations. 1) I include all the jcuda library files in my
>> > TaskTracker's classpath, and also include jcuda library file into my
>> > mapreduce program. 2) TaskTracker's classpath w/o jcuda library, but my
>> > program contains them 3) TaskTracker's classpath w/ jcuda librara, but
>> my
>> > program w/o them All of them report the same error above.
>>
>
>


Re: Jcuda on Hadoop

2010-12-09 Thread He Chen
I am still in the test stage; I mean I only start one JobTracker and one
TaskTracker.
I copied all the jcuda.*.jar files into HADOOP_HOME/lib/.

From the output of `ps aux | grep java`,

I can confirm the JT and TT processes' classpaths all contain the jcuda.*.jar files.

On Thu, Dec 9, 2010 at 4:05 PM, Mathias Herberts  wrote:

> You need to have the native libs on all tasktrackers and have
> java.library.path correctly set.
> On Dec 9, 2010 11:01 PM, "He Chen"  wrote:
> > Hello everyone, I 've got a problem when I write some Jcuda program based
> on
> > Hadoop MapReduce. I use the jcudaUtill. The KernelLauncherSample can be
> > successfully executed on my worker node. However, When I submit a program
> > containing jcuda to Hadoop MapReduce. I got following errors. Any reply
> will
> > be appreciated! 10/12/09 15:41:39 INFO mapred.JobClient: Running job:
> > job_201012091523_0002 10/12/09 15:41:40 INFO mapred.JobClient: map 0%
> reduce
> > 0% 10/12/09 15:41:53 INFO mapred.JobClient: Task Id :
> > attempt_201012091523_0002_m_00_0, Status : FAILED Error: Could not
> load
> > native library attempt_201012091523_0002_m_00_0: [GC
> > 7402K->1594K(12096K), 0.0045650 secs]
> attempt_201012091523_0002_m_00_0:
> > [GC 108666K->104610K(116800K), 0.0106860 secs]
> > attempt_201012091523_0002_m_00_0: [Full GC 104610K->104276K(129856K),
> > 0.0482530 secs] attempt_201012091523_0002_m_00_0: Error while loading
> > native library with base name "JCudaDriver"
> > attempt_201012091523_0002_m_00_0: Operating system name: Linux
> > attempt_201012091523_0002_m_00_0: Architecture : amd64
> > attempt_201012091523_0002_m_00_0: Architecture bit size: 64 10/12/09
> > 15:42:00 INFO mapred.JobClient: Task Id :
> > attempt_201012091523_0002_m_00_1, Status : FAILED Error: Could not
> load
> > native library attempt_201012091523_0002_m_00_1: [GC
> > 7373K->1573K(18368K), 0.0045230 secs]
> attempt_201012091523_0002_m_00_1:
> > Error while loading native library with base name "JCudaDriver"
> > attempt_201012091523_0002_m_00_1: Operating system name: Linux
> > attempt_201012091523_0002_m_00_1: Architecture : amd64
> > attempt_201012091523_0002_m_00_1: Architecture bit size: 64 It looks
> > like The jcuda library file can not be sucessfully loaded. Actually, I
> tried
> > many combinations. 1) I include all the jcuda library files in my
> > TaskTracker's classpath, and also include jcuda library file into my
> > mapreduce program. 2) TaskTracker's classpath w/o jcuda library, but my
> > program contains them 3) TaskTracker's classpath w/ jcuda librara, but my
> > program w/o them All of them report the same error above.
>


Jcuda on Hadoop

2010-12-09 Thread He Chen
Hello everyone, I've got a problem when I write a Jcuda program based on
Hadoop MapReduce. I use the jcudaUtill. The KernelLauncherSample can be
successfully executed on my worker node. However, when I submit a program
containing Jcuda to Hadoop MapReduce, I get the following errors. Any reply will
be appreciated!

10/12/09 15:41:39 INFO mapred.JobClient: Running job: job_201012091523_0002
10/12/09 15:41:40 INFO mapred.JobClient: map 0% reduce 0%
10/12/09 15:41:53 INFO mapred.JobClient: Task Id : attempt_201012091523_0002_m_00_0, Status : FAILED
Error: Could not load native library
attempt_201012091523_0002_m_00_0: [GC 7402K->1594K(12096K), 0.0045650 secs]
attempt_201012091523_0002_m_00_0: [GC 108666K->104610K(116800K), 0.0106860 secs]
attempt_201012091523_0002_m_00_0: [Full GC 104610K->104276K(129856K), 0.0482530 secs]
attempt_201012091523_0002_m_00_0: Error while loading native library with base name "JCudaDriver"
attempt_201012091523_0002_m_00_0: Operating system name: Linux
attempt_201012091523_0002_m_00_0: Architecture : amd64
attempt_201012091523_0002_m_00_0: Architecture bit size: 64
10/12/09 15:42:00 INFO mapred.JobClient: Task Id : attempt_201012091523_0002_m_00_1, Status : FAILED
Error: Could not load native library
attempt_201012091523_0002_m_00_1: [GC 7373K->1573K(18368K), 0.0045230 secs]
attempt_201012091523_0002_m_00_1: Error while loading native library with base name "JCudaDriver"
attempt_201012091523_0002_m_00_1: Operating system name: Linux
attempt_201012091523_0002_m_00_1: Architecture : amd64
attempt_201012091523_0002_m_00_1: Architecture bit size: 64

It looks like the Jcuda library file cannot be successfully loaded. Actually, I
tried many combinations:
1) I include all the Jcuda library files in my TaskTracker's classpath, and also
include the Jcuda library files in my MapReduce program.
2) TaskTracker's classpath w/o the Jcuda library, but my program contains them.
3) TaskTracker's classpath w/ the Jcuda library, but my program w/o them.
All of them report the same error above.


Re: Two questions.

2010-11-03 Thread He Chen
Both options 1 and 3 will work.

On Wed, Nov 3, 2010 at 9:28 PM, James Seigel  wrote:

> Option 1 = good
>
> Sent from my mobile. Please excuse the typos.
>
> On 2010-11-03, at 8:27 PM, "shangan"  wrote:
>
> > I don't think the first two options can work; even if you stop the
> > tasktracker, these to-be-retired nodes are still connected to the namenode.
> > Option 3 can work. You only need to add this exclude file on the
> > namenode, and it is a regular file. Add a key named dfs.hosts.exclude to
> > your conf/hadoop-site.xml file. The value associated with this key provides
> > the full path to a file on the NameNode's local file system which contains a
> > list of machines which are not permitted to connect to HDFS.
> >
> > Then you can run the command bin/hadoop dfsadmin -refreshNodes, and the
> > cluster will decommission the nodes in the exclude file. This might take a
> > period of time, as the cluster needs to move data from those retired nodes to
> > the remaining nodes.
> >
> > After this you can use these retired nodes as a new cluster. But remember
> > to remove those nodes from the slaves file, and you can delete the
> > exclude file afterward.
> >
> >
> > 2010-11-04
> >
> >
> >
> > shangan
> >
> >
> >
> >
> > From: Raj V
> > Sent: 2010-11-04  10:05:44
> > To: common-user
> > Cc:
> > Subject: Two questions.
> >
> > 1. I have a 512 node cluster. I need to have 32 nodes do something else.
> They
> > can be datanodes but I cannot run any map or reduce jobs on them. So I
> see three
> > options.
> > 1. Stop the tasktracker on those nodes. leave the datanode running.
> > 2. Set  mapred.tasktracker.reduce.tasks.maximum and
> > mapred.tasktracker.map.tasks.maximum to 0 on these nodes and make these
> final.
> > 3. Use the parameter mapred.hosts.exclude.
> > I am assuming that any of the three methods would work.  To start with, I
> went
> > with option 3. I used a local file /home/hadoop/myjob.exclude and the
> file
> > myjob.exclude had the hostname of one host per line ( hadoop-480 ..
> hadoop-511.
> > But I see both map and reduce jobs being scheduled to all the 511 nodes.
> > I understand there is an inherent inefficieny by running only the data
> node on
> > these 32 nodess.
> > Here are my questions.
> > 1. Will all three methods work?
> > 2. If I choose method 3, does this file exist as a dfs file or a regular
> file.
> > If regular file , does it need to exist on all the nodes or only the node
> where
> > teh job is submitted?
> > Many thanks in advance/
> > Raj
>


Re: Can not upload local file to HDFS

2010-09-28 Thread He Chen
I found the problem. It was caused by a system disk error, after which the
whole "/" directory became read-only. When I run copyFromLocal, it uses the
local /tmp directory as a buffer. However, Hadoop does not know that the
directory is read-only; that is why it reported a datanode problem.

On Mon, Sep 27, 2010 at 10:34 AM, He Chen  wrote:

> Thanks, but I think you goes too far to focus on the problem itself.
>
>
> On Sun, Sep 26, 2010 at 11:43 AM, Nan Zhu  wrote:
>
>> Have you ever check the log file in the directory?
>>
>> I always find some important information there,
>>
>> I suggest you to recompile hadoop with ant since mapred daemons also don't
>> work
>>
>> Nan
>>
>> On Sun, Sep 26, 2010 at 7:29 PM, He Chen  wrote:
>>
>> > The problem is every datanode may be listed in the error report. That
>> means
>> > all my datanodes are bad?
>> >
>> > One thing I forgot to mention. I can not use start-all.sh and
>> stop-all.sh
>> > to
>> > start and stop all dfs and mapred processes on my clusters. But the
>> > jobtracker and namenode web interface still work.
>> >
>> > I think I can solve this problem by ssh to every node and kill current
>> > hadoop processes and restart them again. The previous problem will also
>> be
>> > solved( it's my opinion). But I really want to know why the HDFS reports
>> me
>> > previous errors.
>> >
>> >
>> > On Sat, Sep 25, 2010 at 11:20 PM, Nan Zhu  wrote:
>> >
>> > > Hi Chen,
>> > >
>> > > It seems that you have a bad datanode? maybe you should reformat them?
>> > >
>> > > Nan
>> > >
>> > > On Sun, Sep 26, 2010 at 10:42 AM, He Chen  wrote:
>> > >
>> > > > Hello Neil
>> > > >
>> > > > No matter how big the file is. It always report this to me. The file
>> > size
>> > > > is
>> > > > from 10KB to 100MB.
>> > > >
>> > > > On Sat, Sep 25, 2010 at 6:08 PM, Neil Ghosh 
>> > > wrote:
>> > > >
>> > > > > How Big is the file? Did you try Formatting Name node and
>> Datanode?
>> > > > >
>> > > > > On Sun, Sep 26, 2010 at 2:12 AM, He Chen 
>> wrote:
>> > > > >
>> > > > > > Hello everyone
>> > > > > >
>> > > > > > I can not load local file to HDFS. It gave the following errors.
>> > > > > >
>> > > > > > WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception
>> >  for
>> > > > > block
>> > > > > > blk_-236192853234282209_419415java.io.EOFException
>> > > > > >at
>> > java.io.DataInputStream.readFully(DataInputStream.java:197)
>> > > > > >at
>> > java.io.DataInputStream.readLong(DataInputStream.java:416)
>> > > > > >at
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2397)
>> > > > > > 10/09/25 15:38:25 WARN hdfs.DFSClient: Error Recovery for block
>> > > > > > blk_-236192853234282209_419415 bad datanode[0]
>> 192.168.0.23:50010
>> > > > > > 10/09/25 15:38:25 WARN hdfs.DFSClient: Error Recovery for block
>> > > > > > blk_-236192853234282209_419415 in pipeline 192.168.0.23:50010,
>> > > > > > 192.168.0.39:50010: bad datanode 192.168.0.23:50010
>> > > > > > Any response will be appreciated!
>> > > > > >
>> > > > > >
>> > >
>> >
>>
>
>
>
> --
> Best Wishes!
> 顺送商祺!
>
> --
> Chen He
> (402)613-9298
> PhD. student of CSE Dept.
> Research Assistant of Holland Computing Center
> University of Nebraska-Lincoln
> Lincoln NE 68588
>


Re: Can not upload local file to HDFS

2010-09-27 Thread He Chen
Thanks, but I think that goes too far; I would rather focus on the problem itself.

On Sun, Sep 26, 2010 at 11:43 AM, Nan Zhu  wrote:

> Have you ever check the log file in the directory?
>
> I always find some important information there,
>
> I suggest you to recompile hadoop with ant since mapred daemons also don't
> work
>
> Nan
>
> On Sun, Sep 26, 2010 at 7:29 PM, He Chen  wrote:
>
> > The problem is every datanode may be listed in the error report. That
> means
> > all my datanodes are bad?
> >
> > One thing I forgot to mention. I can not use start-all.sh and stop-all.sh
> > to
> > start and stop all dfs and mapred processes on my clusters. But the
> > jobtracker and namenode web interface still work.
> >
> > I think I can solve this problem by ssh to every node and kill current
> > hadoop processes and restart them again. The previous problem will also
> be
> > solved( it's my opinion). But I really want to know why the HDFS reports
> me
> > previous errors.
> >
> >
> > On Sat, Sep 25, 2010 at 11:20 PM, Nan Zhu  wrote:
> >
> > > Hi Chen,
> > >
> > > It seems that you have a bad datanode? maybe you should reformat them?
> > >
> > > Nan
> > >
> > > On Sun, Sep 26, 2010 at 10:42 AM, He Chen  wrote:
> > >
> > > > Hello Neil
> > > >
> > > > No matter how big the file is. It always report this to me. The file
> > size
> > > > is
> > > > from 10KB to 100MB.
> > > >
> > > > On Sat, Sep 25, 2010 at 6:08 PM, Neil Ghosh 
> > > wrote:
> > > >
> > > > > How Big is the file? Did you try Formatting Name node and Datanode?
> > > > >
> > > > > On Sun, Sep 26, 2010 at 2:12 AM, He Chen 
> wrote:
> > > > >
> > > > > > Hello everyone
> > > > > >
> > > > > > I can not load local file to HDFS. It gave the following errors.
> > > > > >
> > > > > > WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception
> >  for
> > > > > block
> > > > > > blk_-236192853234282209_419415java.io.EOFException
> > > > > >at
> > java.io.DataInputStream.readFully(DataInputStream.java:197)
> > > > > >at
> > java.io.DataInputStream.readLong(DataInputStream.java:416)
> > > > > >at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2397)
> > > > > > 10/09/25 15:38:25 WARN hdfs.DFSClient: Error Recovery for block
> > > > > > blk_-236192853234282209_419415 bad datanode[0]
> 192.168.0.23:50010
> > > > > > 10/09/25 15:38:25 WARN hdfs.DFSClient: Error Recovery for block
> > > > > > blk_-236192853234282209_419415 in pipeline 192.168.0.23:50010,
> > > > > > 192.168.0.39:50010: bad datanode 192.168.0.23:50010
> > > > > > Any response will be appreciated!
> > > > > >
> > > > > >
> > >
> >
>



-- 
Best Wishes!
顺送商祺!

--
Chen He
(402)613-9298
PhD. student of CSE Dept.
Research Assistant of Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588


Re: Can not upload local file to HDFS

2010-09-26 Thread He Chen
The problem is that every datanode may be listed in the error report. Does
that mean all my datanodes are bad?

One thing I forgot to mention: I cannot use start-all.sh and stop-all.sh to
start and stop all dfs and mapred processes on my cluster, but the
jobtracker and namenode web interfaces still work.

I think I can solve this problem by ssh-ing to every node, killing the
current hadoop processes, and restarting them. The previous problem will
also be solved (in my opinion). But I really want to know why HDFS reports
the errors above.


On Sat, Sep 25, 2010 at 11:20 PM, Nan Zhu  wrote:

> Hi Chen,
>
> It seems that you have a bad datanode? maybe you should reformat them?
>
> Nan
>
> On Sun, Sep 26, 2010 at 10:42 AM, He Chen  wrote:
>
> > Hello Neil
> >
> > No matter how big the file is. It always report this to me. The file size
> > is
> > from 10KB to 100MB.
> >
> > On Sat, Sep 25, 2010 at 6:08 PM, Neil Ghosh 
> wrote:
> >
> > > How Big is the file? Did you try Formatting Name node and Datanode?
> > >
> > > On Sun, Sep 26, 2010 at 2:12 AM, He Chen  wrote:
> > >
> > > > Hello everyone
> > > >
> > > > I can not load local file to HDFS. It gave the following errors.
> > > >
> > > > WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for
> > > block
> > > > blk_-236192853234282209_419415java.io.EOFException
> > > >at java.io.DataInputStream.readFully(DataInputStream.java:197)
> > > >at java.io.DataInputStream.readLong(DataInputStream.java:416)
> > > >at
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2397)
> > > > 10/09/25 15:38:25 WARN hdfs.DFSClient: Error Recovery for block
> > > > blk_-236192853234282209_419415 bad datanode[0] 192.168.0.23:50010
> > > > 10/09/25 15:38:25 WARN hdfs.DFSClient: Error Recovery for block
> > > > blk_-236192853234282209_419415 in pipeline 192.168.0.23:50010,
> > > > 192.168.0.39:50010: bad datanode 192.168.0.23:50010
> > > > Any response will be appreciated!
> > > >
> > > >
>


Re: Can not upload local file to HDFS

2010-09-25 Thread He Chen
Hello Neil

No matter how big the file is, it always reports this to me. The file sizes
range from 10KB to 100MB.

On Sat, Sep 25, 2010 at 6:08 PM, Neil Ghosh  wrote:

> How Big is the file? Did you try Formatting Name node and Datanode?
>
> On Sun, Sep 26, 2010 at 2:12 AM, He Chen  wrote:
>
> > Hello everyone
> >
> > I can not load local file to HDFS. It gave the following errors.
> >
> > WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for
> block
> > blk_-236192853234282209_419415java.io.EOFException
> >at java.io.DataInputStream.readFully(DataInputStream.java:197)
> >at java.io.DataInputStream.readLong(DataInputStream.java:416)
> >at
> >
> >
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2397)
> > 10/09/25 15:38:25 WARN hdfs.DFSClient: Error Recovery for block
> > blk_-236192853234282209_419415 bad datanode[0] 192.168.0.23:50010
> > 10/09/25 15:38:25 WARN hdfs.DFSClient: Error Recovery for block
> > blk_-236192853234282209_419415 in pipeline 192.168.0.23:50010,
> > 192.168.0.39:50010: bad datanode 192.168.0.23:50010
> > Any response will be appreciated!
> >
> >
> > --
> > Best Wishes!
> > 顺送商祺!
> >
> > --
> > Chen He
> >
>
>
>
> --
> Thanks and Regards
> Neil
> http://neilghosh.com
>



-- 
Best Wishes!
顺送商祺!

--
Chen He
(402)613-9298
PhD. student of CSE Dept.
Research Assistant of Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588


Can not upload local file to HDFS

2010-09-25 Thread He Chen
Hello everyone

I cannot load a local file to HDFS. It gives the following errors.

WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block
blk_-236192853234282209_419415java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readLong(DataInputStream.java:416)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2397)
10/09/25 15:38:25 WARN hdfs.DFSClient: Error Recovery for block
blk_-236192853234282209_419415 bad datanode[0] 192.168.0.23:50010
10/09/25 15:38:25 WARN hdfs.DFSClient: Error Recovery for block
blk_-236192853234282209_419415 in pipeline 192.168.0.23:50010,
192.168.0.39:50010: bad datanode 192.168.0.23:50010
Any response will be appreciated!


-- 
Best Wishes!
顺送商祺!

--
Chen He


Re: Job performance issue: output.collect()

2010-09-01 Thread He Chen
Hey Oded Rosen

I am not sure what the functionality of your map() method is. Intuitively,
move the map() method's computation into the reduce() method if your map()
output is the problem. I mean, just let the map() method act as a data input
reader and divider, and let the reduce() method do all of your computation.
In this way, your intermediate results are smaller than before, and the
shuffle time can also be reduced.

If the computation is still slow, I think it may not be a MapReduce
framework problem but your program's. Hope this helps.
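
As a rough illustration of the "map as a reader/divider" idea above, here is
a minimal old-API (0.18/0.20 style) mapper sketch. The tab-separated key
extraction is only a placeholder assumption; the real aggregation or
expansion would live in the reducer, so the shuffle carries roughly one
record per input line instead of ~20:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Pass-through mapper: emit each input record once, keyed by whatever field
// the reducer will group on, and defer the heavy per-record expansion to
// reduce() so much less data is spilled and shuffled.
public class PassThroughMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    String record = line.toString();
    String groupKey = record.split("\t", 2)[0];   // placeholder: first column
    out.collect(new Text(groupKey), line);        // one output per input record
  }
}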


Chen

On Wed, Sep 1, 2010 at 7:18 AM, Oded Rosen  wrote:

> Hi all,
>
> My job (written in old 0.18 api, but that's not the issue here) is
> producing
> large amounts of map output.
> Each map() call generates about ~20 output.collects, and each output is
> pretty big (~1K) => each map() produces about 20K.
> All of this data is fed to a combiner that really reduces the output's size
> + amounts.
> the job input is not so big: there are about 120M map input records.
>
> This job is pretty slow. Other jobs that work on the same input are much
> faster, since they do not produce so much output.
> Analyzing the job performance (timing the map() function parts), I've seen
> that much time is spent on the output.collect() line itself.
>
> I know that during the output.collect() command the output is being written
> to local filesystem spills (when the spill buffer reaches a 80% limit),
> so I guessed that reducing the size of each output will improve
> performance.
> This was not the case - after cutting 30% of the map output size, the job
> took the same amount of time. The thing that I cannot reduce is the amount
> of output lines being written out of the map.
>
> I would like to know what happens in the output.collect line that takes
> lots
> of time, in order to cut down this job's running time.
> Please keep in mind that I have a combiner, and to my understanding
> different things happen to the map output when a combiner is present.
>
> Can anyone help me understand how can I save this precious time?
> Thanks,
>
> --
> Oded
>


Re: Best way to reduce a 8-node cluster in half and get hdfs to come out of safe mode

2010-08-06 Thread He Chen
Way#3

1) bring up all 8 dn and the nn
2) retire one of the 4 nodes you want to remove:
   kill the datanode process
   hadoop dfsadmin -refreshNodes  (this should be done on the nn)
3) repeat step 2) for the other three nodes

On Fri, Aug 6, 2010 at 1:21 AM, Allen Wittenauer
wrote:

>
> On Aug 5, 2010, at 10:42 PM, Steve Kuo wrote:
>
> > As part of our experimentation, the plan is to pull 4 slave nodes out of
> a
> > 8-slave/1-master cluster. With replication factor set to 3, I thought
> > losing half of the cluster may be too much for hdfs to recover.  Thus I
> > copied out all relevant data from hdfs to local disk and reconfigure the
> > cluster.
>
> It depends.  If you have configured Hadoop to have a topology such that the
> 8 nodes were in 2 logical racks, then it would have worked just fine.  If
> you didn't have any topology configured, then each node is considered its
> own rack.  So pulling half of the grid down means you are likely losing a
> good chunk of all your blocks.
>
>
>
>
> >
> > The 4 slave nodes started okay but hdfs never left safe mode.  The nn.log
> > has the following line.  What is the best way to deal with this?  Shall I
> > restart the cluster with 8-node and then delete
> > /data/hadoop-hadoop/mapred/system?  Or shall I reformat hdfs?
>
> Two ways to go:
>
> Way #1:
>
> 1) configure dfs.hosts
> 2) bring up all 8 nodes
> 3) configure dfs.hosts.exclude to include the 4 you don't want
> 4) dfsadmin -refreshNodes to start decommissioning the 4 you don't want
>
> Way #2:
>
> 1) configure a topology
> 2) bring up all 8 nodes
> 3) setrep all files +1
> 4) wait for nn to finish replication
> 5) pull 4 nodes
> 6) bring down nn
> 7) remove topology
> 8) bring nn up
> 9) setrep -1
>
>
>
>


-- 
Best Wishes!
顺送商祺!

--
Chen He
(402)613-9298
PhD. student of CSE Dept.
Research Assistant of Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588


Re: hadoop on unstable nodes

2010-08-03 Thread He Chen
Condor has a Hadoop subproject at UW-Madison, and there are also some
scientists from VT who have worked on securing Hadoop MapReduce over the
Internet.

In my opinion, Alex is correct: Hadoop MR is communication intensive,
especially in the map and shuffle stages. In the map stage, every mapper
needs input data from the file system; if your data is distributed across
the Internet, you may encounter heavy delays. Also, in the shuffle stage,
the reducers collect the mappers' intermediate results over the Internet.
This is another bottleneck we cannot overlook.

Hope this helps.

Chen

On Tue, Aug 3, 2010 at 11:37 AM, Alex Loddengaard  wrote:

> I don't know of any research, but such a scenario is likely not going to
> turn out so well.  Hadoop is very network hungry and is designed to be run
> in a datacenter.  Sorry I don't have more information for you.
>
> Alex
>
> On Mon, Aug 2, 2010 at 9:14 PM, Rahul.V.  >wrote:
>
> > Hi,
> > Is there any research currently going on where map reduce is applied to
> > nodes in normal internet scenarios?.In environments where network
> bandwidth
> > is at premium what are the tweaks applied to hadoop?
> > I would be very thankful if you can post me links in this direction.
> >
> > --
> > Regards,
> > R.V.
> >
>


Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

2010-07-27 Thread He Chen
Hey Deepak Diwakar

Try to keep the /etc/hosts file the same across all your cluster nodes and
see whether this problem disappears.

On Tue, Jul 27, 2010 at 2:31 PM, Deepak Diwakar  wrote:

> Hey friends,
>
> I got stuck on setting up hdfs cluster and getting this error while running
> simple wordcount example(I did that 2 yrs back not had any problem).
>
> Currently testing over hadoop-0.20.1 with 2 nodes. instruction followed
> from
> (
>
> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29
> ).
>
>  I checked the firewall settings and /etc/hosts there is no issue there.
> Also master and slave are accessible both ways.
>
> Also the input size very low ~ 3 MB  and hence there shouldn't be no issue
> because ulimit(its btw of 4096).
>
> Would be really thankful  if  anyone can guide me to resolve this.
>
> Thanks & regards,
> - Deepak Diwakar,
>
>
>
>
> On 28 June 2010 18:39, bmdevelopment  wrote:
>
> > Hi, Sorry for the cross-post. But just trying to see if anyone else
> > has had this issue before.
> > Thanks
> >
> >
> > -- Forwarded message --
> > From: bmdevelopment 
> > Date: Fri, Jun 25, 2010 at 10:56 AM
> > Subject: Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES;
> > bailing-out.
> > To: mapreduce-u...@hadoop.apache.org
> >
> >
> > Hello,
> > Thanks so much for the reply.
> > See inline.
> >
> > On Fri, Jun 25, 2010 at 12:40 AM, Hemanth Yamijala 
> > wrote:
> > > Hi,
> > >
> > >> I've been getting the following error when trying to run a very simple
> > >> MapReduce job.
> > >> Map finishes without problem, but error occurs as soon as it enters
> > >> Reduce phase.
> > >>
> > >> 10/06/24 18:41:00 INFO mapred.JobClient: Task Id :
> > >> attempt_201006241812_0001_r_00_0, Status : FAILED
> > >> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> > >>
> > >> I am running a 5 node cluster and I believe I have all my settings
> > correct:
> > >>
> > >> * ulimit -n 32768
> > >> * DNS/RDNS configured properly
> > >> * hdfs-site.xml : http://pastebin.com/xuZ17bPM
> > >> * mapred-site.xml : http://pastebin.com/JraVQZcW
> > >>
> > >> The program is very simple - just counts a unique string in a log
> file.
> > >> See here: http://pastebin.com/5uRG3SFL
> > >>
> > >> When I run, the job fails and I get the following output.
> > >> http://pastebin.com/AhW6StEb
> > >>
> > >> However, runs fine when I do *not* use substring() on the value (see
> > >> map function in code above).
> > >>
> > >> This runs fine and completes successfully:
> > >>String str = val.toString();
> > >>
> > >> This causes error and fails:
> > >>String str = val.toString().substring(0,10);
> > >>
> > >> Please let me know if you need any further information.
> > >> It would be greatly appreciated if anyone could shed some light on
> this
> > problem.
> > >
> > > It catches attention that changing the code to use a substring is
> > > causing a difference. Assuming it is consistent and not a red herring,
> >
> > Yes, this has been consistent over the last week. I was running 0.20.1
> > first and then
> > upgrade to 0.20.2 but results have been exactly the same.
> >
> > > can you look at the counters for the two jobs using the JobTracker web
> > > UI - things like map records, bytes etc and see if there is a
> > > noticeable difference ?
> >
> > Ok, so here is the first job using write.set(value.toString()); having
> > *no* errors:
> > http://pastebin.com/xvy0iGwL
> >
> > And here is the second job using
> > write.set(value.toString().substring(0, 10)); that fails:
> > http://pastebin.com/uGw6yNqv
> >
> > And here is even another where I used a longer, and therefore unique
> > string,
> > by write.set(value.toString().substring(0, 20)); This makes every line
> > unique, similar to first job.
> > Still fails.
> > http://pastebin.com/GdQ1rp8i
> >
> > >Also, are the two programs being run against
> > > the exact same input data ?
> >
> > Yes, exactly the same input: a single csv file with 23K lines.
> > Using a shorter string leads to more like keys and therefore more
> > combining/reducing, but going
> > by the above it seems to fail whether the substring/key is entirely
> > unique (23000 combine output records) or
> > mostly the same (9 combine output records).
> >
> > >
> > > Also, since the cluster size is small, you could also look at the
> > > tasktracker logs on the machines where the maps have run to see if
> > > there are any failures when the reduce attempts start failing.
> >
> > Here is the TT log from the last failed job. I do not see anything
> > besides the shuffle failure, but there
> > may be something I am overlooking or simply do not understand.
> > http://pastebin.com/DKFTyGXg
> >
> > Thanks again!
> >
> > >
> > > Thanks
> > > Hemanth
> > >
> >
>



-- 
Best Wishes!
顺送商祺!

--
Chen He
(402)613-9298
PhD. student of CSE Dept.
Research Assistant of Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588


Re: hybrid map/reducer scheduler?

2010-06-28 Thread He Chen
You can write your own scheduler based on them; they are open source.

On Mon, Jun 28, 2010 at 6:13 PM, jiang licht  wrote:

> In addition to default FIFO scheduler, there are fair scheduler and
> capacity scheduler. In some sense, fair scheduler can be considered a
> user-based scheduling while capacity scheduler does a queue-based
> scheduling. Is there or will there be a hybrid scheduler that combines the
> good parts of the two (or a capacity scheduler that allows preemption, then
> different users are asked to submit jobs to different queues, in this way
> implicitly follow user-based scheduling as well, more or less)?
>
> Thanks,
>
> --Michael
>
>
>


Re: Shuffle error

2010-05-24 Thread He Chen
Problem solved. It was caused by an inconsistency in the /etc/hosts file.

2010/5/24 He Chen 

> Hey, every one
>
> I have a problem when I run hadoop programs. Some of my worker nodes always
> report following "Shuffle Error".
>
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error: 
> Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error: Exceeded 
> MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>
> I tried to find out whether this is caused by certain nodes' hardware. For
> example, memory, hard drive. But it looks like any of the worker nodes
> incline to have this error. I am using 0.20.1.
>
> Any suggestion will be appreciated!
>
> --
> Best Wishes!
> 顺送商祺!
>
> --
> Chen He
> (402)613-9298
> PhD. student of CSE Dept.
> Holland Computing Center
> University of Nebraska-Lincoln
> Lincoln NE 68588
>



-- 
Best Wishes!
顺送商祺!

--
Chen He
(402)613-9298
PhD. student of CSE Dept.
Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588


Shuffle error

2010-05-24 Thread He Chen
Hey, everyone

I have a problem when I run hadoop programs. Some of my worker nodes always
report the following "Shuffle Error".

Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle
Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error:
Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error:
Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error:
Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error:
Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error:
Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error:
Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error:
Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error:
Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error:
Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error:
Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error:
Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error:
Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error:
Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error:
Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error:
Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error:
Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.Shuffle Error:
Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

I tried to find out whether this is caused by certain nodes' hardware, for
example memory or hard drives. But it looks like any of the worker nodes can
show this error. I am using 0.20.1.

Any suggestion will be appreciated!

-- 
Best Wishes!
顺送商祺!

--
Chen He
(402)613-9298
PhD. student of CSE Dept.
Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588


Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread He Chen
If you know how to use AspectJ for aspect-oriented programming, you can
write an aspect class and let it monitor the whole MapReduce process.

On Tue, May 18, 2010 at 10:00 AM, Patrick Angeles wrote:

> Should be evident in the total job running time... that's the only metric
> that really matters :)
>
> On Tue, May 18, 2010 at 10:39 AM, Pierre ANCELOT  >wrote:
>
> > Thank you,
> > Any way I can measure the startup overhead in terms of time?
> >
> >
> > On Tue, May 18, 2010 at 4:27 PM, Patrick Angeles  > >wrote:
> >
> > > Pierre,
> > >
> > > Adding to what Brian has said (some things are not explicitly mentioned
> > in
> > > the HDFS design doc)...
> > >
> > > - If you have small files that take up < 64MB you do not actually use
> the
> > > entire 64MB block on disk.
> > > - You *do* use up RAM on the NameNode, as each block represents
> meta-data
> > > that needs to be maintained in-memory in the NameNode.
> > > - Hadoop won't perform optimally with very small block sizes. Hadoop
> I/O
> > is
> > > optimized for high sustained throughput per single file/block. There is
> a
> > > penalty for doing too many seeks to get to the beginning of each block.
> > > Additionally, you will have a MapReduce task per small file. Each
> > MapReduce
> > > task has a non-trivial startup overhead.
> > > - The recommendation is to consolidate your small files into large
> files.
> > > One way to do this is via SequenceFiles... put the filename in the
> > > SequenceFile key field, and the file's bytes in the SequenceFile value
> > > field.
> > >
> > > In addition to the HDFS design docs, I recommend reading this blog
> post:
> > > http://www.cloudera.com/blog/2009/02/the-small-files-problem/
> > >
> > > Happy Hadooping,
> > >
> > > - Patrick
> > >
> > > On Tue, May 18, 2010 at 9:11 AM, Pierre ANCELOT 
> > > wrote:
> > >
> > > > Okay, thank you :)
> > > >
> > > >
> > > > On Tue, May 18, 2010 at 2:48 PM, Brian Bockelman <
> bbock...@cse.unl.edu
> > > > >wrote:
> > > >
> > > > >
> > > > > On May 18, 2010, at 7:38 AM, Pierre ANCELOT wrote:
> > > > >
> > > > > > Hi, thanks for this fast answer :)
> > > > > > If so, what do you mean by blocks? If a file has to be splitted,
> it
> > > > will
> > > > > be
> > > > > > splitted when larger than 64MB?
> > > > > >
> > > > >
> > > > > For every 64MB of the file, Hadoop will create a separate block.
>  So,
> > > if
> > > > > you have a 32KB file, there will be one block of 32KB.  If the file
> > is
> > > > 65MB,
> > > > > then it will have one block of 64MB and another block of 1MB.
> > > > >
> > > > > Splitting files is very useful for load-balancing and distributing
> > I/O
> > > > > across multiple nodes.  At 32KB / file, you don't really need to
> > split
> > > > the
> > > > > files at all.
> > > > >
> > > > > I recommend reading the HDFS design document for background issues
> > like
> > > > > this:
> > > > >
> > > > > http://hadoop.apache.org/common/docs/r0.20.0/hdfs_design.html
> > > > >
> > > > > Brian
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, May 18, 2010 at 2:34 PM, Brian Bockelman <
> > > bbock...@cse.unl.edu
> > > > > >wrote:
> > > > > >
> > > > > >> Hey Pierre,
> > > > > >>
> > > > > >> These are not traditional filesystem blocks - if you save a file
> > > > smaller
> > > > > >> than 64MB, you don't lose 64MB of file space..
> > > > > >>
> > > > > >> Hadoop will use 32KB to store a 32KB file (ok, plus a KB of
> > metadata
> > > > or
> > > > > >> so), not 64MB.
> > > > > >>
> > > > > >> Brian
> > > > > >>
> > > > > >> On May 18, 2010, at 7:06 AM, Pierre ANCELOT wrote:
> > > > > >>
> > > > > >>> Hi,
> > > > > >>> I'm porting a legacy application to hadoop and it uses a bunch
> of
> > > > small
> > > > > >>> files.
> > > > > >>> I'm aware that having such small files ain't a good idea but
> I'm
> > > not
> > > > > >> doing
> > > > > >>> the technical decisions and the port has to be done for
> > > yesterday...
> > > > > >>> Of course such small files are a problem, loading 64MB blocks
> for
> > a
> > > > few
> > > > > >>> lines of text is an evident loss.
> > > > > >>> What will happen if I set a smaller, or even way smaller (32kB)
> > > > blocks?
> > > > > >>>
> > > > > >>> Thank you.
> > > > > >>>
> > > > > >>> Pierre ANCELOT.
> > > > > >>
> > > > > >>
> > > > > >
> > > > > >
> > > > > > --
> > > > > > http://www.neko-consulting.com
> > > > > > Ego sum quis ego servo
> > > > > > "Je suis ce que je protège"
> > > > > > "I am what I protect"
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > http://www.neko-consulting.com
> > > > Ego sum quis ego servo
> > > > "Je suis ce que je protège"
> > > > "I am what I protect"
> > > >
> > >
> >
> >
> >
> > --
> > http://www.neko-consulting.com
> > Ego sum quis ego servo
> > "Je suis ce que je protège"
> > "I am what I protect"
> >
>



-- 
Best Wishes!
顺送商祺!

--
Chen He
(402)613-9298
PhD. student of CSE Dept.
Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588
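
As a concrete illustration of the small-files consolidation suggested in the
quoted thread above (pack many small files into one SequenceFile, with the
file name as the key and the raw bytes as the value), a minimal sketch
against Hadoop 0.20's SequenceFile API might look like the following. The
class name, argument layout, and paths are invented for the example:

import java.io.File;
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Packs every regular file in a local directory (args[0]) into a single
// SequenceFile on HDFS (args[1]): key = file name, value = file bytes.
public class SmallFilePacker {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path(args[1]), Text.class, BytesWritable.class);
    try {
      for (File f : new File(args[0]).listFiles()) {
        if (!f.isFile()) continue;                  // skip sub-directories
        byte[] buf = new byte[(int) f.length()];
        FileInputStream in = new FileInputStream(f);
        try {
          int off = 0;
          while (off < buf.length) {                // read the whole file
            off += in.read(buf, off, buf.length - off);
          }
        } finally {
          in.close();
        }
        writer.append(new Text(f.getName()), new BytesWritable(buf));
      }
    } finally {
      writer.close();
    }
  }
}

A MapReduce job can then read the packed file with SequenceFileInputFormat,
so each map task streams many small logical files from one large block
instead of opening thousands of tiny HDFS files.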


Re: Hadoop does not follow my setting

2010-04-22 Thread He Chen
To some extent, for a 30GB file that is well balanced, the overhead imposed
by losing data locality may not be too much. We will see; I will report my
results to this mailing list.

On Thu, Apr 22, 2010 at 2:44 PM, Allen Wittenauer
wrote:

>
> On Apr 22, 2010, at 11:46 AM, He Chen wrote:
>
> > Yes, but if you have more mappers, you may have more waves to execute. I
> > mean if I have 110 mappers for a job and I only have 22 cores. Then, it
> will
> > execute 5 waves approximately, If I have only 22 mappers, It will save
> the
> > overhead time.
>
> But you'll sacrifice data locality, which means that instead of testing the
> cpu, you'll be testing cpu+network.
>
>
>


-- 
Best Wishes!
顺送商祺!

--
Chen He
(402)613-9298
PhD. student of CSE Dept.
Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588


Re: Hadoop does not follow my setting

2010-04-22 Thread He Chen
Yes, but if you have more mappers, you may have more waves to execute. I
mean, if I have 110 mappers for a job and only 22 cores, then it will
execute approximately 5 waves. If I have only 22 mappers, it will save that
overhead time.

2010/4/22 Edward Capriolo 

> 2010/4/22 He Chen 
>
> > Hi Raymond Jennings III
> >
> > I use 22 mappers because I have 22 cores in my clusters. Is this what you
> > want?
> >
> > On Thu, Apr 22, 2010 at 11:55 AM, Raymond Jennings III <
> > raymondj...@yahoo.com> wrote:
> >
> > > Isn't the number of mappers specified "only a suggestion" ?
> > >
> > > --- On Thu, 4/22/10, He Chen  wrote:
> > >
> > > > From: He Chen 
> > > > Subject: Hadoop does not follow my setting
> > > > To: common-user@hadoop.apache.org
> > > > Date: Thursday, April 22, 2010, 12:50 PM
> > >  > Hi everyone
> > > >
> > > > I am doing a benchmark by using Hadoop 0.20.0's wordcount
> > > > example. I have a
> > > > 30GB file. I plan to test differenct number of mappers'
> > > > performance. For
> > > > example, for a wordcount job, I plan to test 22 mappers, 44
> > > > mappers, 66
> > > > mappers and 110 mappers.
> > > >
> > > > However, I set the "mapred.map.tasks" equals to 22. But
> > > > when I ran the job,
> > > > it shows 436 mappers total.
> > > >
> > > > I think maybe the wordcount set its parameters inside the
> > > > its own program. I
> > > > give "-Dmapred.map.tasks=22" to this program. But it is
> > > > still 436 again in
> > > > my another try.  I found out that 30GB divide by 436
> > > > is just 64MB, it is
> > > > just my block size.
> > > >
> > > > Any suggestions will be appreciated.
> > > >
> > > > Thank you in advance!
> > > >
> > > > --
> > > > Best Wishes!
> > > > 顺送商祺!
> > > >
> > > > --
> > > > Chen He
> > > > (402)613-9298
> > > > PhD. student of CSE Dept.
> > > > Holland Computing Center
> > > > University of Nebraska-Lincoln
> > > > Lincoln NE 68588
> > > >
> > >
> > >
> > >
> > >
> >
> >
> > --
> > Best Wishes!
> > 顺送商祺!
> >
> > --
> > Chen He
> > (402)613-9298
> > PhD. student of CSE Dept.
> > Holland Computing Center
> > University of Nebraska-Lincoln
> > Lincoln NE 68588
> >
>
> No matter how many total mappers exist for the job only a certain number of
> them run at once.
>



-- 
Best Wishes!
顺送商祺!

--
Chen He
(402)613-9298
PhD. student of CSE Dept.
Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588


Re: Hadoop does not follow my setting

2010-04-22 Thread He Chen
Hi Raymond Jennings III

I use 22 mappers because I have 22 cores in my cluster. Is this what you
wanted to know?

On Thu, Apr 22, 2010 at 11:55 AM, Raymond Jennings III <
raymondj...@yahoo.com> wrote:

> Isn't the number of mappers specified "only a suggestion" ?
>
> --- On Thu, 4/22/10, He Chen  wrote:
>
> > From: He Chen 
> > Subject: Hadoop does not follow my setting
> > To: common-user@hadoop.apache.org
> > Date: Thursday, April 22, 2010, 12:50 PM
>  > Hi everyone
> >
> > I am doing a benchmark by using Hadoop 0.20.0's wordcount
> > example. I have a
> > 30GB file. I plan to test differenct number of mappers'
> > performance. For
> > example, for a wordcount job, I plan to test 22 mappers, 44
> > mappers, 66
> > mappers and 110 mappers.
> >
> > However, I set the "mapred.map.tasks" equals to 22. But
> > when I ran the job,
> > it shows 436 mappers total.
> >
> > I think maybe the wordcount set its parameters inside the
> > its own program. I
> > give "-Dmapred.map.tasks=22" to this program. But it is
> > still 436 again in
> > my another try.  I found out that 30GB divide by 436
> > is just 64MB, it is
> > just my block size.
> >
> > Any suggestions will be appreciated.
> >
> > Thank you in advance!
> >
> > --
> > Best Wishes!
> > 顺送商祺!
> >
> > --
> > Chen He
> > (402)613-9298
> > PhD. student of CSE Dept.
> > Holland Computing Center
> > University of Nebraska-Lincoln
> > Lincoln NE 68588
> >
>
>
>
>


-- 
Best Wishes!
顺送商祺!

--
Chen He
(402)613-9298
PhD. student of CSE Dept.
Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588


Re: Hadoop does not follow my setting

2010-04-22 Thread He Chen
Hey Eric Sammer

Thank you for the reply. Actually, I only care about the number of mappers
in my case. It looks like I should write the wordcount program with my own
InputFormat class.
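
Before writing a custom InputFormat, one lighter-weight alternative is to
raise the minimum split size so the stock FileInputFormat produces fewer,
larger splits. A rough sketch, assuming the 0.20-era property name
mapred.min.split.size and the old JobConf API (the class and method names
here are invented for the example):

import org.apache.hadoop.mapred.JobConf;

public class SplitSizeHint {
  // Tune conf so FileInputFormat produces roughly targetMaps splits for an
  // input of inputBytes bytes, instead of one split per 64MB block.
  // getSplits() uses max(minSplitSize, min(goalSize, blockSize)) as the
  // split size, so raising the minimum forces bigger splits.
  public static JobConf tune(JobConf conf, long inputBytes, int targetMaps) {
    conf.setLong("mapred.min.split.size", inputBytes / targetMaps);
    conf.setNumMapTasks(targetMaps);   // still only a hint to the framework
    return conf;
  }
}

For the 30GB / 22-mapper case this means roughly 1.4GB per split; each such
split spans many 64MB blocks, so most map input is read over the network,
which is exactly the locality trade-off Eric describes below.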

2010/4/22 Eric Sammer 

> This is normal and expected. The mapred.map.tasks parameter is only a
> hint. The InputFormat gets to decide how to calculate splits.
> FileInputFormat and all subclasses, including TextInputFormat, use a
> few parameters to figure out what the appropriate split size will be
> but under most circumstances, this winds up being the block size. If
> you used fewer map tasks than blocks, you would sacrifice data
> locality which would only hurt performance.
>
> 2010/4/22 He Chen :
>  > Hi everyone
> >
> > I am doing a benchmark by using Hadoop 0.20.0's wordcount example. I have
> a
> > 30GB file. I plan to test differenct number of mappers' performance. For
> > example, for a wordcount job, I plan to test 22 mappers, 44 mappers, 66
> > mappers and 110 mappers.
> >
> > However, I set the "mapred.map.tasks" equals to 22. But when I ran the
> job,
> > it shows 436 mappers total.
> >
> > I think maybe the wordcount set its parameters inside the its own
> program. I
> > give "-Dmapred.map.tasks=22" to this program. But it is still 436 again
> in
> > my another try.  I found out that 30GB divide by 436 is just 64MB, it is
> > just my block size.
> >
> > Any suggestions will be appreciated.
> >
> > Thank you in advance!
> >
> > --
> > Best Wishes!
> > 顺送商祺!
> >
> > --
> > Chen He
> > (402)613-9298
> > PhD. student of CSE Dept.
> > Holland Computing Center
> > University of Nebraska-Lincoln
> > Lincoln NE 68588
> >
>
>
>
> --
> Eric Sammer
> phone: +1-917-287-2675
> twitter: esammer
> data: www.cloudera.com
>


Hadoop does not follow my setting

2010-04-22 Thread He Chen
Hi everyone

I am doing a benchmark using Hadoop 0.20.0's wordcount example. I have a
30GB file, and I plan to test the performance of different numbers of
mappers. For example, for a wordcount job, I plan to test 22 mappers, 44
mappers, 66 mappers, and 110 mappers.

However, I set "mapred.map.tasks" to 22, but when I ran the job, it showed
436 mappers in total.

I think maybe the wordcount example sets its parameters inside its own
program, so I passed "-Dmapred.map.tasks=22" to it, but it was still 436 in
another try. I found out that 30GB divided by 436 is just 64MB, which is
exactly my block size.

Any suggestions will be appreciated.

Thank you in advance!

-- 
Best Wishes!
顺送商祺!

--
Chen He
(402)613-9298
PhD. student of CSE Dept.
Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588


Re: Reducer-side join example

2010-04-05 Thread He Chen
For the Map function:
Input key: default
Input value: lines from File A and File B

Output key: A, B, C, ... (the first column of the final result)
Output value: 12, 24, Car, 13, Van, SUV, ...

Reduce function:
Take the Map output and, for each key, do:
{   if a value is an integer
        save it to array1;
    else
        save it to array2
}
for each ith element of array1
    for each jth element of array2
        output(key, array1[i] + "\t" + array2[j]);
done

Hope this helps.
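
Since the original question asks for plain Java, here is a rough old-API
(org.apache.hadoop.mapred, 0.18/0.20 style) sketch of the same idea: tag
each record with its source, key it by the join field, and cross the two
tagged lists in the reducer. Telling A-rows from B-rows by column count is
only an assumption for this sketch; checking map.input.file or using
MultipleInputs with two mapper classes would work as well.

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class ReduceSideJoin {

  // Tag every record with its source ("A" or "B") and key it by field 1.
  public static class JoinMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      String[] f = line.toString().split("\t");
      if (f.length == 2) {                                  // a File A row
        out.collect(new Text(f[0]), new Text("A\t" + f[1]));
      } else if (f.length >= 3) {                           // a File B row
        out.collect(new Text(f[0]), new Text("B\t" + f[1]));
      }
    }
  }

  // Buffer the A-side and B-side values for a key, then emit their cross
  // product: one joined line per (A record, B record) pair.
  public static class JoinReducer extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      List<String> aSide = new ArrayList<String>();
      List<String> bSide = new ArrayList<String>();
      while (values.hasNext()) {
        String v = values.next().toString();
        if (v.startsWith("A\t")) {
          aSide.add(v.substring(2));
        } else {
          bSide.add(v.substring(2));
        }
      }
      for (String a : aSide) {
        for (String b : bSide) {
          out.collect(key, new Text(a + "\t" + b));
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ReduceSideJoin.class);
    conf.setJobName("reduce-side-join");
    conf.setMapperClass(JoinMapper.class);
    conf.setReducerClass(JoinReducer.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]), new Path(args[1]));
    FileOutputFormat.setOutputPath(conf, new Path(args[2]));
    JobClient.runJob(conf);
  }
}

The second transformation the question mentions (one line per key with all
the B-side tuples attached) only needs the reducer changed to build a single
concatenated value instead of emitting the cross product.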


On Mon, Apr 5, 2010 at 4:10 PM, M B  wrote:

> Hi, I need a good java example to get me started with some joining we need
> to do, any examples would be appreciated.
>
> File A:
> Field1  Field2
> A12
> B13
> C22
> A24
>
> File B:
>  Field1  Field2   Field3
> ACar   ...
> BTruck...
> BSUV ...
> BVan  ...
>
> So, we need to first join File A and B on Field1 (say both are string
> fields).  The result would just be:
> A   12   Car   ...
> A   24   Car   ...
> B   13   Truck   ...
> B   13   SUV   ...
>  B   13   Van   ...
> and so on - with all the fields from both files returning.
>
> Once we have that, we sometimes need to then transform it so we have a
> single record per key (Field1):
> A (12,Car) (24,Car)
> B (13,Truck) (13,SUV) (13,Van)
> --however it looks, basically tuples for each key (we'll modify this later
> to return a conatenated set of fields from B, etc)
>
> At other times, instead of transforming to a single row, we just need to
> modify rows based on values.  So if B.Field2 equals "Van", we need to set
> Output.Field2 = whatever then output to file ...
>
> Are there any good examples of this in native java (we can't use
> pig/hive/etc)?
>
> thanks.
>



-- 
Best Wishes!


--
Chen He
  PhD. student of CSE Dept.
Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588


Re: Defining the number of map tasks

2009-12-29 Thread He Chen
In the hadoop-site.xml or hadoop-default.xml file you can find a parameter,
"mapred.map.tasks"; change its value to 3. At the same time, set
"mapred.tasktracker.map.tasks.maximum" to 3 if you use only one tasktracker.

On Wed, Dec 16, 2009 at 3:26 PM, psdc1978  wrote:

> Hi,
>
> I would like to have several Map tasks that execute the same tasks.
> For example, I've 3 map tasks (M1, M2 and M3) and a 1Gb of input data
> to be read by each map. Each map should read the same input data and
> send the result to the same Reduce. At the end, the reduce should
> produce the same 3 results.
>
> Put in conf/slaves file 3 instances of the same machine
>
> 
> localhost
> localhost
> localhost
> 
>
> does it solve the problem?
>
>
> How I define the number of map tasks to run?
>
>
>
> Best regards,
> --
> xeon
>

Chen