Hello,

We are trying to use Hadoop-0.20.203.0rc1 for parallel computation. Our
questions are below.

Assume a single high-configuration node with 8 cores and 8 GB of memory.

(a) How do we know the number of map tasks spawned? Can this be controlled? We
notice only 4 JVMs running on the single node - namenode, datanode, jobtracker,
tasktracker. As we understand it, one map task is spawned per input split, so
we expected to see a corresponding increase in the number of JVMs.
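
To make our assumption concrete, here is a minimal sketch of how we currently
picture the relationship between input size and map tasks. This is only our
mental model, not Hadoop's actual split logic: the class and method names are
hypothetical, the 64 MB block size and the 1 GB input are assumed values, and
the real computation may differ (please correct us if so).

```java
// Hypothetical illustration only: NOT Hadoop code.
// Assumed model: one input split per block-sized chunk, one map task per split.
class SplitEstimate {
    // Ceiling division of the file size by the split size.
    static long estimateSplits(long fileSizeBytes, long splitSizeBytes) {
        if (fileSizeBytes <= 0) {
            return 0;
        }
        return (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;   // assumed block size: 64 MB
        long fileSize = 1024L * 1024 * 1024;  // hypothetical input: 1 GB
        System.out.println("Expected map tasks: "
                + estimateSplits(fileSize, blockSize));
    }
}
```

Under this model we would expect to see that many additional map JVMs over the
life of the job, which is not what we observe.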

(b) Our mapper class has to perform complex computations and depends on many
jars. How do we add all of these jars to the classpath when running the
application? Since we need to perform parallel computations, we need many map
tasks running in parallel on different data, all on the same machine in
different JVMs.

(c) How does the data split happen? JobClient does not seem to talk about data
splits. As we understand it, we format the distributed file system, run
start-all.sh, and then run "hadoop fs -put". Does this write data to all
datanodes? We are unable to see the physical location of the data. How are
splits produced from this HDFS source?

(d) Can we control the number of reduce tasks? Does each one run in a separate
JVM? How are the optimal numbers of map and reduce tasks determined?

(e) Is there any good documentation or links that explain the namenode,
datanode, jobtracker and tasktracker?

Kindly help.

Thanks

________________________________________
From: mapreduce-user-h...@hadoop.apache.org 
[mapreduce-user-h...@hadoop.apache.org]
Sent: Thursday, January 05, 2012 10:49 PM
To: Satish Setty (HCL Financial Services)
Subject: WELCOME to mapreduce-user@hadoop.apache.org

Hi! This is the ezmlm program. I'm managing the
mapreduce-user@hadoop.apache.org mailing list.

Acknowledgment: I have added the address

   satish.se...@hcl.com

to the mapreduce-user mailing list.

Welcome to mapreduce-user@hadoop.apache.org!

Please save this message so that you know the address you are
subscribed under, in case you later want to unsubscribe or change your
subscription address.

