Re: Non utf-8 chars in input

2012-09-11 Thread Joshi, Rekha
Hi Ajay,

Try SequenceFileAsBinaryInputFormat ?


Thanks
Rekha

On 11/09/12 11:24 AM, Ajay Srivastava ajay.srivast...@guavus.com wrote:

Hi,

I am using the default inputFormat class for reading input from text files,
but the input file has some non utf-8 characters.
I guess that TextInputFormat is the default inputFormat class and that it
replaces these non utf-8 chars with \uFFFD. If I do not want this
behavior and need the actual chars in my mapper, what should be the correct
inputFormat class ?



Regards,
Ajay Srivastava



what happens when a datanode rejoins?

2012-09-11 Thread Mehul Choube
Hi,

What happens when an existing (not new) datanode rejoins a cluster for 
following scenarios:


1.   Some of the blocks it was managing are deleted/modified?

2.   The size of the blocks are now modified say from 64MB to 128MB?

3.   What if the block replication factor was one (yeah, not the case in most 
deployments, but say it is)? Does the namenode recreate a file once the 
datanode rejoins?



Thanks,
Mehul



Re: what happens when a datanode rejoins?

2012-09-11 Thread George Datskos

Hi Mehul


Some of the blocks it was managing are deleted/modified?



The namenode will asynchronously replicate the blocks to other datanodes 
in order to maintain the replication factor after a datanode has not 
been in contact for 10 minutes.



The size of the blocks are now modified say from 64MB to 128MB?



Block size is a per-file setting so new files will be 128MB, but the old 
ones will remain at 64MB.
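
For example, a minimal sketch of that per-file behaviour (assuming the plain 
org.apache.hadoop.fs.FileSystem API; the path and buffer-size lookup are just 
illustrative, not from this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    long newBlockSize = 128L * 1024 * 1024;  // 128MB, applies to this new file only
    // The block size is fixed when the file is created; files written earlier
    // keep whatever block size they were created with (e.g. 64MB).
    FSDataOutputStream out = fs.create(new Path("/tmp/new-file"),  // hypothetical path
        true,                                      // overwrite
        conf.getInt("io.file.buffer.size", 4096),  // buffer size
        fs.getDefaultReplication(),                // replication factor
        newBlockSize);
    out.writeUTF("written with a 128MB block size");
    out.close();
  }
}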


What if the block replication factor was one (yea not in most 
deployments but say incase) so does the namenode recreate a file once 
the datanode rejoins?




(assuming you didn't perform a decommission) Blocks that lived only on 
that datanode will be declared missing and the files associated with 
those blocks will not be able to be fully read until the datanode 
rejoins.




George


Re: what happens when a datanode rejoins?

2012-09-11 Thread George Datskos

Mehul,

Let me make an addition.


Some of the blocks it was managing are deleted/modified?


Blocks that are deleted in the interim will be deleted on the rejoining 
node as well, after it rejoins. Regarding modifications, I'd advise 
against modifying blocks after they have been fully written.



George


Re: what happens when a datanode rejoins?

2012-09-11 Thread Harsh J
George has answered most of these. I'll just add on:

On Tue, Sep 11, 2012 at 12:44 PM, Mehul Choube
mehul_cho...@symantec.com wrote:
 1.   Some of the blocks it was managing are deleted/modified?

A DN runs a block report upon start, and sends the list of blocks to
the NN. The NN validates them, and if it finds that any files are missing
block replicas after the report, it will schedule re-replication from one of
the good DNs that still carry them. Blocks modified outside of HDFS
fail their stored checksums, so they are treated as corrupt, deleted, and
re-replicated in the same manner.

 2.   The size of the blocks are now modified say from 64MB to 128MB?

George's got this already. Changing of block size does not impact any
existing blocks. It is a per-file metadata prop.

 3.   What if the block replication factor was one (yea not in most
 deployments but say incase) so does the namenode recreate a file once the
 datanode rejoins?

Files exist as NN metadata (its fsimage/edits persist this).
Blocks pertaining to a file exist at the DNs. If the file had a single
replica and that replica was lost, then the file's data is lost and
the NameNode will tell you as much in its metrics/fsck.
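
As a side note, replication is also a per-file property, so one way to avoid
that situation is to raise a file's replication factor after the fact. A
minimal sketch, assuming the FileSystem API; the path is just an example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RaiseReplication {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Ask the NameNode to keep 3 replicas of this file from now on.
    // The extra copies are created asynchronously.
    boolean accepted = fs.setReplication(new Path("/data/important-file"), (short) 3);
    System.out.println("Replication change accepted: " + accepted);
  }
}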

-- 
Harsh J


Re: how to make different mappers execute different processing on same data ?

2012-09-11 Thread Narasingu Ramesh
Hi Jason,
            What Mehmet said is exactly correct: without reducers we cannot
increase performance. You can add mappers and reducers to any data
processing to get the output with good performance.
Thanks & Regards,
Ramesh.Narasingu

On Tue, Sep 11, 2012 at 9:31 AM, Mehmet Tepedelenlioglu 
mehmets...@gmail.com wrote:

 If you have n processes to evaluate, make a reducer that calls the process
 i when it receives key i, 1 <= i <= n. Either replicate the data for the n
 reducers, or cache it for it to be read on the reducer side. The reducers
 will output the process id i and the performance.
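
 A minimal sketch of that idea (the class name, the evaluateProcess helper and
 the scoring are just placeholders, not working strategies):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ProcessSelectorReducer extends Reducer<IntWritable, Text, IntWritable, Text> {
  @Override
  protected void reduce(IntWritable processId, Iterable<Text> records, Context context)
      throws IOException, InterruptedException {
    double score = 0.0;
    for (Text record : records) {
      // Run processing variant number processId.get() on this record and
      // accumulate an evaluation metric for it (placeholder helper below).
      score += evaluateProcess(processId.get(), record.toString());
    }
    // Emit the process id and its performance so the best variant can be picked.
    context.write(processId, new Text(Double.toString(score)));
  }

  private double evaluateProcess(int id, String record) {
    // Placeholder: switch on 'id' to apply the corresponding processing variant.
    return 0.0;
  }
}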


 On Sep 10, 2012, at 8:25 PM, Jason Yang wrote:

  Hi, all
 
  I've got a question about how to make different mappers execute
 different processing on the same data?
 
  Here is my scenario:
  I need to process some data; however, there are multiple ways to process this
 data and I have no idea which one is better, so I was thinking that maybe I
 could execute multiple mappers, each applying a different processing solution,
 and eventually choose the best one according to some evaluation
 functions.
 
  But I'm not sure whether this could be done in MapReduce.
 
  Any help would be appreciated.
 
  --
  YANG, Lin
 




Re: Non utf-8 chars in input

2012-09-11 Thread Ajay Srivastava
Rekha,

I guess the problem is that the Text class uses utf-8 encoding and one cannot set 
another encoding for this class.
I have not seen any other Text-like class which supports other encodings, 
otherwise I would have written my own custom input format class.
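
For what it is worth, one workaround is a minimal sketch like the following,
assuming the default LineRecordReader leaves the raw line bytes intact inside
the Text value; the class name and the ISO-8859-1 charset are just examples
for whatever encoding the file actually uses:

import java.io.IOException;
import java.nio.charset.Charset;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RawCharsetMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
  // Assumed source encoding of the input file; adjust to the real one.
  private static final Charset SOURCE_CHARSET = Charset.forName("ISO-8859-1");

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Decode the raw bytes ourselves instead of calling value.toString(),
    // which decodes as utf-8 and substitutes \uFFFD for invalid sequences.
    String line = new String(value.getBytes(), 0, value.getLength(), SOURCE_CHARSET);
    // ... process 'line' and write output as usual ...
  }
}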

Thanks for your inputs.


Regards,
Ajay Srivastava


On 11-Sep-2012, at 1:31 PM, Joshi, Rekha wrote:

 Actually even if that works, it does not seem an ideal solution.
 
 I think format and encoding are distinct, and enforcing format must not
 enforce an encoding. So that means there must be a possibility to pass
 encoding as a user choice on construction,
 e.g. TextInputFormat(your-encoding).
 But I do not see that in the api, so even if I extend
 InputFormat/RecordReader, I will not be able to have a feature of
 setEncoding() on my file format. Having that would be a good solution.
 
 Thanks
 Rekha
 
 On 11/09/12 12:37 PM, Joshi, Rekha rekha_jo...@intuit.com wrote:
 
 Hi Ajay,
 
 Try SequenceFileAsBinaryInputFormat ?
 
 
 Thanks
 Rekha
 
 On 11/09/12 11:24 AM, Ajay Srivastava ajay.srivast...@guavus.com
 wrote:
 
 Hi,
 
 I am using the default inputFormat class for reading input from text files,
 but the input file has some non utf-8 characters.
 I guess that TextInputFormat is the default inputFormat class and that it
 replaces these non utf-8 chars with \uFFFD. If I do not want this
 behavior and need the actual chars in my mapper, what should be the correct
 inputFormat class ?
 
 
 
 Regards,
 Ajay Srivastava
 
 



Re: configure hadoop-0.22 fairscheduler

2012-09-11 Thread Jameson Li
Hi Harsh,

Thanks for your reply, and I am sorry for my unclear description.

As I mentioned previously, I think I configured the fairscheduler correctly in
hadoop-0.22.0.

But when I submit lots of jobs:
   many big jobs (map and reduce counts bigger than the map/reduce slots)
are submitted first,
   and many small jobs (just 1-2 maps/reduces per job) are submitted later.
And I find on the jobtracker http page that it uses
the JobQueueTaskScheduler, and even the http://jobtracker:port/scheduler
page is not found. When the big jobs are running, the small jobs can't start
before the big jobs complete.

So I guess hadoop-0.22 does not support the fairscheduler, or I have
configured something wrong.

Focused on MySQL, MSSQL, Oracle, Hadoop


2012/9/8 Harsh J ha...@cloudera.com

 Hey Jameson,

 When calling something inefficient, perhaps also share some details on
 how/why/what? How else would we know what you wish to see and what
 you're seeing instead? :)

 On Thu, Sep 6, 2012 at 2:47 PM, Jameson Li hovlj...@gmail.com wrote:
  I want to test version hadoop-0.22.
  But when configuring the fairscheduler, I have some trouble. The
  fairscheduler is not efficient.
  And I have configured these items in the mapred-site.xml, and I have also
  copied the fairscheduler jar file to the $HADOOP_HOME/lib:
 
 <property>
   <name>mapreduce.jobtracker.taskScheduler</name>
   <value>org.apache.hadoop.mapred.FairScheduler</value>
 </property>

 <property>
   <name>mapred.fairscheduler.allocation.file</name>
   <value>conf/pools.xml</value>
 </property>
 <property>
   <name>mapred.fairscheduler.preemption</name>
   <value>true</value>
 </property>
 <property>
   <name>mapred.fairscheduler.assignmultiple</name>
   <value>true</value>
 </property>
 <property>
   <name>mapred.fairscheduler.poolnameproperty</name>
   <value>mapred.queue.name</value>
   <description>job.set("mapred.queue.name", pool); // pool is set to
   either 'high' or 'low'</description>
 </property>
 <property>
   <name>mapred.queue.names</name>
   <value>default,aaa,bbb</value>
 </property>
 
  And the pools.xml in $HADOOP_HOME/conf 's content:
 
  <?xml version="1.0"?>
  <allocations>
    <pool name="putindb">
      <minMaps>72</minMaps>
      <minReduces>16</minReduces>
      <maxRunningJobs>20</maxRunningJobs>
      <weight>3.0</weight>
      <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
    </pool>

    <pool name="machinelearning">
      <minMaps>9</minMaps>
      <minReduces>2</minReduces>
      <maxRunningJobs>10</maxRunningJobs>
      <weight>2.0</weight>
      <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
    </pool>

    <pool name="default">
      <minMaps>9</minMaps>
      <minReduces>2</minReduces>
      <maxRunningJobs>10</maxRunningJobs>
      <weight>1.0</weight>
      <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
    </pool>

    <defaultMinSharePreemptionTimeout>60</defaultMinSharePreemptionTimeout>
    <fairSharePreemptionTimeout>60</fairSharePreemptionTimeout>
  </allocations>
 
  Can someone help me?
 
 
  Focused on MySQL, MSSQL, Oracle, Hadoop



 --
 Harsh J



what happens when a datanode rejoins?

2012-09-11 Thread mehul choube
Hi,


What happens when an existing (not new) datanode rejoins a cluster for
following scenarios:


a) Some of the blocks it was managing are deleted/modified?

b) The size of the blocks are now modified say from 64MB to 128MB?

c) What if the block replication factor was one (yea not in most
deployments but say in case) so does the namenode recreate a file once the
datanode rejoins?




Thanks,

Mehul


RE: build failure - trying to build hadoop trunk checkout

2012-09-11 Thread Tony Burton

Changing the hostname to lowercase fixed this particular problem - thanks for 
your replies.

The build is failing elsewhere now, I'll post a new thread for that.

Tony



From: Tony Burton [mailto:tbur...@sportingindex.com]
Sent: 10 September 2012 10:44
To: user@hadoop.apache.org
Subject: RE: build failure - trying to build hadoop trunk checkout

Thanks for the replies all. I'll investigate changing my hostname and report 
back.

(Seems a bit hacky though - can someone explain using easy words why this 
happens in Kerberos?)

Tony



From: Vinod Kumar Vavilapalli [mailto:vino...@hortonworks.com]
Sent: 06 September 2012 18:51
To: user@hadoop.apache.org
Subject: Re: build failure - trying to build hadoop trunk checkout

Never mind filing. I recalled that we debugged this issue long time back and 
cornered this down to problems with kerberos. See 
https://issues.apache.org/jira/browse/HADOOP-7988.

Given that, Tony, changing your hostname seems to be the only option.

Thanks,
+Vinod

On Sep 6, 2012, at 4:24 AM, Steve Loughran wrote:

file this as a JIRA against hadoop-common, though your machine's hostname looks 
suspiciously close to breaking the RFC-952 rules on hostnames

-steve
On 5 September 2012 14:49, Tony Burton 
tbur...@sportingindex.com wrote:
 Hi,

I'm following the steps at http://wiki.apache.org/hadoop/HowToContribute for 
building hadoop in preparation for submitting a patch.

I've checked out the trunk, and when I run mvn test from the top-level 
directory, I get the following error (snipped for clarity):

Results :

Failed tests:   
testLocalHostNameForNullOrWild(org.apache.hadoop.security.TestSecurityUtil): 
expected:hdfs/tonyb-[Precision-WorkS]tation-390@REALM but 
was:hdfs/tonyb-[precision-works]tation-390@REALM

The test in question is:

@Test
  public void testLocalHostNameForNullOrWild() throws Exception {
    String local = SecurityUtil.getLocalHostName();
    assertEquals("hdfs/" + local + "@REALM",
        SecurityUtil.getServerPrincipal("hdfs/_HOST@REALM",
            (String)null));
    assertEquals("hdfs/" + local + "@REALM",
        SecurityUtil.getServerPrincipal("hdfs/_HOST@REALM",
            "0.0.0.0"));
  }

And my hostname is tonyb-Precision-WorkStation-390.

Any idea how I can get this test to pass?

Thanks,

Tony







Re: Can't run PI example on hadoop 0.23.1

2012-09-11 Thread Narasingu Ramesh
Hi Vinod,
            Please check the input file location and the output file
location. Please put your input file into HDFS first and
then run the MR job; it should work fine.
Thanks & Regards,
Ramesh.Narasingu

On Tue, Sep 11, 2012 at 4:23 AM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:


 The AM corresponding to your MR job is failing continuously. Can you check
 the container logs for your AM ? They should be in
 ${yarn.nodemanager.log-dirs}/${application-id}/container_[0-9]*_0001_01_01/stderr

 Thanks,
 +Vinod

 On Sep 10, 2012, at 3:19 PM, Smarty Juice wrote:

 Hello Champions,

 I am a newbie to hadoop and trying to run the first hadoop example: PI, but
 I get a FileNotFoundException. I do not know why it's looking for this
 file on hdfs; can anyone please help?

 Thank you,

 bin/hadoop jar
 share/hadoop/mapreduce/hadoop-mapreduce-examples-0.23.1.jar pi -
 Dmapreduce.clientfactory.class.name=org.apache.hadoop.mapred.YarnClientFactory
 -libjars
 share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-0.23.1.jar 16 1
 12/09/06 19:56:10 WARN conf.Configuration:
 mapred.used.genericoptionsparser is deprecated. Instead, use
 mapreduce.client.genericoptionsparser.used
 Number of Maps = 16
 Samples per Map = 1
 Wrote input for Map #0
 Wrote input for Map #1
 Wrote input for Map #2
 Wrote input for Map #3
 Wrote input for Map #4
 Wrote input for Map #5
 Wrote input for Map #6
 Wrote input for Map #7
 Wrote input for Map #8
 Wrote input for Map #9
 Wrote input for Map #10
 Wrote input for Map #11
 Wrote input for Map #12
 Wrote input for Map #13
 Wrote input for Map #14
 Wrote input for Map #15
 Starting Job
 12/09/06 19:56:22 WARN conf.Configuration: fs.default.name is deprecated.
 Instead, use fs.defaultFS
 12/09/06 19:56:22 INFO input.FileInputFormat: Total input paths to process
 : 16
 12/09/06 19:56:23 INFO mapreduce.JobSubmitter: number of splits:16
 12/09/06 19:56:25 INFO mapred.ResourceMgrDelegate: Submitted application
 application_1346972652940_0002 to ResourceManager at /0.0.0.0:8040
 12/09/06 19:56:27 INFO mapreduce.Job: The url to track the job:
 http://1.1.1.11:8088/proxy/application_1346972652940_0002/http://10.215.12.11:8088/proxy/application_1346972652940_0002/
 12/09/06 19:56:27 INFO mapreduce.Job: Running job: job_1346972652940_0002
 12/09/06 19:56:55 INFO mapreduce.Job: Job job_1346972652940_0002 running
 in uber mode : false
 12/09/06 19:56:55 INFO mapreduce.Job: map 0% reduce 0%
 12/09/06 19:56:56 INFO mapreduce.Job: Job job_1346972652940_0002 failed
 with state FAILED due to: Application application_1346972652940_0002 failed
 1 times due to AM Container for appattempt_1346972652940_0002_01 exited
 with exitCode: 1 due to:
 .Failing this attempt.. Failing the application.
 12/09/06 19:56:56 INFO mapreduce.Job: Counters: 0
 Job Finished in 35.048 seconds
 java.io.FileNotFoundException: File does not exist:
 hdfs://localhost:9000/user/X/QuasiMonteCarlo_TMP_3_141592654/out/reduce-out
 at
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:729)
 at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1685)
 at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1709)
 at
 org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
 at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:351)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
 at
 org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:360)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at
 org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
 at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
 at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:200)





Re: hadoop trunk build failure - yarn, surefire related?

2012-09-11 Thread Narasingu Ramesh
Hi,
 Please find i think one command is there then only build the all
applications.
Thanks & Regards,
Ramesh.Narasingu

On Tue, Sep 11, 2012 at 2:28 PM, Tony Burton tbur...@sportingindex.comwrote:

  Hi,


 I’ve checked out the hadoop trunk, and I’m running “mvn test” on the
 codebase as part of following the “How To Contribute” guide at
 http://wiki.apache.org/hadoop/HowToContribute. The tests are currently
 failing in hadoop-mapreduce-client-jobclient, the failure message is below
 – something to do with org.hadoop.yarn.client and/or the maven surefire
 plugin. Can anyone suggest how to proceed?  Thanks!


 [INFO]
 
 [INFO] Reactor Summary:
 [INFO] [snipped SKIPPED test groups]
 [INFO] hadoop-mapreduce-client-jobclient . FAILURE
 [13.988s]
 [INFO]
 [INFO]
 
 [INFO] BUILD FAILURE
 [INFO]
 
 [INFO] Total time: 17.229s
 [INFO] Finished at: Tue Sep 11 09:53:44 BST 2012
 [INFO] Final Memory: 21M/163M
 [INFO]
 
 [ERROR] Failed to execute goal
 org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on
 project hadoop-mapreduce-client-jobclient: Execution default-test of goal
 org.apache.maven.plugins:maven-surefire-plugin:2.12:test failed:
 java.lang.reflect.InvocationTargetException; nested exception is
 java.lang.reflect.InvocationTargetException: null:
 org/hadoop/yarn/client/YarnClientImpl:
 org.hadoop.yarn.client.YarnClientImpl - [Help 1]
 [ERROR]
 [ERROR] To see the full stack trace of the errors, re-run Maven with the
 -e switch.
 [ERROR] Re-run Maven using the -X switch to enable full debug logging.
 [ERROR]
 [ERROR] For more information about the errors and possible solutions,
 please read the following articles:
 [ERROR] [Help 1]
 http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException




RE: what happens when a datanode rejoins?

2012-09-11 Thread Mehul Choube
 The namenode will asynchronously replicate the blocks to other datanodes in 
 order to maintain the replication factor after a datanode has not been in 
 contact for 10 minutes.
What happens when the datanode rejoins after namenode has already re-replicated 
the blocks it was managing?
Will namenode ask the datanode to discard the blocks and start managing new 
blocks?
Or will namenode discard the new blocks which were replicated due to 
unavailability of this datanode?



Thanks,
Mehul


From: George Datskos [mailto:george.dats...@jp.fujitsu.com]
Sent: Tuesday, September 11, 2012 12:56 PM
To: user@hadoop.apache.org
Subject: Re: what happens when a datanode rejoins?

Hi Mehul
Some of the blocks it was managing are deleted/modified?

The namenode will asynchronously replicate the blocks to other datanodes in 
order to maintain the replication factor after a datanode has not been in 
contact for 10 minutes.


The size of the blocks are now modified say from 64MB to 128MB?

Block size is a per-file setting so new files will be 128MB, but the old ones 
will remain at 64MB.


What if the block replication factor was one (yea not in most deployments but 
say incase) so does the namenode recreate a file once the datanode rejoins?

(assuming you didn't perform a decommission) Blocks that lived only on that 
datanode will be declared missing and the files associated with those blocks 
will not be able to be fully read, until the datanode rejoins.



George


Re: what happens when a datanode rejoins?

2012-09-11 Thread Harsh J
Hi,

Inline.

On Tue, Sep 11, 2012 at 2:36 PM, Mehul Choube mehul_cho...@symantec.com wrote:
 The namenode will asynchronously replicate the blocks to other datanodes
 in order to maintain the replication factor after a datanode has not been in
 contact for 10 minutes.

 What happens when the datanode rejoins after namenode has already
 re-replicated the blocks it was managing?

The block count total goes +1, and the file's block is treated as an
over-replicated one.

 Will namenode ask the datanode to discard the blocks and start managing new
 blocks?

Yes, this may happen.

 Or will namenode discard the new blocks which were replicated due to
 unavailability of this datanode?

It deletes extra blocks while still keeping the block placement policy
in mind. It may delete any block replica as long as the placement
policy is not violated by doing so.

-- 
Harsh J


RE: what happens when a datanode rejoins?

2012-09-11 Thread Mehul Choube
DataNode rejoins take care of only NameNode.
Sorry didn't get this


From: Narasingu Ramesh [mailto:ramesh.narasi...@gmail.com]
Sent: Tuesday, September 11, 2012 2:38 PM
To: user@hadoop.apache.org
Subject: Re: what happens when a datanode rejoins?

Hi Mehul,
 DataNode rejoins take care of only NameNode.
Thanks & Regards,
Ramesh.Narasingu
On Tue, Sep 11, 2012 at 2:36 PM, Mehul Choube 
mehul_cho...@symantec.com wrote:
 The namenode will asynchronously replicate the blocks to other datanodes in 
 order to maintain the replication factor after a datanode has not been in 
 contact for 10 minutes.
What happens when the datanode rejoins after namenode has already re-replicated 
the blocks it was managing?
Will namenode ask the datanode to discard the blocks and start managing new 
blocks?
Or will namenode discard the new blocks which were replicated due to 
unavailability of this datanode?



Thanks,
Mehul


From: George Datskos 
[mailto:george.dats...@jp.fujitsu.com]
Sent: Tuesday, September 11, 2012 12:56 PM
To: user@hadoop.apache.org
Subject: Re: what happens when a datanode rejoins?

Hi Mehul
Some of the blocks it was managing are deleted/modified?

The namenode will asynchronously replicate the blocks to other datanodes in 
order to maintain the replication factor after a datanode has not been in 
contact for 10 minutes.

The size of the blocks are now modified say from 64MB to 128MB?

Block size is a per-file setting so new files will be 128MB, but the old ones 
will remain at 64MB.

What if the block replication factor was one (yea not in most deployments but 
say incase) so does the namenode recreate a file once the datanode rejoins?

(assuming you didn't perform a decommission) Blocks that lived only on that 
datanode will be declared missing and the files associated with those blocks 
will not be able to be fully read, until the datanode rejoins.



George



Re: Undeliverable messages

2012-09-11 Thread Harsh J
Ha, good sleuthing.

I just moved it to INFRA, as no one from our side has gotten to this
yet. I guess we can only moderate, not administrate. So the ticket now
awaits action from INFRA on ejecting it out.

On Tue, Sep 11, 2012 at 2:34 PM, Tony Burton tbur...@sportingindex.com wrote:
 Thanks Harsh. By the looks of Stanislav's linkedin profile, he's moved on 
 from Sungard, so Outlook's filtering rules will look after his list bounce 
 messages from now on :)


 -Original Message-
 From: Harsh J [mailto:ha...@cloudera.com]
 Sent: 10 September 2012 12:11
 To: user@hadoop.apache.org
 Subject: Re: Undeliverable messages

 I'd filed this as much, but unsure how to get it done:
 https://issues.apache.org/jira/browse/HADOOP-8062. I'm not an admin on
 the mailing lists.

 On Mon, Sep 10, 2012 at 3:16 PM, Tony Burton tbur...@sportingindex.com 
 wrote:
 Sorry for admin-only content: can we remove this address from the list? I
 get the bounce message below whenever I post to user@hadoop.apache.org.

 Thanks!

 Tony


 _
 From: postmas...@sas.sungardrs.com [mailto:postmas...@sas.sungardrs.com]
 Sent: 10 September 2012 10:45
 To: Tony Burton
 Subject: Undeliverable: build failure - trying to build hadoop trunk
 checkout


 Delivery has failed to these recipients or distribution lists:

 NOTES:Stanislav_Seltser/SSP/SUNGARD%SSP-Bedford@sas.sungardrs.com
 An error occurred while trying to deliver this message to the recipient's
 e-mail address. Microsoft Exchange will not try to redeliver this message
 for you. Please try resending this message, or provide the following
 diagnostic text to your system administrator.







 Diagnostic information for administrators:

 Generating server: SUNGARD

 NOTES:Stanislav_Seltser/SSP/SUNGARD%SSP-Bedford@sas.sungardrs.com
 # #5.0.0 X-Notes;The message could not be delivered because the recipient's
 destination email system is unknown or invalid. Please check the address and
 try again, or contact your system administrator to verify connectivity to
 the email system of  #SMTP#

 Original message headers:

 To: user@hadoop.apache.org
 CC:
 Reply-To: user@hadoop.apache.org
 X-Priority: 3 (Normal)
 Importance: Normal
 Date: Mon, 10 Sep 2012 05:43:30 -0400
 From: tbur...@sportingindex.com
 Subject: RE: build failure - trying to build hadoop trunk checkout
 Message-ID: of32a4aa74.63265ba7-on85257a75.00358...@sas.sungardrs.com
 X-MIMETrack: Serialize by Router on Notes9/Corporate/SunGard(Release
 8.5.2FP2|March 22, 2011) at
  09/10/2012 05:44:23 AM,
 Serialize complete at 09/10/2012 05:44:23 AM
 MIME-Version: 1.0
 Content-Type: multipart/alternative;
 boundary=_=_NextPart_001_01CD8F38.BC144D00;
 charset=iso-8859-1





 --
 Harsh J


RE: hadoop trunk build failure - yarn, surefire related?

2012-09-11 Thread Tony Burton
Hi Ramesh

Thanks for the quick reply, but I'm having trouble following your English. Are 
you saying that there is one command to build everything? If so, can you tell 
me what it is?

Tony




From: Narasingu Ramesh [mailto:ramesh.narasi...@gmail.com]
Sent: 11 September 2012 10:06
To: user@hadoop.apache.org
Subject: Re: hadoop trunk build failure - yarn, surefire related?

Hi,
 Please find i think one command is there then only build the all 
applications.
Thanks & Regards,
Ramesh.Narasingu
On Tue, Sep 11, 2012 at 2:28 PM, Tony Burton 
tbur...@sportingindex.com wrote:
Hi,

I've checked out the hadoop trunk, and I'm running mvn test on the codebase 
as part of following the How To Contribute guide at 
http://wiki.apache.org/hadoop/HowToContribute. The tests are currently failing 
in hadoop-mapreduce-client-jobclient, the failure message is below - something 
to do with org.hadoop.yarn.client and/or the maven surefire plugin. Can anyone 
suggest how to proceed?  Thanks!



[INFO] 
[INFO] Reactor Summary:
[INFO] [snipped SKIPPED test groups]
[INFO] hadoop-mapreduce-client-jobclient . FAILURE [13.988s]
[INFO]
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 17.229s
[INFO] Finished at: Tue Sep 11 09:53:44 BST 2012
[INFO] Final Memory: 21M/163M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on 
project hadoop-mapreduce-client-jobclient: Execution default-test of goal 
org.apache.maven.plugins:maven-surefire-plugin:2.12:test failed: 
java.lang.reflect.InvocationTargetException; nested exception is 
java.lang.reflect.InvocationTargetException: null: 
org/hadoop/yarn/client/YarnClientImpl: org.hadoop.yarn.client.YarnClientImpl - 
[Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException

Re: Undeliverable messages

2012-09-11 Thread Harsh J
And done. We shouldn't get this anymore. Thanks for bumping on this issue Tony!

On Tue, Sep 11, 2012 at 2:44 PM, Harsh J ha...@cloudera.com wrote:
 Ha, good sleuthing.

 I just moved it to INFRA, as no one from our side has gotten to this
 yet. I guess we can only moderate, not administrate. So the ticket now
 awaits action from INFRA on ejecting it out.

 On Tue, Sep 11, 2012 at 2:34 PM, Tony Burton tbur...@sportingindex.com 
 wrote:
 Thanks Harsh. By the looks of Stanislav's linkedin profile, he's moved on 
 from Sungard, so Outlook's filtering rules will look after his list bounce 
 messages from now on :)


 -Original Message-
 From: Harsh J [mailto:ha...@cloudera.com]
 Sent: 10 September 2012 12:11
 To: user@hadoop.apache.org
 Subject: Re: Undeliverable messages

 I'd filed this as much, but unsure how to get it done:
 https://issues.apache.org/jira/browse/HADOOP-8062. I'm not an admin on
 the mailing lists.

 On Mon, Sep 10, 2012 at 3:16 PM, Tony Burton tbur...@sportingindex.com 
 wrote:
 Sorry for admin-only content: can we remove this address from the list? I
 get the bounce message below whenever I post to user@hadoop.apache.org.

 Thanks!

 Tony


 _
 From: postmas...@sas.sungardrs.com [mailto:postmas...@sas.sungardrs.com]
 Sent: 10 September 2012 10:45
 To: Tony Burton
 Subject: Undeliverable: build failure - trying to build hadoop trunk
 checkout


 Delivery has failed to these recipients or distribution lists:

 NOTES:Stanislav_Seltser/SSP/SUNGARD%SSP-Bedford@sas.sungardrs.com
 An error occurred while trying to deliver this message to the recipient's
 e-mail address. Microsoft Exchange will not try to redeliver this message
 for you. Please try resending this message, or provide the following
 diagnostic text to your system administrator.







 Diagnostic information for administrators:

 Generating server: SUNGARD

 NOTES:Stanislav_Seltser/SSP/SUNGARD%SSP-Bedford@sas.sungardrs.com
 # #5.0.0 X-Notes;The message could not be delivered because the recipient's
 destination email system is unknown or invalid. Please check the address and
 try again, or contact your system administrator to verify connectivity to
 the email system of  #SMTP#

 Original message headers:

 To: user@hadoop.apache.org
 CC:
 Reply-To: user@hadoop.apache.org
 X-Priority: 3 (Normal)
 Importance: Normal
 Date: Mon, 10 Sep 2012 05:43:30 -0400
 From: tbur...@sportingindex.com
 Subject: RE: build failure - trying to build hadoop trunk checkout
 Message-ID: of32a4aa74.63265ba7-on85257a75.00358...@sas.sungardrs.com
 X-MIMETrack: Serialize by Router on Notes9/Corporate/SunGard(Release
 8.5.2FP2|March 22, 2011) at
  09/10/2012 05:44:23 AM,
 Serialize complete at 09/10/2012 05:44:23 AM
 MIME-Version: 1.0
 Content-Type: multipart/alternative;
 boundary=_=_NextPart_001_01CD8F38.BC144D00;
 charset=iso-8859-1





 --
 Harsh J


Re: hadoop trunk build failure - yarn, surefire related?

2012-09-11 Thread Steve Loughran
It's probably some maven thing -in particular Maven's habit of grabbing the
online nightly snapshots off apache rather than local,

try  mvn clean install -DskipTests -offline
to force in all the artifacts, then run the MR tests

Tony -why not get on the mapreduce-dev mailing list, as this is the place
to discuss build and test process.

On 11 September 2012 09:58, Tony Burton tbur...@sportingindex.com wrote:

  Hi,


 I’ve checked out the hadoop trunk, and I’m running “mvn test” on the
 codebase as part of following the “How To Contribute” guide at
 http://wiki.apache.org/hadoop/HowToContribute. The tests are currently
 failing in hadoop-mapreduce-client-jobclient, the failure message is below
 – something to do with org.hadoop.yarn.client and/or the maven surefire
 plugin. Can anyone suggest how to proceed?  Thanks!


 [INFO]
 
 [INFO] Reactor Summary:
 [INFO] [snipped SKIPPED test groups]
 [INFO] hadoop-mapreduce-client-jobclient . FAILURE
 [13.988s]
 [INFO]
 [INFO]
 
 [INFO] BUILD FAILURE
 [INFO]
 
 [INFO] Total time: 17.229s
 [INFO] Finished at: Tue Sep 11 09:53:44 BST 2012
 [INFO] Final Memory: 21M/163M
 [INFO]
 
 [ERROR] Failed to execute goal
 org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on
 project hadoop-mapreduce-client-jobclient: Execution default-test of goal
 org.apache.maven.plugins:maven-surefire-plugin:2.12:test failed:
 java.lang.reflect.InvocationTargetException; nested exception is
 java.lang.reflect.InvocationTargetException: null:
 org/hadoop/yarn/client/YarnClientImpl:
 org.hadoop.yarn.client.YarnClientImpl - [Help 1]
 [ERROR]
 [ERROR] To see the full stack trace of the errors, re-run Maven with the
 -e switch.
 [ERROR] Re-run Maven using the -X switch to enable full debug logging.
 [ERROR]
 [ERROR] For more information about the errors and possible solutions,
 please read the following articles:
 [ERROR] [Help 1]
 http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException




RE: hadoop trunk build failure - yarn, surefire related?

2012-09-11 Thread Tony Burton
Thanks Steve, I’ll try the mvn command you suggest. All the snapshots I can see 
came from repository.apache.org though.

How do I run the MR tests only?

Thanks for the mapreduce-dev mailing list suggestion, I thought all lists had 
merged into one though – did I get the wrong end of the stick?

Tony




From: Steve Loughran [mailto:ste...@hortonworks.com]
Sent: 11 September 2012 10:45
To: user@hadoop.apache.org
Subject: Re: hadoop trunk build failure - yarn, surefire related?

It's probably some maven thing -in particular Maven's habit of grabbing the 
online nightly snapshots off apache rather than local,

try  mvn clean install -DskipTests -offline
to force in all the artifacts, then run the MR tests

Tony -why not get on the mapreduce-dev mailing list, as this is the place to 
discuss build and test process.
On 11 September 2012 09:58, Tony Burton 
tbur...@sportingindex.com wrote:
Hi,

I’ve checked out the hadoop trunk, and I’m running “mvn test” on the codebase 
as part of following the “How To Contribute” guide at 
http://wiki.apache.org/hadoop/HowToContribute. The tests are currently failing 
in hadoop-mapreduce-client-jobclient, the failure message is below – something 
to do with org.hadoop.yarn.client and/or the maven surefire plugin. Can anyone 
suggest how to proceed?  Thanks!



[INFO] 
[INFO] Reactor Summary:
[INFO] [snipped SKIPPED test groups]
[INFO] hadoop-mapreduce-client-jobclient . FAILURE [13.988s]
[INFO]
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 17.229s
[INFO] Finished at: Tue Sep 11 09:53:44 BST 2012
[INFO] Final Memory: 21M/163M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on 
project hadoop-mapreduce-client-jobclient: Execution default-test of goal 
org.apache.maven.plugins:maven-surefire-plugin:2.12:test failed: 
java.lang.reflect.InvocationTargetException; nested exception is 
java.lang.reflect.InvocationTargetException: null: 
org/hadoop/yarn/client/YarnClientImpl: org.hadoop.yarn.client.YarnClientImpl - 
[Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException

RE: Undeliverable messages

2012-09-11 Thread Tony Burton
No problem! I'll remove that Outlook filter now... :)

-Original Message-
From: Harsh J [mailto:ha...@cloudera.com] 
Sent: 11 September 2012 10:34
To: user@hadoop.apache.org
Subject: Re: Undeliverable messages

And done. We shouldn't get this anymore. Thanks for bumping on this issue Tony!

On Tue, Sep 11, 2012 at 2:44 PM, Harsh J ha...@cloudera.com wrote:
 Ha, good sleuthing.

 I just moved it to INFRA, as no one from our side has gotten to this
 yet. I guess we can only moderate, not administrate. So the ticket now
 awaits action from INFRA on ejecting it out.

 On Tue, Sep 11, 2012 at 2:34 PM, Tony Burton tbur...@sportingindex.com 
 wrote:
 Thanks Harsh. By the looks of Stanislav's linkedin profile, he's moved on 
 from Sungard, so Outlook's filtering rules will look after his list bounce 
 messages from now on :)


 -Original Message-
 From: Harsh J [mailto:ha...@cloudera.com]
 Sent: 10 September 2012 12:11
 To: user@hadoop.apache.org
 Subject: Re: Undeliverable messages

 I'd filed this as much, but unsure how to get it done:
 https://issues.apache.org/jira/browse/HADOOP-8062. I'm not an admin on
 the mailing lists.

 On Mon, Sep 10, 2012 at 3:16 PM, Tony Burton tbur...@sportingindex.com 
 wrote:
 Sorry for admin-only content: can we remove this address from the list? I
 get the bounce message below whenever I post to user@hadoop.apache.org.

 Thanks!

 Tony


 _
 From: postmas...@sas.sungardrs.com [mailto:postmas...@sas.sungardrs.com]
 Sent: 10 September 2012 10:45
 To: Tony Burton
 Subject: Undeliverable: build failure - trying to build hadoop trunk
 checkout


 Delivery has failed to these recipients or distribution lists:

 NOTES:Stanislav_Seltser/SSP/SUNGARD%SSP-Bedford@sas.sungardrs.com
 An error occurred while trying to deliver this message to the recipient's
 e-mail address. Microsoft Exchange will not try to redeliver this message
 for you. Please try resending this message, or provide the following
 diagnostic text to your system administrator.







 Diagnostic information for administrators:

 Generating server: SUNGARD

 NOTES:Stanislav_Seltser/SSP/SUNGARD%SSP-Bedford@sas.sungardrs.com
 # #5.0.0 X-Notes;The message could not be delivered because the recipient's
 destination email system is unknown or invalid. Please check the address and
 try again, or contact your system administrator to verify connectivity to
 the email system of  #SMTP#

 Original message headers:

 To: user@hadoop.apache.org
 CC:
 Reply-To: user@hadoop.apache.org
 X-Priority: 3 (Normal)
 Importance: Normal
 Date: Mon, 10 Sep 2012 05:43:30 -0400
 From: tbur...@sportingindex.com
 Subject: RE: build failure - trying to build hadoop trunk checkout
 Message-ID: of32a4aa74.63265ba7-on85257a75.00358...@sas.sungardrs.com
 X-MIMETrack: Serialize by Router on Notes9/Corporate/SunGard(Release
 8.5.2FP2|March 22, 2011) at
  09/10/2012 05:44:23 AM,
 Serialize complete at 09/10/2012 05:44:23 AM
 MIME-Version: 1.0
 Content-Type: multipart/alternative;
 boundary=_=_NextPart_001_01CD8F38.BC144D00;
 charset=iso-8859-1





 --
 Harsh J


RE: hadoop trunk build failure - yarn, surefire related?

2012-09-11 Thread Tony Burton
Good suggestions Harsh and Hemanth.

When I was asked to submit a patch for hadoop 1.0.3, I thought it a good 
exercise to work through the build process to become familiar with it, even 
though the patch is documentation-only. Maybe the requests for patches could 
come with a list of suggested reading as well as the link to 
http://wiki.apache.org/hadoop/HowToContribute, and the contributor can distill 
the appropriate important steps from all three. Otherwise, it's hard to tell 
which is the gospel version.

Tony





-Original Message-
From: Harsh J [mailto:ha...@cloudera.com] 
Sent: 11 September 2012 11:59
To: user@hadoop.apache.org
Subject: Re: hadoop trunk build failure - yarn, surefire related?

I guess we'll need to clean that guide and divide it in two - For
branch-1 maintenance contributors, and for trunk contributors.

I had another page that serves a slightly different purpose, but may
help you just the same:
http://wiki.apache.org/hadoop/QwertyManiac/BuildingHadoopTrunk

On Tue, Sep 11, 2012 at 4:09 PM, Tony Burton tbur...@sportingindex.com wrote:
 That's good to know - is there a more up to date guide than 
 http://wiki.apache.org/hadoop/HowToContribute which still makes many 
 references to ant builds?




 -Original Message-
 From: Harsh J [mailto:ha...@cloudera.com]
 Sent: 11 September 2012 11:36
 To: user@hadoop.apache.org
 Subject: Re: hadoop trunk build failure - yarn, surefire related?

 Ah there is no longer a need to run ant anymore on trunk. Ignore the
 leftover files in the base MR project directory - those should be
 cleaned up soon. All of the functional pieces of the build right now
 definitely use Maven.

 On Tue, Sep 11, 2012 at 4:01 PM, Tony Burton tbur...@sportingindex.com 
 wrote:
 Ok - thanks Harsh.

 Following Steve's earlier advice I tried the mvn install, which worked fine, 
 then ant test in hadoop-mapreduce-project which failed. I was mid-email to 
 mapreduce-dev@hadoop, now I'll try mvn test in hadoop-mapreduce-project and 
 report back.

 Tony


 -Original Message-
 From: Harsh J [mailto:ha...@cloudera.com]
 Sent: 11 September 2012 11:25
 To: user@hadoop.apache.org
 Subject: Re: hadoop trunk build failure - yarn, surefire related?

 Tony,

 What I do is:

 $ cd hadoop/; mvn install -DskipTests; cd hadoop-mapreduce-project/; mvn test

 This seems to work in running without any missing dependencies at least.

 The user lists are all merged, but the developer lists remain separate
 as that works better.

 On Tue, Sep 11, 2012 at 3:24 PM, Tony Burton tbur...@sportingindex.com 
 wrote:
 Thanks Steve, I'll try the mvn command you suggest. All the snapshots I can
 see came from repository.apache.org though.



 How do I run the MR tests only?



 Thanks for the mapreduce-dev mailing list suggestion, I thought all lists
 had merged into one though - did I get the wrong end of the stick?



 Tony









 From: Steve Loughran [mailto:ste...@hortonworks.com]
 Sent: 11 September 2012 10:45


 To: user@hadoop.apache.org
 Subject: Re: hadoop trunk build failure - yarn, surefire related?



 It's probably some maven thing -in particular Maven's habit of grabbing the
 online nightly snapshots off apache rather than local,



 try  mvn clean install -DskipTests -offline

 to force in all the artifacts, then run the MR tests



 Tony -why not get on the mapreduce-dev mailing list, as this is the place to
 discuss build and test process.

 On 11 September 2012 09:58, Tony Burton tbur...@sportingindex.com wrote:

 Hi,



 I've checked out the hadoop trunk, and I'm running mvn test on the
 codebase as part of following the How To Contribute guide at
 http://wiki.apache.org/hadoop/HowToContribute. The tests are currently
 failing in hadoop-mapreduce-client-jobclient, the failure message is below -
 something to do with org.hadoop.yarn.client and/or the maven surefire
 plugin. Can anyone suggest how to proceed?  Thanks!







 [INFO]
 
 [INFO] Reactor Summary:
 [INFO] [snipped SKIPPED test groups]
 [INFO] hadoop-mapreduce-client-jobclient . FAILURE [13.988s]
 [INFO]
 [INFO]
 
 [INFO] BUILD FAILURE
 [INFO]
 
 [INFO] Total time: 17.229s
 [INFO] Finished at: Tue Sep 11 09:53:44 BST 2012
 [INFO] Final Memory: 21M/163M
 [INFO]
 
 [ERROR] Failed to execute goal
 org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on
 project hadoop-mapreduce-client-jobclient: Execution default-test of goal
 org.apache.maven.plugins:maven-surefire-plugin:2.12:test failed:
 java.lang.reflect.InvocationTargetException; nested exception is
 java.lang.reflect.InvocationTargetException: null:
 org/hadoop/yarn/client/YarnClientImpl: org.hadoop.yarn.client.YarnClientImpl
 - [Help 1]

Re: what's the default reducer number?

2012-09-11 Thread Bejoy Ks
Hi Lin

The default value for number of reducers is 1

<name>mapred.reduce.tasks</name>
  <value>1</value>

It is not determined by data volume. You need to specify the number of
reducers for your mapreduce jobs as per your data volume.
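
The usual way is either job.setNumReduceTasks(n) in the driver or -D mapred.reduce.tasks=n
on the command line. For illustration, a minimal identity-job sketch might look like the
following (class name and paths are only examples):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  // Identity job (default mapper/reducer) that explicitly asks for 4 reducers.
  public class ReducerCountExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = new Job(conf, "reducer-count-example");
      job.setJarByClass(ReducerCountExample.class);
      // Equivalent to -D mapred.reduce.tasks=4 on the command line.
      job.setNumReduceTasks(4);
      job.setOutputKeyClass(LongWritable.class);
      job.setOutputValueClass(Text.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }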

Regards
Bejoy KS

On Tue, Sep 11, 2012 at 4:53 PM, Jason Yang lin.yang.ja...@gmail.comwrote:

 Hi, all

 I was wondering what's the default number of reducer if I don't set it in
 configuration?

 Will it change dynamically according to the output volume of Mapper?

 --
 YANG, Lin




Re: FW: Doubts Reg

2012-09-11 Thread sudha sadhasivam
Dear Madam,

I am attaching the relevant screenshots of running Hive.

Profile.png shows the contents of /etc/profile.

hive_common_lib.png shows that hive-common*.jar is already in $HIVE_HOME/lib; here 
$HIVE_HOME is /home/yahoo/hive/build/dist, as evident from classpath_err.png.

Yours Truly
G Sudha


--- On Mon, 9/10/12, sowmya@nokia.com sowmya@nokia.com wrote:

From: sowmya@nokia.com sowmya@nokia.com
Subject: FW: Doubts Reg
To: sudhasadhasi...@yahoo.com
Cc: chidambara...@nokia.com
Date: Monday, September 10, 2012, 9:53 AM



 
 




Hi Mam, 
   
Please let us know if the below solution works. 
   

Thanks & Regards, 
Sowmya Samuel 

   


From: Bhagtani Chandra (Nokia-LC/Bangalore)


Sent: Monday, September 10, 2012 9:52 AM

To: Talanki Shobanbabu (Nokia-LC/Bangalore); Nayak Santhosh (Nokia-LC/Bangalore)

Cc: B Sowmya.1 (Nokia-LC/Bangalore)

Subject: Re: Doubts Reg 


   



Hi Sowmya, 


   


This is a classpath problem. They have to check whether
hive-common-*.jar is in hive home's lib directory.  


   




Thanks, 


Chandra 


   






   


From:
Shoban shobanbabu.tala...@nokia.com

Date: Monday, September 10, 2012 9:45 AM

To: Bhagtani Chandra Prakash chandra.bhagt...@nokia.com, Nayak Santhosh 
(Nokia-LC/Bangalore) santhosh.na...@nokia.com

Subject: FW: Doubts Reg 


   



Do you know what needs to be done here?  This is from Univ. guys 
  


From: B Sowmya.1 (Nokia-LC/Bangalore)


Sent: Monday, September 10, 2012 9:44 AM

To: Talanki Shobanbabu (Nokia-LC/Bangalore)

Subject: FW: Doubts Reg 


  
Hi Shoban, 
  
Can someone please provide a solution to this error if possible 
  
Thanks & Regards, 
Sowmya Samuel 
  
From: ext sudha sadhasivam [mailto:sudhasadhasi...@yahoo.com]


Sent: Saturday, September 08, 2012 2:23 PM

To: B Sowmya.1 (Nokia-LC/Bangalore)

Cc: K Chidambaran (Nokia-LC/Bangalore)

Subject: Fw: Doubts Reg 

Sir

We have doubts regarding the Hive installation.

We are herewith attaching screenshots for the same

path file for conf is not set

Can you give suggestions for the same

Thanking you

Dr G sudha Sadasivam 


Re: what's the default reducer number?

2012-09-11 Thread Bejoy Ks
Hi Lin

The default values for all the properties are in
core-default.xml
hdfs-default.xml and
mapred-default.xml


Regards
Bejoy KS


On Tue, Sep 11, 2012 at 5:06 PM, Jason Yang lin.yang.ja...@gmail.comwrote:

 Hi, Bejoy

 Thanks for you reply.

 where could I find the default value of mapred.reduce.tasks ? I have
 checked the core-site.xml, hdfs-site.xml and mapred-site.xml, but I haven't
 found it.


 2012/9/11 Bejoy Ks bejoy.had...@gmail.com

 Hi Lin

 The default value for number of reducers is 1

 <name>mapred.reduce.tasks</name>
   <value>1</value>

 It is not determined by data volume. You need to specify the number of
 reducers for your mapreduce jobs as per your data volume.

 Regards
 Bejoy KS

 On Tue, Sep 11, 2012 at 4:53 PM, Jason Yang lin.yang.ja...@gmail.comwrote:

 Hi, all

 I was wondering what's the default number of reducer if I don't set it
 in configuration?

 Will it change dynamically according to the output volume of Mapper?

 --
 YANG, Lin





 --
 YANG, Lin




Re: Some general questions about DBInputFormat

2012-09-11 Thread Bejoy KS
Hi Yaron

Sqoop uses a similar implementation. You can get some details there.

Replies inline
• (more general question) Are there many use-cases for using DBInputFormat? Do 
most Hadoop jobs take their input from files or DBs?

 From my small experience, most MR jobs have their data in HDFS. It is useful for 
 getting data out of an RDBMS into Hadoop; the Sqoop implementation is an example.


• Since all mappers open a connection to the same DBS, one cannot use hundreds 
of mappers. Is there a solution to this problem?

The number of mappers shouldn't be more than the number of concurrent connections 
allowed for that database.
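
For what it's worth, a rough driver sketch that caps the map count (and therefore the
number of concurrent DB connections) could look like the one below. The JDBC driver,
table and column names are invented, and on newer releases the map-count property is
mapreduce.job.maps:

  import java.io.IOException;
  import java.sql.PreparedStatement;
  import java.sql.ResultSet;
  import java.sql.SQLException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
  import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
  import org.apache.hadoop.mapreduce.lib.db.DBWritable;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class DbImportSketch {

    // Hypothetical record class matching a "users" table with (id, name) columns.
    public static class UserRecord implements DBWritable {
      long id;
      String name;
      public void readFields(ResultSet rs) throws SQLException {
        id = rs.getLong(1);
        name = rs.getString(2);
      }
      public void write(PreparedStatement ps) throws SQLException {
        ps.setLong(1, id);
        ps.setString(2, name);
      }
    }

    public static class UserMapper
        extends Mapper<LongWritable, UserRecord, LongWritable, Text> {
      protected void map(LongWritable key, UserRecord value, Context context)
          throws IOException, InterruptedException {
        context.write(key, new Text(value.name));
      }
    }

    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Keep the number of map tasks, and therefore open DB connections, small.
      conf.setInt("mapred.map.tasks", 4);
      DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
          "jdbc:mysql://dbhost/mydb", "dbuser", "dbpassword");
      Job job = new Job(conf, "db-import-sketch");
      job.setJarByClass(DbImportSketch.class);
      job.setInputFormatClass(DBInputFormat.class);
      DBInputFormat.setInput(job, UserRecord.class, "users",
          null /* conditions */, "id" /* orderBy */, "id", "name");
      job.setMapperClass(UserMapper.class);
      job.setNumReduceTasks(0);                 // map-only copy into HDFS
      job.setOutputKeyClass(LongWritable.class);
      job.setOutputValueClass(Text.class);
      FileOutputFormat.setOutputPath(job, new Path(args[0]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }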



Regards
Bejoy KS

Sent from handheld, please excuse typos.

-Original Message-
From: Yaron Gonen yaron.go...@gmail.com
Date: Tue, 11 Sep 2012 15:41:26 
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Subject: Some general questions about DBInputFormat

Hi,
After reviewing the class's (not very complicated) code, I have some
questions I hope someone can answer:

   - (more general question) Are there many use-cases for using
   DBInputFormat? Do most Hadoop jobs take their input from files or DBs?
   - What happens when the database is updated during the mappers' data
   retrieval phase? Is there a way to lock the database before the data
   retrieval phase and release it afterwards?
   - Since all mappers open a connection to the same DBS, one cannot use
   hundreds of mappers. Is there a solution to this problem?

Thanks,
Yaron



Question about the task assignment strategy

2012-09-11 Thread Hiroyuki Yamada
Hi,

I want to make sure my understanding about task assignment in hadoop
is correct or not.

When scanning a file with multiple tasktrackers,
I am wondering how a task is assigned to each tasktracker.
Is it based on the block sequence or on data locality?

Let me explain my question by example.
There is a file which is composed of 10 blocks (block1 to block10), where
block1 is the beginning of the file and block10 is the tail of the file.
When scanning the file with 3 tasktrackers (tt1 to tt3),
I am wondering whether
task assignment is based on the block sequence, like:
first tt1 takes block1, tt2 takes block2, tt3 takes block3,
then tt1 takes block4, and so on;
or
task assignment is based on task (data) locality, like:
first tt1 takes block2 (because it is stored locally), tt2
takes block1 (because it is stored locally),
tt3 takes block4 (because it is stored locally), and so on.

As far as I have experienced, and as the Definitive Guide says,
I think that the first case is the task assignment strategy.
(And if there are many replicas, the closest one is picked.)

Is this right?

If this is right, is there any way to get the second behaviour
with the current implementation?

Thanks,

Hiroyuki


Re: Question about the task assignment strategy

2012-09-11 Thread Hemanth Yamijala
Hi,

Task assignment takes data locality into account first and not block
sequence. In hadoop, tasktrackers ask the jobtracker to be assigned tasks.
When such a request comes to the jobtracker, it will try to look for an
unassigned task which needs data that is close to the tasktracker and will
assign it.

Thanks
Hemanth

On Tue, Sep 11, 2012 at 6:31 PM, Hiroyuki Yamada mogwa...@gmail.com wrote:

 Hi,

 I want to make sure my understanding about task assignment in hadoop
 is correct or not.

 When scanning a file with multiple tasktrackers,
 I am wondering how a task is assigned to each tasktracker .
 Is it based on the block sequence or data locality ?

 Let me explain my question by example.
 There is a file which composed of 10 blocks (block1 to block10), and
 block1 is the beginning of the file and block10 is the tail of the file.
 When scanning the file with 3 tasktrackers (tt1 to tt3),
 I am wondering if
 task assignment is based on the block sequence like
 first tt1 takes block1 and tt2 takes block2 and tt3 takes block3 and
 tt1 takes block4 and so on
 or
 task assignment is based on the task(data) locality like
 first tt1 takes block2(because it's located in the local) and tt2
 takes block1 (because it's located in the local) and
 tt3 takes block 4(because it's located in the local) and so on.

 As far as I experienced and the definitive guide book says,
 I think that the first case is the task assignment strategy.
 (and if there are many replicas, closest one is picked.)

 Is this right ?

 If this is right, is there any way to do like the second case
 with the current implementation ?

 Thanks,

 Hiroyuki



RE: hadoop trunk build failure - yarn, surefire related?

2012-09-11 Thread Tony Burton
Another mvn test caused the build to fail slightly further down the road. As 
my Jira issue is documentation-only, I've submitted the patch anyway.

Is this multiple-failure scenario typical for trying to build hadoop from the 
trunk? It's certainly putting me off submitting code in the future. Is there any way the 
build process could be reconsidered such that there's only one source of 
instruction for how to successfully build the project?

Tony






-Original Message-
From: Tony Burton [mailto:tbur...@sportingindex.com]
Sent: 11 September 2012 12:15
To: user@hadoop.apache.org
Subject: RE: hadoop trunk build failure - yarn, surefire related?

Good suggestions Harsh and Hemanth.

When I was asked to submit a patch for hadoop 1.0.3, I thought it a good 
exercise to work through the build process to become familiar with it, even though the 
patch is documentation-only. Maybe the requests for patches could come with a 
list of suggested reading as well as the link to 
http://wiki.apache.org/hadoop/HowToContribute, and the contributor can distill 
the appropriate important steps from all three. Otherwise, it's hard to tell 
which is the gospel version.

Tony





-Original Message-
From: Harsh J [mailto:ha...@cloudera.com]
Sent: 11 September 2012 11:59
To: user@hadoop.apache.org
Subject: Re: hadoop trunk build failure - yarn, surefire related?

I guess we'll need to clean that guide and divide it in two - For
branch-1 maintenance contributors, and for trunk contributors.

I had another page that serves a slightly different purpose, but may
help you just the same:
http://wiki.apache.org/hadoop/QwertyManiac/BuildingHadoopTrunk

On Tue, Sep 11, 2012 at 4:09 PM, Tony Burton tbur...@sportingindex.com wrote:
 That's good to know - is there a more up to date guide than 
 http://wiki.apache.org/hadoop/HowToContribute which still makes many 
 references to ant builds?




 -Original Message-
 From: Harsh J [mailto:ha...@cloudera.com]
 Sent: 11 September 2012 11:36
 To: user@hadoop.apache.org
 Subject: Re: hadoop trunk build failure - yarn, surefire related?

 Ah there is no longer a need to run ant anymore on trunk. Ignore the
 leftover files in the base MR project directory - those should be
 cleaned up soon. All of the functional pieces of the build right now
 definitely use Maven.

 On Tue, Sep 11, 2012 at 4:01 PM, Tony Burton tbur...@sportingindex.com 
 wrote:
 Ok - thanks Harsh.

 Following Steve's earlier advice I tried the mvn install, which worked fine, 
 then ant test in hadoop-mapreduce-project which failed. I was mid-email to 
 mapreduce-dev@hadoop, now I'll try mvn test in hadoop-mapreduce-project and 
 report back.

 Tony


 -Original Message-
 From: Harsh J [mailto:ha...@cloudera.com]
 Sent: 11 September 2012 11:25
 To: user@hadoop.apache.org
 Subject: Re: hadoop trunk build failure - yarn, surefire related?

 Tony,

 What I do is:

 $ cd hadoop/; mvn install -DskipTests; cd hadoop-mapreduce-project/; mvn test

 This seems to work in running without any missing dependencies at least.

 The user lists are all merged, but the developer lists remain separate
 as that works better.

 On Tue, Sep 11, 2012 at 3:24 PM, Tony Burton tbur...@sportingindex.com 
 wrote:
 Thanks Steve, I'll try the mvn command you suggest. All the snapshots I can
 see came from repository.apache.org though.



 How do I run the MR tests only?



 Thanks for the mapreduce-dev mailing list suggestion, I thought all lists
 had merged into one though - did I get the wrong end of the stick?



 Tony









 From: Steve Loughran [mailto:ste...@hortonworks.com]
 Sent: 11 September 2012 10:45


 To: user@hadoop.apache.org
 Subject: Re: hadoop trunk build failure - yarn, surefire related?



 It's probably some maven thing -in particular Maven's habit of grabbing the
 online nightly snapshots off apache rather than local,



 try  mvn clean install -DskipTests -offline

 to force in all the artifacts, then run the MR tests



 Tony -why not get on the mapreduce-dev mailing list, as this is the place to
 discuss build and test process.

 On 11 September 2012 09:58, Tony Burton tbur...@sportingindex.com wrote:

 Hi,



 I've checked out the hadoop trunk, and I'm running mvn test on the
 codebase as part of following the How To Contribute guide at
 http://wiki.apache.org/hadoop/HowToContribute. The tests are currently
 failing in hadoop-mapreduce-client-jobclient, the failure message is below -
 something to do with org.hadoop.yarn.client and/or the maven surefire
 plugin. Can anyone suggest how to proceed?  Thanks!







 [INFO]
 
 [INFO] Reactor Summary:
 [INFO] [snipped SKIPPED test groups]
 [INFO] hadoop-mapreduce-client-jobclient . FAILURE [13.988s]
 [INFO]
 [INFO]
 
 [INFO] BUILD FAILURE
 [INFO]
 

Re: Error in : hadoop fsck /

2012-09-11 Thread Hemanth Yamijala
Could you please review your configuration to see if you are pointing to
the right namenode address ? (This will be in core-site.xml)
Please paste it here so we can look for clues.

Thanks
hemanth

On Tue, Sep 11, 2012 at 9:25 PM, yogesh dhari yogeshdh...@live.com wrote:

  Hi all,

 I am running hadoop-0.20.2 on a single-node cluster.

 I run the command

 hadoop fsck /

 and it shows this error:

 Exception in thread "main" java.net.UnknownHostException: http
 at
 java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
 at java.net.Socket.connect(Socket.java:579)
 at java.net.Socket.connect(Socket.java:528)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:378)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:473)
 at sun.net.www.http.HttpClient.<init>(HttpClient.java:203)
 at sun.net.www.http.HttpClient.New(HttpClient.java:290)
 at sun.net.www.http.HttpClient.New(HttpClient.java:306)
 at
 sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:995)
 at
 sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:931)
 at
 sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:849)
 at
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1299)
 at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:123)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:159)




 Please suggest why this is so; it should show the health status.



Re: Error in : hadoop fsck /

2012-09-11 Thread Arpit Gupta
Yogesh

try this

hadoop fsck -Ddfs.http.address=localhost:50070 /

50070 is the default http port that the namenode runs on. The property 
dfs.http.address should be set in your hdfs-site.xml
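
For example, on a single-node setup where the namenode web UI is on the default port,
the entry would look roughly like this (adjust host and port to your cluster):

  <property>
    <name>dfs.http.address</name>
    <value>localhost:50070</value>
  </property>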

--
Arpit Gupta
Hortonworks Inc.
http://hortonworks.com/

On Sep 11, 2012, at 9:03 AM, yogesh dhari yogeshdh...@live.com wrote:

 Hi Hemanth,
  
 It's the content of core-site.xml:
 
 <configuration>
  <property>
   <name>fs.default.name</name>
   <value>hdfs://localhost:9000</value>
  </property>
  <property>
   <name>hadoop.tmp.dir</name>
   <value>/opt/hadoop-0.20.2/hadoop_temporary_dirr</value>
   <description>A base for other temporary directories.</description>
  </property>
 </configuration>
 
 Regards
 Yogesh Kumar
 
 Date: Tue, 11 Sep 2012 21:29:36 +0530
 Subject: Re: Error in : hadoop fsck /
 From: yhema...@thoughtworks.com
 To: user@hadoop.apache.org
 
 Could you please review your configuration to see if you are pointing to the 
 right namenode address ? (This will be in core-site.xml)
 Please paste it here so we can look for clues.
 
 Thanks
 hemanth
 
 On Tue, Sep 11, 2012 at 9:25 PM, yogesh dhari yogeshdh...@live.com wrote:
 Hi all,
 
 I am running hadoop-0.20.2 on single node cluster,
 
 I run the command 
 
 hadoop fsck /
 
 it shows error:
 
  Exception in thread "main" java.net.UnknownHostException: http
 at 
 java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
 at java.net.Socket.connect(Socket.java:579)
 at java.net.Socket.connect(Socket.java:528)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:378)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:473)
  at sun.net.www.http.HttpClient.<init>(HttpClient.java:203)
 at sun.net.www.http.HttpClient.New(HttpClient.java:290)
 at sun.net.www.http.HttpClient.New(HttpClient.java:306)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:995)
 at 
 sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:931)
 at 
 sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:849)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1299)
 at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:123)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:159)
 
 
 
 
 Please suggest why it so..  it should show the health status:
 
 



how to specify the root directory of hadoop on slave node?

2012-09-11 Thread Richard Tang
Hi, All
I need to set up a hadoop/hdfs cluster with one namenode on a machine and
two datanodes on two other machines. But after setting the datanode machines
in the conf/slaves file, running bin/start-dfs.sh cannot start hdfs normally.
I am aware that I have not specified the root directory where hadoop is installed
on the slave nodes, or the OS user account to use for hadoop on the slave nodes.
I am asking how to specify where hadoop/hdfs is locally installed on a slave
node. Also, how do I specify the user account used to start hdfs there?

Regards,
Richard


security-in-HADOOP

2012-09-11 Thread nisha
How is security maintained in Hadoop? Is it maintained by giving
folder/file permissions in Hadoop?
How can I make sure that somebody else doesn't write into my HDFS file
system?


removing datanodes from clustes.

2012-09-11 Thread yogesh dhari

Hello all,

I am not clear on the right way to remove a datanode from the cluster.

Please explain the decommissioning steps with an example,
like how to create the exclude file and the other steps involved in it.


Thanks & regards
Yogesh Kumar
  

Re: security-in-HADOOP

2012-09-11 Thread Bertrand Dechoux
By reading the documentation, like the following
http://hadoop.apache.org/docs/r1.0.3/hdfs_permissions_guide.html
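
As a small illustration, permissions can also be tightened from code; the sketch below
(the path and mode are made up) does the equivalent of hadoop fs -chmod 750 on a directory:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.fs.permission.FsAction;
  import org.apache.hadoop.fs.permission.FsPermission;

  public class TightenPermissions {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      Path dir = new Path("/user/nisha/private");
      // 750: full access for the owner, read/execute for the group, nothing for others.
      fs.setPermission(dir, new FsPermission(FsAction.ALL, FsAction.READ_EXECUTE, FsAction.NONE));
      fs.close();
    }
  }

Keep in mind that plain HDFS permissions identify users by the client-reported user name,
so real enforcement against a hostile user also needs dfs.permissions enabled and, ideally,
Kerberos authentication.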

On Tue, Sep 11, 2012 at 8:14 PM, nisha nishakulkarn...@gmail.com wrote:

 How security is maintained in hadoop, is it maintained by giving
 folder/file permissions in hadoop
 how can i make sure that somebody else dunt write in to my hdfs file
 system ...



Re: How to remove datanode from cluster..

2012-09-11 Thread Bejoy Ks
Hi Yogesh

The detailed steps are available in hadoop wiki on FAQ page

http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F
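
Roughly, decommissioning works like this (the file location below is only an example
for a 0.20/1.x cluster):

1. Point the namenode at an exclude file in hdfs-site.xml (restart the namenode once
   if the property was not set before):

     <property>
       <name>dfs.hosts.exclude</name>
       <value>/opt/hadoop-0.20.2/conf/excludes</value>
     </property>

2. Add the hostname of every datanode you want to remove to that file, one per line.

3. Tell the namenode to re-read it:

     hadoop dfsadmin -refreshNodes

4. Watch the namenode web UI or hadoop dfsadmin -report until the nodes show up as
   "Decommissioned"; their blocks are re-replicated first, and only then should the
   datanode processes be stopped and the hosts removed from the slaves file.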

Regards
Bejoy KS



On Wed, Sep 12, 2012 at 12:14 AM, yogesh dhari yogeshdh...@live.com wrote:

  Hello all,

 I am not getting the clear way out to remove datanode from the cluster.

 please explain me decommissioning steps with example.
 like how to creating exclude files and other steps involved in it.


 Thanks  regards
 Yogesh Kumar



Re: Question about the task assignment strategy

2012-09-11 Thread Hiroyuki Yamada
I figured out the cause.
The HDFS block size is 128MB, but
I specified mapred.min.split.size as 512MB,
and data-local processing goes wrong for some reason.
When I remove the mapred.min.split.size configuration,
the tasktrackers pick data-local tasks.
Why does this happen?

It seems like a bug.
A split is a logical container of blocks,
so nothing is wrong logically.

On Wed, Sep 12, 2012 at 1:20 AM, Hiroyuki Yamada mogwa...@gmail.com wrote:
 Hi, thank you for the comment.

 Task assignment takes data locality into account first and not block 
 sequence.

 Does it work like that when replica factor is set to 1 ?

 I just ran an experiment to check the behavior.
 There are 14 nodes (node01 to node14), with 14 datanodes and
 14 tasktrackers working.
 I first created data to be processed on each node (say data01 to data14),
 and I put each dataset into HDFS from its own node (under the /data
 directory: /data/data01, ... /data/data14).
 The replication factor is set to 1, so according to the default block placement
 policy,
 each dataset is stored on its local node (data01 is stored at node01, data02
 is stored at node02, and so on).
 In that setting, I launched a job that processes /data, and
 what happened is that the tasktrackers read from data01 to data14 sequentially,
 which means the tasktrackers first take all the data from node01, then
 node02, then node03, and so on.

 If the tasktrackers take data locality into account as you say,
 each tasktracker should take its local task (data): tasktrackers at
 node02 should take data02 blocks if there are any.
 But it didn't work like that.
 Why is this happening?

 Are there any documents about this?
 What part of the source code is doing this?

 Regards,
 Hiroyuki

 On Tue, Sep 11, 2012 at 11:27 PM, Hemanth Yamijala
 yhema...@thoughtworks.com wrote:
 Hi,

 Task assignment takes data locality into account first and not block
 sequence. In hadoop, tasktrackers ask the jobtracker to be assigned tasks.
 When such a request comes to the jobtracker, it will try to look for an
 unassigned task which needs data that is close to the tasktracker and will
 assign it.

 Thanks
 Hemanth


 On Tue, Sep 11, 2012 at 6:31 PM, Hiroyuki Yamada mogwa...@gmail.com wrote:

 Hi,

 I want to make sure my understanding about task assignment in hadoop
 is correct or not.

 When scanning a file with multiple tasktrackers,
 I am wondering how a task is assigned to each tasktracker .
 Is it based on the block sequence or data locality ?

 Let me explain my question by example.
 There is a file which composed of 10 blocks (block1 to block10), and
 block1 is the beginning of the file and block10 is the tail of the file.
 When scanning the file with 3 tasktrackers (tt1 to tt3),
 I am wondering if
 task assignment is based on the block sequence like
 first tt1 takes block1 and tt2 takes block2 and tt3 takes block3 and
 tt1 takes block4 and so on
 or
 task assignment is based on the task(data) locality like
 first tt1 takes block2(because it's located in the local) and tt2
 takes block1 (because it's located in the local) and
 tt3 takes block 4(because it's located in the local) and so on.

 As far as I experienced and the definitive guide book says,
 I think that the first case is the task assignment strategy.
 (and if there are many replicas, closest one is picked.)

 Is this right ?

 If this is right, is there any way to do like the second case
 with the current implementation ?

 Thanks,

 Hiroyuki




Re: Accessing image files from hadoop to jsp

2012-09-11 Thread Michael Segel
Here's one... 
Write a Java program which can be accessed on the server side to pull the 
picture from HDFS and display it on your JSP. 
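
Something along these lines might work as a starting point; the filesystem URI (taken
here as hdfs://localhost:9000), the request parameter and the content type are all
assumptions to adapt:

  import java.io.IOException;
  import java.io.OutputStream;
  import java.net.URI;

  import javax.servlet.ServletException;
  import javax.servlet.http.HttpServlet;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IOUtils;

  // Streams an image stored in HDFS back to the browser.
  public class HdfsImageServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
        throws ServletException, IOException {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);
      Path image = new Path(req.getParameter("name"));
      resp.setContentType("image/jpeg");
      FSDataInputStream in = fs.open(image);
      OutputStream out = resp.getOutputStream();
      try {
        IOUtils.copyBytes(in, out, conf, false);   // false: don't close the streams here
      } finally {
        in.close();
        out.close();
      }
    }
  }

Mapped to, say, /image in web.xml, the JSP can then use <img src="image?name=/Comb/java1.jpg"/>.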



On Sep 11, 2012, at 3:48 PM, Visioner Sadak visioner.sa...@gmail.com wrote:

 Any hints, experts? At least let me know if I am on the right track, or whether we can't use hftp at 
 all because the browser won't understand it?
 
 On Mon, Sep 10, 2012 at 1:58 PM, Visioner Sadak visioner.sa...@gmail.com 
 wrote:
 Or should I use the datanode IP for accessing images using hftp?
  
 <td><img src="hftp://localhost:50075/Comb/java1.jpg"/></td>
  
 My datanode is at 50075.
 
  
 On Sat, Sep 8, 2012 at 8:30 PM, Visioner Sadak visioner.sa...@gmail.com 
 wrote:
 It seems like webhdfs is available in versions above 1.0. For the time being, 
 just to do a POC, can't I use hftp in the 0.22.0 version?
 I just have to show the image on a JSP, something like this. Any ideas on how to 
 configure hftp, security and all? I'm fully confused, because if the URL is accessible 
 as below, any WAR can access it, right? Or will we have to do it from admin 
 configs...
  
 
 
 <td><img src="hftp://localhost:50070/Comb/java1.jpg"/></td>
 
 On Sat, Sep 8, 2012 at 6:47 PM, Harsh J ha...@cloudera.com wrote:
 I'd say yes. Use either WebHDFS (requires client to be able to access
 all HDFS nodes) or HttpFS (acts as a gateway if your client can't
 access all HDFS nodes) via their REST APIs.
 
 On Sat, Sep 8, 2012 at 3:48 PM, Visioner Sadak visioner.sa...@gmail.com 
 wrote:
  is webhdfs a viable approach???
 
 
  On Thu, Sep 6, 2012 at 8:26 PM, Visioner Sadak visioner.sa...@gmail.com
  wrote:
 
  I have uploaded some images to hdfs   hadoop  user/combo/  directory now
  want to show those images in a jsp i have configured tomcat and hadoop
  properly i m able to do uploads any ideas on  how to build the interface to
  show thes images on to my front end jsp...
 
 
 
 
 
 
 
 --
 Harsh J
 
 
 



How to split a sequence file

2012-09-11 Thread Jason Yang
Hi,

I have a sequence file written by SequenceFileOutputFormat with key/value
type of Text, BytesWritable, like below:

Text BytesWritable
-
id_A_01  7F2B3C687F2B3C687F2B3C68
id_A_02  2F2B3C687F2B3C687F2B3C686AB23C68D73C68D7
id_A_03  5F2B3C68D77F2B3C687F2B3A
...
id_B_01  1AB23C68D73C68D76AB23C68D73C68D7
id_B_02  5AB23C68D73C68D76AB68D76A1
id_B_03  F2B23C68D7B23C68D7B23C68D7

If I want all the records with the same key prefix to be processed by the
same mapper, say records with key id_A_XX are processed by a mapper and
records with key id_B_XX are processed by another mapper, what should I do?


Should I implement my own InputFormat inherited from
SequenceFileInputFormat?

Any help would be appreciated.
-- 
YANG, Lin


Re: How to not output the key

2012-09-11 Thread Manoj Babu
Hi,

You have to specify the reducer's output key type as NullWritable and emit NullWritable.get() as the key.
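
For instance, a value-only reducer could look like the sketch below; it assumes the map
output types are <LongWritable, Text>, which is just an example:

  import java.io.IOException;

  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Reducer;

  // Writes only the values; the output key is NullWritable.
  public class ValueOnlyReducer
      extends Reducer<LongWritable, Text, NullWritable, Text> {
    @Override
    protected void reduce(LongWritable key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      for (Text value : values) {
        context.write(NullWritable.get(), value);
      }
    }
  }

In the driver, set job.setOutputKeyClass(NullWritable.class) and
job.setOutputValueClass(Text.class), plus job.setMapOutputKeyClass(LongWritable.class) and
job.setMapOutputValueClass(Text.class) since the map output types differ. TextOutputFormat
skips NullWritable keys (and the tab separator), so each output line contains only the value.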

Cheers!
Manoj.



On Wed, Sep 12, 2012 at 7:43 AM, Nataraj Rashmi - rnatar 
rashmi.nata...@acxiom.com wrote:

  Hello,


 I have a simple map/reduce program to merge input files into one big output
 file. My question is: is there a way not to output the key from the
 reducer to the output file? I only want the value, not the key, for each
 record.


 Thanks




RE: Issue in access static object in MapReduce

2012-09-11 Thread Stuti Awasthi
Thanks Bejoy,
I try to implement and if face any issues will let you know.

Thanks
Stuti

From: Bejoy Ks [mailto:bejoy.had...@gmail.com]
Sent: Tuesday, September 11, 2012 8:39 PM
To: user@hadoop.apache.org
Subject: Re: Issue in access static object in MapReduce

Hi Stuti

You can pass the JSON object as a configuration property from your main class, 
then initialize this static JSON object in the configure() method. Every 
instance of a map or reduce task will have this configure() method executed once 
before the map()/reduce() function, so all the executions of map()/reduce() on 
each record can look up the JSON object.

public void configure(JobConf job)
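
A minimal old-API sketch of that pattern follows; the property name my.lookup.json is just
an example, and the JSON parsing step is left to whatever library is already in use:

  import java.io.IOException;

  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  // Driver side: conf.set("my.lookup.json", jsonString);
  public class LookupMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {

    // Cached once per task JVM; the reducer can do exactly the same in its configure().
    private static String lookupJson;

    @Override
    public void configure(JobConf job) {
      lookupJson = job.get("my.lookup.json");   // example property name
      // Parse lookupJson with your JSON library of choice here and keep the
      // parsed object in another static field, so map() never re-parses it.
    }

    public void map(LongWritable key, Text value,
        OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
      // Consult the cached configuration instead of reloading the JSON per record.
      output.collect(new Text(lookupJson == null ? "missing" : "present"), value);
    }
  }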

Regards
Bejoy KS
On Tue, Sep 11, 2012 at 8:23 PM, Stuti Awasthi 
stutiawas...@hcl.commailto:stutiawas...@hcl.com wrote:


Hi,

I have a configuration JSON file which is accessed by the MR job for every input. So 
I created a class with a static block that loads the JSON file into a static instance 
variable.

So every time my mapper or reducer wants to access the configuration it can use this 
instance variable. But on a single node cluster, when I run this as a jar, my 
mapper executes fine,

but in the reducer the static instance variable is returning null.



I get this mainly because mapper and reducer runs on separate jvms. How can I 
handle this situation gracefully. There are few ways that I can think of:



1. Serialize the loaded JSON object and store it on HDFS, then deserialize it 
and use it in the code. The problem is the object is not very large, so it is not 
useful to store it on HDFS.

2. Load the JSON for every input line of the map/reduce call. This is highly 
inefficient, as I'll be loading the JSON for so many lines again and again.



How to resolve this issue.. Any Inputs  are welcome.



Note: the same code works well in Eclipse. :(



Thanks

Stuti



Regards,
Stuti Awasthi
HCL Comnet Systems and Services Ltd







RE: removing datanodes from clustes.

2012-09-11 Thread Brahma Reddy Battula
Hi Yogesh,

FYI, please go through the following:

http://tech.zhenhua.info/2011/04/how-to-decommission-nodesblacklist.html


http://hadoop-karma.blogspot.in/2011/01/hadoop-cookbook-how-to-decommission.html



From: yogesh dhari [yogeshdh...@live.com]
Sent: Wednesday, September 12, 2012 2:16 AM
To: user@hadoop.apache.org
Subject: removing datanodes from clustes.

Hello all,

I am not getting the clear way out to remove datanode from the cluster.

please explain me decommissioning steps with example.
like how to creating exclude files and other steps involved in it.


Thanks  regards
Yogesh Kumar


Re: Issue in access static object in MapReduce

2012-09-11 Thread Kunaal
Have you looked at Terracotta or any other distributed caching system?

Kunal
-- Sent while mobile --

On Sep 11, 2012, at 9:30 PM, Stuti Awasthi stutiawas...@hcl.com wrote:

 Thanks Bejoy,
 I try to implement and if face any issues will let you know.
  
 Thanks
 Stuti
  
 From: Bejoy Ks [mailto:bejoy.had...@gmail.com] 
 Sent: Tuesday, September 11, 2012 8:39 PM
 To: user@hadoop.apache.org
 Subject: Re: Issue in access static object in MapReduce
  
 Hi Stuti
  
 You can pass the json object as a configuration property from your main class 
 then Initialize this static json object on the configure() method. Every 
 instance of map or reduce task will have this configure() method executed 
 once before the map()/reduce() function . So all the executions of 
 map()/reduce() on each record can have a look up into the json object.
  
 public void configure(JobConf job) 
  
 Regards
 Bejoy KS
 
 On Tue, Sep 11, 2012 at 8:23 PM, Stuti Awasthi stutiawas...@hcl.com wrote:
  
 Hi,
 
 I have a configuration JSON file which is accessed by MR job for every 
 input.So , I created a class with a static block, load the JSON file in 
 static Instance variable.
 
 So everytime my mapper or reducer wants to access configuration can use this 
 Instance variable. But on a single node cluster,when I run this as a jar, my 
 mapper is executing fine,
 
 But in reducer static Instance variable is retuning null.
 
  
 
 I get this mainly because mapper and reducer runs on separate jvms. How can I 
 handle this situation gracefully. There are few ways that I can think of:
 
  
 
 1.  Serialize the JSON load object and store it on HDFS and then deserialize 
 and use in the code. Problem is obj is not too large so not be useful to 
 store on HDFS
 
 2. I load the JSON for every Input line of map reduce call. This is highly 
 inefficient and  as Il be loading JSON for so many lines again and again.
 
  
 
 How to resolve this issue.. Any Inputs  are welcome.
 
  
 
 Note , same code works with eclipse well. :(
 
  
 
 Thanks
 
 Stuti
 
  
 
  
 Regards,
 Stuti Awasthi
 HCL Comnet Systems and Services Ltd
  
 
 
 
  


Re: How to split a sequence file

2012-09-11 Thread Robert Dyer
If the file is pre-sorted, why not just make multiple sequence files -
1 for each split?

Then you don't have to compute InputSplits because the physical files
are already split.
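
A rough sketch of that pre-splitting step could look like this (paths and the prefix rule
are assumptions); each per-prefix file then becomes its own split, provided it stays under
one block or is read with an input format whose isSplitable() returns false:

  import java.io.IOException;
  import java.util.HashMap;
  import java.util.Map;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;

  // Rewrites one big SequenceFile into one file per key prefix ("id_A", "id_B", ...).
  public class SplitByKeyPrefix {
    public static void main(String[] args) throws IOException {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      Path input = new Path(args[0]);        // e.g. /data/all.seq
      Path outDir = new Path(args[1]);       // e.g. /data/by-prefix

      SequenceFile.Reader reader = new SequenceFile.Reader(fs, input, conf);
      Map<String, SequenceFile.Writer> writers = new HashMap<String, SequenceFile.Writer>();
      Text key = new Text();
      BytesWritable value = new BytesWritable();
      try {
        while (reader.next(key, value)) {
          // "id_A_01" -> "id_A"; adjust the prefix rule to your key scheme.
          String prefix = key.toString().substring(0, key.toString().lastIndexOf('_'));
          SequenceFile.Writer writer = writers.get(prefix);
          if (writer == null) {
            writer = SequenceFile.createWriter(fs, conf, new Path(outDir, prefix + ".seq"),
                Text.class, BytesWritable.class);
            writers.put(prefix, writer);
          }
          writer.append(key, value);
        }
      } finally {
        reader.close();
        for (SequenceFile.Writer w : writers.values()) {
          w.close();
        }
      }
    }
  }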

On Tue, Sep 11, 2012 at 11:00 PM, Harsh J ha...@cloudera.com wrote:
 Hey Jason,

 Is the file pre-sorted? You could override the InputFormat's
 #getSplits method to return InputSplits at identified key boundaries,
 as one solution - this would require reading the file up-front (at
 submit-time) and building the input splits out of it.

 On Wed, Sep 12, 2012 at 8:45 AM, Jason Yang lin.yang.ja...@gmail.com wrote:
 Hi,

 I have a sequence file written by SequenceFileOutputFormat with key/value
 type of Text, BytesWritable, like below:

 Text BytesWritable
 -
 id_A_01  7F2B3C687F2B3C687F2B3C68
 id_A_02  2F2B3C687F2B3C687F2B3C686AB23C68D73C68D7
 id_A_03  5F2B3C68D77F2B3C687F2B3A
 ...
 id_B_01  1AB23C68D73C68D76AB23C68D73C68D7
 id_B_02  5AB23C68D73C68D76AB68D76A1
 id_B_03  F2B23C68D7B23C68D7B23C68D7

 If I want all the records with the same key prefix to be processed by a same
 mapper, say records with key id_A_XX are processed by a mapper and records
 with key id_B_XX are processed by another mapper, what should I do?

 Should I implement our own InputFormat inherited from
 SequenceFileInputFormat ?

 Any help would be appreciated.
 --
 YANG, Lin




 --
 Harsh J


Re: Question about the task assignment strategy

2012-09-11 Thread Hemanth Yamijala
Hi,

I tried a similar experiment as yours but couldn't replicate the issue.

I generated 64 MB files and added them to my DFS - one file from every
machine, with a replication factor of 1,  like you did. My block size was
64MB. I verified the blocks were located on the same machine as where I
added them from.

Then, I launched a wordcount (without the min split size config). As
expected, it created 8 maps, and I could verify that it ran all the tasks
as data local - i.e. every task read off its own datanode. From the launch
times of the tasks, I could roughly tell that this scheduling behaviour was
independent of the order in which the tasks were launched. This behaviour
was retained even with the min split size config.

Could you share the size of the input you generated (i.e. the size of
data01..data14)? Also, what job are you running - specifically, what is the
input format?

BTW, this wiki entry:
http://wiki.apache.org/hadoop/HowManyMapsAndReduces talks a little bit
about how the maps are created.

Thanks
Hemanth

On Wed, Sep 12, 2012 at 7:49 AM, Hiroyuki Yamada mogwa...@gmail.com wrote:

 I figured out the cause.
 HDFS block size is 128MB, but
 I specify mapred.min.split.size as 512MB,
 and data local I/O processing goes wrong for some reason.
 When I remove the mapred.min.split.size configuration,
 tasktrackers pick data-local tasks.
 Why does it happen ?

 It seems like a bug.
 Split is a logical container of blocks,
 so nothing is wrong logically.

 On Wed, Sep 12, 2012 at 1:20 AM, Hiroyuki Yamada mogwa...@gmail.com
 wrote:
  Hi, thank you for the comment.
 
  Task assignment takes data locality into account first and not block
 sequence.
 
  Does it work like that when replica factor is set to 1 ?
 
  I just had a experiment to check the behavior.
  There are 14 nodes (node01 to node14) and there are 14 datanodes and
  14 tasktrackers working.
  I first created a data to be processed in each node (say data01 to
 data14),
  and I put the each data to the hdfs from each node (at /data
  directory. /data/data01, ... /data/data14).
  Replica factor is set to 1, so according to the default block placement
 policy,
  each data is stored at local node. (data01 is stored at node01, data02
  is stored at node02 and so on)
  In that setting, I launched a job that processes the /data and
  what happened is that tasktrackers read from data01 to data14
 sequentially,
  which means tasktrackers first take all data from node01 and then
  node02 and then node03 and so on.
 
  If tasktracker takes data locality into account as you say,
  each tasktracker should take the local task(data). (tasktrackers at
  node02 should take data02 blocks if there is any)
  But, it didn't work like that.
  What this is happening ?
 
  Is there any documents about this ?
  What part of the source code is doing that ?
 
  Regards,
  Hiroyuki
 
  On Tue, Sep 11, 2012 at 11:27 PM, Hemanth Yamijala
  yhema...@thoughtworks.com wrote:
  Hi,
 
  Task assignment takes data locality into account first and not block
  sequence. In hadoop, tasktrackers ask the jobtracker to be assigned
 tasks.
  When such a request comes to the jobtracker, it will try to look for an
  unassigned task which needs data that is close to the tasktracker and
 will
  assign it.
 
  Thanks
  Hemanth
 
 
  On Tue, Sep 11, 2012 at 6:31 PM, Hiroyuki Yamada mogwa...@gmail.com
 wrote:
 
  Hi,
 
  I want to make sure my understanding about task assignment in hadoop
  is correct or not.
 
  When scanning a file with multiple tasktrackers,
  I am wondering how a task is assigned to each tasktracker .
  Is it based on the block sequence or data locality ?
 
  Let me explain my question by example.
  There is a file which composed of 10 blocks (block1 to block10), and
  block1 is the beginning of the file and block10 is the tail of the
 file.
  When scanning the file with 3 tasktrackers (tt1 to tt3),
  I am wondering if
  task assignment is based on the block sequence like
  first tt1 takes block1 and tt2 takes block2 and tt3 takes block3 and
  tt1 takes block4 and so on
  or
  task assignment is based on the task(data) locality like
  first tt1 takes block2(because it's located in the local) and tt2
  takes block1 (because it's located in the local) and
  tt3 takes block 4(because it's located in the local) and so on.
 
  As far as I experienced and the definitive guide book says,
  I think that the first case is the task assignment strategy.
  (and if there are many replicas, closest one is picked.)
 
  Is this right ?
 
  If this is right, is there any way to do like the second case
  with the current implementation ?
 
  Thanks,
 
  Hiroyuki