Re: planning a cluster

2014-07-22 Thread Devaraj K
You may need to consider these things while choosing the number of nodes for your
cluster:

1. Data storage: how much data you are going to store in the cluster
2. Data processing: what kind of processing you are going to do in the cluster
3. The hardware configuration of each node
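For example (the numbers are hypothetical): to store 100 TB of data with a
replication factor of 3 on nodes that each provide about 20 TB of usable disk, you
need roughly (100 * 3) / 20 = 15 data nodes, plus headroom for intermediate data
and future growth.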


On Mon, Jul 21, 2014 at 11:50 PM, Adaryl "Bob" Wakefield, MBA <
adaryl.wakefi...@hotmail.com> wrote:

>   What is the rule for determining how many nodes should be in your
> initial cluster?
> B.
>



-- 


Thanks
Devaraj K


Re: yarn container memory setting

2014-07-22 Thread Devaraj K
I assume you meant mapreduce.map.java.opts = -Xmx(0.8 *
mapreduce.map.memory.mb)M.

Here we are trying to allocate that much of the container's memory to the JVM heap,
and we need to leave the remaining memory for the rest of the child container Java
process: native memory, JVM overhead, etc.
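As an illustration (the values below are hypothetical, not a recommendation), the
two settings could be related like this when building the job configuration with
org.apache.hadoop.conf.Configuration:

// Container size requested for each map task: 2048 MB.
conf.setInt("mapreduce.map.memory.mb", 2048);
// JVM heap limited to roughly 80% of the container: 0.8 * 2048 = 1638 MB.
conf.set("mapreduce.map.java.opts", "-Xmx1638m");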


On Tue, Jul 22, 2014 at 2:14 AM, Chen Song  wrote:

> I read a bit of the documentation on YARN memory tuning and found that
>
> it is suggested to set mapreduce.map.java.opts = 0.8 *
> mapreduce.map.memory.mb.
>
> I am wondering why it is 0.8, and not 0.9 or higher?
>
> --
> Chen Song
>
>


-- 


Thanks
Devaraj K


Re: XML parsing in Hadoop

2013-11-29 Thread Devaraj K
>
> {
>   Text colvalue = new Text("");
>   Text nodename = new Text("");
>
>   nodename = new Text(nodes.item(n).getNodeName());
>
>   try {
>     colvalue = new Text(nodes.item(n).getFirstChild().getNodeValue());
>   } catch (Exception e) {}
>
>   if (colvalue.toString().equalsIgnoreCase(null)) { colvalue = new Text(""); }
>
>   context.write(nodename, colvalue);
> }
>
> } catch (ParserConfigurationException e) {
>   // TODO Auto-generated catch block
>   e.printStackTrace();
> } catch (SAXException e) {
>   // TODO Auto-generated catch block
>   e.printStackTrace();
> } catch (XPathExpressionException e) {
>   // TODO Auto-generated catch block
>   e.printStackTrace();
> }
>
> }
>
> }
>
>
> public static void main(String[] args) throws Exception
> {
>   Configuration conf = new Configuration();
>
>   Job job = new Job(conf, "XmlParsing");
>   job.setJarByClass(ReadXmlMR.class);
>   job.setOutputKeyClass(Text.class);
>   job.setOutputValueClass(Text.class);
>
>   job.setMapperClass(Map.class);
>
>   job.setInputFormatClass(TextInputFormat.class);
>   job.setOutputFormatClass(TextOutputFormat.class);
>
>   FileInputFormat.addInputPath(job, new Path(args[0]));
>   FileOutputFormat.setOutputPath(job, new Path(args[1]));
>
>   job.submit();
>
>   job.waitForCompletion(true);
> }
>
> }
>
>
>
>
>
>
>
> Regards,
>
> Chhaya Vishwakarma
>
>
>



-- 


Thanks
Devaraj K


Re: default capacity scheduler only one job in running status

2013-11-26 Thread Devaraj K
Could you check the below configuration in capacity-scheduler.xml and see whether
it is causing only one AM to run.

<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.1</value>
  <description>
    Maximum percent of resources in the cluster which can be used to run
    application masters i.e. controls number of concurrent running
    applications.
  </description>
</property>
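As a rough illustration (the numbers are made up): if the cluster has 30 GB of
schedulable memory, then with maximum-am-resource-percent = 0.1 only
0.1 * 30 GB = 3 GB can be held by Application Masters at any time; if each AM
container needs 1.5 GB, at most 2 applications can be in RUNNING state and the rest
stay in ACCEPTED until an AM finishes. Raising the value (for example to 0.5) allows
more applications to run concurrently.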

On Tue, Nov 26, 2013 at 2:50 PM, ch huang  wrote:

> hi, maillist:
> I set the following option in yarn-site.xml to make the YARN
> framework use the capacity scheduler, but when I submit three jobs, only one job
> is in running status and the other two stay in accepted status. The default queue
> has only 50% of its capacity used, so I do not know why.
>
> <property>
>   <name>yarn.resourcemanager.scheduler.class</name>
>   <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
> </property>
>



-- 


Thanks
Devaraj K


RE: why i can not track the job which i submitted in yarn?

2013-09-11 Thread Devaraj k
Your Job is running in local mode; that's why you don't see it in the RM UI/Job
History.

If you change the 'mapreduce.framework.name' configuration value to 'yarn', it will
show up in the RM UI.
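A minimal sketch of doing that from the submitting code (normally this property is
set cluster-wide in mapred-site.xml instead):

Configuration conf = new Configuration();
// Submit to YARN instead of running in the LocalJobRunner.
conf.set("mapreduce.framework.name", "yarn");
Job job = Job.getInstance(conf, "my-job");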

Thanks
Devaraj k

From: ch huang [mailto:justlo...@gmail.com]
Sent: 11 September 2013 15:08
To: user@hadoop.apache.org
Subject: why i can not track the job which i submitted in yarn?

hi,all:
 I do not know why I cannot track my job which I submitted to YARN.

# hadoop jar 
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.3.0.jar pi 20 10
Number of Maps  = 20
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Wrote input for Map #10
Wrote input for Map #11
Wrote input for Map #12
Wrote input for Map #13
Wrote input for Map #14
Wrote input for Map #15
Wrote input for Map #16
Wrote input for Map #17
Wrote input for Map #18
Wrote input for Map #19
Starting Job
13/09/11 17:32:02 WARN conf.Configuration: session.id is
deprecated. Instead, use dfs.metrics.session-id
13/09/11 17:32:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with 
processName=JobTracker, sessionId=
13/09/11 17:32:02 WARN conf.Configuration:
slave.host.name is deprecated. Instead, use
dfs.datanode.hostname
13/09/11 17:32:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing 
the arguments. Applications should implement Tool for the same.
13/09/11 17:32:02 INFO mapred.FileInputFormat: Total input paths to process : 20
13/09/11 17:32:03 INFO mapred.LocalJobRunner: OutputCommitter set in config null
13/09/11 17:32:03 INFO mapred.JobClient: Running job: job_local854997782_0001
13/09/11 17:32:03 INFO mapred.LocalJobRunner: OutputCommitter is 
org.apache.hadoop.mapred.FileOutputCommitter
13/09/11 17:32:03 INFO mapred.LocalJobRunner: Waiting for map tasks
13/09/11 17:32:03 INFO mapred.LocalJobRunner: Starting task: 
attempt_local854997782_0001_m_00_0
13/09/11 17:32:03 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
13/09/11 17:32:03 INFO util.ProcessTree: setsid exited with exit code 0
13/09/11 17:32:03 INFO mapred.Task:  Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7f342545
13/09/11 17:32:03 INFO mapred.MapTask: Processing split: 
hdfs://CH22:9000/user/root/PiEstimator_TMP_3_141592654/in/part0:0+118
13/09/11 17:32:03 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is 
deprecated. Use FileInputFormatCounters as group name and  BYTES_READ as 
counter name instead
13/09/11 17:32:03 INFO mapred.MapTask: numReduceTasks: 1
13/09/11 17:32:03 INFO mapred.MapTask: Map output collector class = 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer
13/09/11 17:32:03 INFO mapred.MapTask: io.sort.mb = 100
13/09/11 17:32:03 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/11 17:32:03 INFO mapred.MapTask: record buffer = 262144/327680
13/09/11 17:32:03 INFO mapred.MapTask: Starting flush of map output
13/09/11 17:32:03 INFO mapred.MapTask: Finished spill 0
13/09/11 17:32:03 INFO mapred.Task: Task:attempt_local854997782_0001_m_00_0 
is done. And is in the process of commiting
13/09/11 17:32:03 INFO mapred.LocalJobRunner: 
hdfs://CH22:9000/user/root/PiEstimator_TMP_3_141592654/in/part0:0+118
13/09/11 17:32:03 INFO mapred.Task: Task 
'attempt_local854997782_0001_m_00_0' done.
13/09/11 17:32:03 INFO mapred.LocalJobRunner: Finishing task: 
attempt_local854997782_0001_m_00_0
13/09/11 17:32:03 INFO mapred.LocalJobRunner: Starting task: 
attempt_local854997782_0001_m_01_0
13/09/11 17:32:03 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
13/09/11 17:32:03 INFO mapred.Task:  Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@18f63055
13/09/11 17:32:03 INFO mapred.MapTask: Processing split: 
hdfs://CH22:9000/user/root/PiEstimator_TMP_3_141592654/in/part1:0+118
13/09/11 17:32:04 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is 
deprecated. Use FileInputFormatCounters as group name and  BYTES_READ as 
counter name instead
13/09/11 17:32:04 INFO mapred.MapTask: numReduceTasks: 1
13/09/11 17:32:04 INFO mapred.MapTask: Map output collector class = 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer
13/09/11 17:32:04 INFO mapred.MapTask: io.sort.mb = 100
13/09/11 17:32:04 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/11 17:32:04 INFO mapred.MapTask: record buffer = 262144/327680
13/09/11 17:32:04 INFO mapred.MapTask: St

RE: help!!!,what is happened with my project?

2013-09-11 Thread Devaraj k
It seems the filesystem is closed in the Task. Do you see any error in the
Datanode/Namenode (HDFS) log files?

Thanks
Devaraj k

From: heyamin [mailto:heya...@jiandan100.cn]
Sent: 11 September 2013 14:28
To: user@hadoop.apache.org
Cc: user-unsubscr...@hadoop.apache.org
Subject: help!!!,what is happened with my project?


Hi:

Today when I run a task, I get some warnings. What is happening?



2013-09-11 16:45:17,486 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the
native-hadoop library

2013-09-11 16:45:18,680 INFO org.apache.hadoop.util.ProcessTree: setsid exited 
with exit code 0

2013-09-11 16:45:18,708 INFO org.apache.hadoop.mapred.Task:  Using
ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2ddc7394

2013-09-11 16:45:18,998 INFO org.apache.hadoop.mapred.MapTask: Processing 
split: hdfs://192.168.1.240:9000/user/hadoop/input/20130815-log.log:0+118673837

2013-09-11 16:45:19,027 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 40

2013-09-11 16:45:19,113 INFO org.apache.hadoop.mapred.MapTask: data buffer = 
31876710/39845888

2013-09-11 16:45:19,113 INFO org.apache.hadoop.mapred.MapTask: record buffer = 
104857/131072

2013-09-11 16:45:19,136 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: 
Snappy native library not loaded

2013-09-11 16:45:23,374 INFO org.apache.hadoop.mapred.MapTask: Spilling map 
output: record full = true

2013-09-11 16:45:23,374 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; 
bufend = 3082708; bufvoid = 39845888

2013-09-11 16:45:23,374 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; 
kvend = 104857; length = 131072

2013-09-11 16:45:23,661 INFO org.apache.hadoop.mapred.MapTask: Ignoring
exception during close for
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader@36781b93

java.io.IOException: Filesystem closed

  at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:323)

  at org.apache.hadoop.hdfs.DFSClient.access$1200(DFSClient.java:78)

  at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.close(DFSClient.java:2326)

  at java.io.FilterInputStream.close(FilterInputStream.java:181)

  at org.apache.hadoop.util.LineReader.close(LineReader.java:145)

  at 
org.apache.hadoop.mapreduce.lib.input.LineRecordReader.close(LineRecordReader.java:187)

  at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:496)

  at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1776)

  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:778)

  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)

  at org.apache.hadoop.mapred.Child$4.run(Child.java:255)

  at java.security.AccessController.doPrivileged(Native Method)

  at javax.security.auth.Subject.doAs(Subject.java:415)

  at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)

  at org.apache.hadoop.mapred.Child.main(Child.java:249)

2013-09-11 16:45:23,662 INFO org.apache.hadoop.mapred.MapTask: Starting flush 
of map output


RE: which web ui can track yarn job status?

2013-09-11 Thread Devaraj k
You can find the Application Master/Job History link for each application in the RM
web UI. The default port for the RM web UI is 8088. From the Application Master/Job
History UI, you can find the tasks' status/progress.
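For example (the host name is a placeholder), http://<resourcemanager-host>:8088/cluster
lists all applications with their state and progress, and each application row links
to its Application Master or Job History page.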

Thanks
Devaraj k

From: ch huang [mailto:justlo...@gmail.com]
Sent: 11 September 2013 13:04
To: user@hadoop.apache.org
Subject: which web ui can track yarn job status?

hi,all:
  I use YARN, so port 50030 is not available for job status. Thanks.


RE: Question related to resource allocation in Yarn!

2013-09-05 Thread Devaraj k
Hi Rahul,

Could you tell me, what is the version you are using?

- If you want a container, you need to issue 3 resource requests
(1 node-local, 1 rack-local and 1 Any (*)). If you are using 2.1.0-beta
or later versions, you can set the Relax Locality flag to false to get containers
only on the specified host.

Can you also share the code showing how you are requesting containers, so that we
can help you better.
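A minimal sketch of such a request with the AMRMClient API in 2.1.0-beta (the
values, host name and variable names are illustrative only):

// Assumes an already started AMRMClient (or AMRMClientAsync) named amRmClient.
Resource capability = Resource.newInstance(1024, 1);   // 1 GB, 1 vcore
Priority priority = Priority.newInstance(0);
String[] nodes = new String[] { "host1.example.com" };
String[] racks = null;

// relaxLocality = false: allocate only on the listed node(s).
AMRMClient.ContainerRequest request =
    new AMRMClient.ContainerRequest(capability, nodes, racks, priority, false);
amRmClient.addContainerRequest(request);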

Thanks
Devaraj k

From: Rahul Bhattacharjee [mailto:rahul.rec@gmail.com]
Sent: 06 September 2013 09:43
To: user@hadoop.apache.org
Subject: Re: Question related to resource allocation in Yarn!

I could progress a bit on this.

I was not setting responseId while asking for containers.
Still I have one question as why I am only been allocated two containers 
whereas node manager can run more containers.

Response while registering the application master -
AM registration response minimumCapability {, memory: 1024, virtual_cores: 1, 
}, maximumCapability {, memory: 8192, virtual_cores: 32, },
Thanks,
Rahul

On Thu, Sep 5, 2013 at 8:33 PM, Rahul Bhattacharjee 
mailto:rahul.rec@gmail.com>> wrote:
Hi,
I am trying to make a small poc on top of yarn.
Within the launched application master , I am trying to request for 50 
containers and launch  a same task on those allocated containers.
My config : AM registration response minimumCapability {, memory: 1024, 
virtual_cores: 1, }, maximumCapability {, memory: 8192, virtual_cores: 32, },
1) I am asking for 1G mem and 1 core container to the RM. Ideally the RM should 
return me 6 - 7 containers , but the response always returns with only 2 
containers.
Why is that ?
2) So, when in the first ask 2 containers are returned, I then again ask the
RM for 50 - 2 = 48 containers. I keep getting 0 containers, even if the
previously started containers have finished.
Why is that ?
Any link explaining the allocate request of RM would be very helpful.

Thanks,
Rahul



RE: unsubscribe

2013-09-03 Thread Devaraj k
Please send a mail to user-unsubscr...@hadoop.apache.org to unsubscribe.

Thanks
Devaraj k

From: berty...@gmail.com [mailto:berty...@gmail.com] On Behalf Of Bert Yuan
Sent: 04 September 2013 07:43
To: user@hadoop.apache.org
Subject: unsubscribe



On Wed, Sep 4, 2013 at 12:31 AM, Mounir E. Bsaibes 
mailto:m...@linux.vnet.ibm.com>> wrote:




RE: Hadoop Yarn - samples

2013-08-29 Thread Devaraj k
Perhaps you can try writing the same yarn application using these steps.

http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html

Thanks
Devaraj k

From: Punnoose, Roshan [mailto:rashan.punnr...@merck.com]
Sent: 29 August 2013 19:43
To: user@hadoop.apache.org
Subject: Re: Hadoop Yarn - samples

Is there an example of running a sample yarn application that will only allow 
one container to start per host?

Punnoose, Roshan
rashan.punnr...@merck.com<mailto:rashan.punnr...@merck.com>



On Aug 29, 2013, at 10:08 AM, Arun C Murthy 
mailto:a...@hortonworks.com>> wrote:


Take a look at the dist-shell example in 
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/

I recently wrote up another simplified version of it for illustration purposes 
here: https://github.com/hortonworks/simple-yarn-app

Arun

On Aug 28, 2013, at 4:47 AM, Manickam P 
mailto:manicka...@outlook.com>> wrote:


Hi,

I have just installed the Hadoop 2.0.5 alpha version.
I want to analyse how the YARN resource manager and node managers work.
I executed the MapReduce examples, but I want to execute samples written directly
for YARN. I searched for them but was unable to find any. Please help me.



Thanks,
Manickam P

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/






RE: HBase client with security

2013-08-29 Thread Devaraj k
Please ask this question on u...@hbase.apache.org; you would get a better response
there.

Thanks
Devaraj k


-Original Message-
From: Lanati, Matteo [mailto:matteo.lan...@lrz.de] 
Sent: 29 August 2013 14:03
To: 
Subject: HBase client with security

Hi all,

I set up Hadoop (1.2.0), Zookeeper (3.4.5) and HBase (0.94.8-security) with 
security.
HBase works if I launch the shell from the node running the master, but I'd 
like to use it from an external machine.
I prepared one, copying the Hadoop and HBase installation folders and adapting 
the path (indeed I can use the same client to run MR jobs and interact with 
HDFS).
Regarding HBase client configuration:

- hbase-site.xml specifies

  
<property>
  <name>hbase.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hbase.rpc.engine</name>
  <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>master.hadoop.local,host49.hadoop.local</value>
</property>

where the zookeeper hosts are reachable and can be solved via DNS. I had to 
specify them otherwise the shell complains about 
"org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
= ConnectionLoss for /hbase/hbaseid"

- I have a keytab for the principal I want to use (), correctly addressed by the file 
hbase/conf/zk-jaas.conf. In hbase-env.sh, the variable HBASE_OPTS points to 
zk-jaas.conf.

Nonetheless, when I issue a command from a HBase shell on the client machine, I 
got an error in the HBase master log

2013-08-29 10:11:30,890 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
listener on 6: readAndProcess threw exception 
org.apache.hadoop.security.AccessControlException: Authentication is required. 
Count of bytes read: 0
org.apache.hadoop.security.AccessControlException: Authentication is required
at 
org.apache.hadoop.hbase.ipc.SecureServer$SecureConnection.readAndProcess(SecureServer.java:435)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

It looks like there's a mismatch between the client and the master regarding 
the authentication mechanism. Note that from the same client machine I can 
launch and use a Zookeeper shell.
What am I missing in the client configuration? Does /etc/krb5.conf play any 
role into this?
Thanks,

Matteo


Matteo Lanati
Distributed Resources Group
Leibniz-Rechenzentrum (LRZ)
Boltzmannstrasse 1
85748   Garching b. München (Germany)
Phone: +49 89 35831 8724



RE: why mapred job can not catch the current running job?

2013-08-19 Thread Devaraj k
Could you check whether the Job is getting submitted to the JobTracker or is
running in local mode. You can verify this by looking at the Job Id.

Thanks
Devaraj k

From: ch huang [mailto:justlo...@gmail.com]
Sent: 19 August 2013 14:00
To: user@hadoop.apache.org
Subject: why mapred job can not catch the current running job?

hi,all
I used TestDFSIO to launch a job and used 'mapred job' to observe the running job
status, but it seems no job was caught. Why?


# mapred job -list
13/08/19 16:26:05 WARN conf.Configuration: session.id is
deprecated. Instead, use dfs.metrics.session-id
13/08/19 16:26:05 INFO jvm.JvmMetrics: Initializing JVM Metrics with 
processName=JobTracker, sessionId=
13/08/19 16:26:05 WARN conf.Configuration:
slave.host.name is deprecated. Instead, use
dfs.datanode.hostname
0 jobs currently running
JobId   State   StartTime   UserName   Priority   SchedulingInfo


RE: Things to keep in mind when writing to a db

2013-08-19 Thread Devaraj k
If you want to read/write through a MapReduce job, you can refer to
org.apache.hadoop.mapred.lib.db.DBInputFormat &
org.apache.hadoop.mapred.lib.db.DBOutputFormat for reading from and writing into a
database.

http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/db/DBInputFormat.html
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/db/DBOutputFormat.html
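A minimal sketch with the new-API classes (the driver class, JDBC URL, credentials,
table and column names are made up; the JDBC driver jar must be available to the
tasks, and the job's output key class has to implement DBWritable):

Configuration conf = new Configuration();
DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
    "jdbc:mysql://dbhost:3306/mydb", "user", "password");

Job job = Job.getInstance(conf, "write-to-db");
job.setOutputFormatClass(DBOutputFormat.class);
// Rows are written to table "word_counts" with columns (word, count).
DBOutputFormat.setOutput(job, "word_counts", "word", "count");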

Thanks
Devaraj k

From: jamal sasha [mailto:jamalsha...@gmail.com]
Sent: 17 August 2013 00:47
To: user@hadoop.apache.org
Subject: Things to keep in mind when writing to a db

Hi,
  I am wondering if there is any tutorial to look at.
What are the challenges of reading and/or writing to/from a database?

Is there a common flavor across all the databases?
For example, the DBs start a server on some host:port,
you establish a connection to that host:port,
and it can be across a proxy?

Which is to say, if I don't want to use Sqoop and want to learn a very basic,
fragile way to write (and read, but mostly write) to a SQL table, what are the
steps? Or how should I approach this?
Thanks


RE: specify Mapred tasks and slots

2013-08-07 Thread Devaraj k
One task can use only one slot; it cannot use more than one slot. If the task is a
map task then it will use one map slot, and if the task is a reduce task then it
will use one reduce slot from the configured ones.

Thanks
Devaraj k

From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: 08 August 2013 08:27
To: user@hadoop.apache.org
Subject: Re: specify Mapred tasks and slots

My question is can I specify how many slots to be used for each M/R task?

On Thu, Aug 8, 2013 at 10:29 AM, Shekhar Sharma 
mailto:shekhar2...@gmail.com>> wrote:
Slots are decided upon the configuration of machines, RAM etc...

Regards,
Som Shekhar Sharma
+91-8197243810

On Thu, Aug 8, 2013 at 7:19 AM, Azuryy Yu 
mailto:azury...@gmail.com>> wrote:
Hi Dears,

Can I specify how many slots to use for reduce?

I know we can specify the number of reduce tasks, but does one task occupy one slot?

Is it possible that one task occupies more than one slot in Hadoop-1.1.2?

Thanks.




RE: Hadoop datanode becomes down

2013-08-06 Thread Devaraj k
Can you find out from the DataNode log the reason for the DataNode going down? Do
you get any exception in the client when you try to put the file into HDFS?

Thanks
Devaraj k

From: Manickam P [mailto:manicka...@outlook.com]
Sent: 06 August 2013 15:07
To: user@hadoop.apache.org
Subject: Hadoop datanode becomes down

Hi,

I have n number of files, each around 25 GB. I have a cluster set up with 6
machines as data nodes and one master node. When I move such a file from my local
filesystem to an HDFS location, sometimes a data node goes down. Is there any
specific reason for this behavior, or do I need to follow some other way to move
these files from local to HDFS?

Please help me to resolve this.



Thanks,
Manickam P


RE: incorrect staging area path in 2.0.5

2013-08-01 Thread Devaraj k
Hi Pierre,

As per the below information, we see the Job is running in local mode and trying to
use the local file system for the staging dir. Could you please configure
'mapreduce.framework.name' & 'fs.defaultFS' and check.

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html

Thanks
Devaraj k

From: Pierre-Francois Laquerre [mailto:pierre.franc...@nec-labs.com]
Sent: 01 August 2013 22:57
To: user@hadoop.apache.org
Subject: incorrect staging area path in 2.0.5

I recently updated from 1.0.4 to 2.0.5. Since then, streaming jobs have been 
failing to launch due to what seems like an incorrect staging path:

# /opt/hadoop2/bin/hadoop jar 
/opt/hadoop2/share/hadoop/tools/lib/hadoop-streaming-2.0.5-alpha.jar -input foo 
-output bar -mapper baz -reducer foobar
13/08/01 10:43:50 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
13/08/01 10:43:50 WARN conf.Configuration: session.id is deprecated. Instead, 
use dfs.metrics.session-id
13/08/01 10:43:50 INFO jvm.JvmMetrics: Initializing JVM Metrics with 
processName=JobTracker, sessionId=
13/08/01 10:43:50 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with 
processName=JobTracker, sessionId= - already initialized
13/08/01 10:43:50 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
file:/user/myuser1544460269/.staging/job_local1544460269_0001
13/08/01 10:43:50 ERROR security.UserGroupInformation: 
PriviledgedActionException as:myuser (auth:SIMPLE) 
cause:org.apache.hadoop.util.Shell$ExitCodeException: chmod: cannot access 
`/user/myuser1544460269/.staging/job_local1544460269_0001': No such file or 
directory
13/08/01 10:43:50 ERROR streaming.StreamJob: Error Launching job : chmod: 
cannot access `/user/myuser1544460269/.staging/job_local1544460269_0001': No 
such file or directory
Streaming Command Failed!

This is for a job launched as "myuser". Given that 
mapreduce.jobtracker.staging.root.dir is set to /user, I would expect the 
staging area to be in /user/myuser/.staging/job_local$jobid. Instead, it is in 
/user/myuser$jobid/.staging/job_local$jobid, which fails since 
/user/myuser$jobid/ doesn't exist. Has the way staging.root.dir is used changed 
in 2.x?

Thank you,

Pierre


RE: java.util.NoSuchElementException

2013-07-31 Thread Devaraj k
If you want to write a mapreduce Job, you need to have basic knowledge of core
Java. You can find many resources on the internet for that.

If you face any problems related to Hadoop, you can ask here for help.

Thanks
Devaraj k

From: jamal sasha [mailto:jamalsha...@gmail.com]
Sent: 31 July 2013 23:52
To: user@hadoop.apache.org
Subject: Re: java.util.NoSuchElementException

Hi,
  Thanks for responding.
How do I do that? (I am very new to Java.)
There are just two words per line:
one is a word, the second is an integer.
Thanks

On Wed, Jul 31, 2013 at 11:20 AM, Devaraj k 
mailto:devara...@huawei.com>> wrote:
There seems to be some problem in the mapper logic. You need to have the input
match what your code expects, or update the code to handle cases like having an odd
number of words in a line.

Before getting the element a second time, you need to check whether the tokenizer
has more elements or not. If you have only two words in a line, you can modify the
code to get these directly instead of iterating multiple times.

Thanks
Devaraj k

From: jamal sasha [mailto:jamalsha...@gmail.com<mailto:jamalsha...@gmail.com>]
Sent: 31 July 2013 23:40
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: java.util.NoSuchElementException

Hi,
  I am getting this error:

13/07/31 09:29:41 INFO mapred.JobClient: Task Id : 
attempt_201307102216_0270_m_02_2, Status : FAILED
java.util.NoSuchElementException
at java.util.StringTokenizer.nextToken(StringTokenizer.java:332)
at java.util.StringTokenizer.nextElement(StringTokenizer.java:390)
at org.mean.Mean$MeanMapper.map(Mean.java:60)
at org.mean.Mean$MeanMapper.map(Mean.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)



public void map(LongWritable key, Text value, Context context)
    throws IOException, InterruptedException, NoSuchElementException {
  initialize(context);
  StringTokenizer tokenizer = new StringTokenizer(value.toString());
  while (tokenizer.hasMoreElements()) {
    String curWord = tokenizer.nextElement().toString();
    // The line which causes this error.
    Integer curValue = Integer.parseInt(tokenizer.nextElement().toString());

    Integer sum = summation.get(curWord);
    Integer count = counter.get(curWord);

    ..
    ...
  }

  close(context);
}


What am I doing wrong?

My data looks like:

//word count
foo 20
bar  21
and so on.
The code works fine if I strip out the Hadoop part and run it as plain Java.






RE: java.util.NoSuchElementException

2013-07-31 Thread Devaraj k
There seems to be some problem in the mapper logic. You need to have the input
match what your code expects, or update the code to handle cases like having an odd
number of words in a line.

Before getting the element a second time, you need to check whether the tokenizer
has more elements or not. If you have only two words in a line, you can modify the
code to get these directly instead of iterating multiple times.
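A minimal sketch of the guarded loop (this assumes every valid line is a word/count
pair; a trailing word with no count is simply skipped):

StringTokenizer tokenizer = new StringTokenizer(value.toString());
while (tokenizer.hasMoreTokens()) {
    String curWord = tokenizer.nextToken();
    if (!tokenizer.hasMoreTokens()) {
        // Odd number of tokens: no count for the last word, so stop here.
        break;
    }
    int curValue = Integer.parseInt(tokenizer.nextToken());
    // ... aggregate curWord / curValue as before ...
}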

Thanks
Devaraj k

From: jamal sasha [mailto:jamalsha...@gmail.com]
Sent: 31 July 2013 23:40
To: user@hadoop.apache.org
Subject: java.util.NoSuchElementException

Hi,
  I am getting this error:

13/07/31 09:29:41 INFO mapred.JobClient: Task Id : 
attempt_201307102216_0270_m_02_2, Status : FAILED
java.util.NoSuchElementException
at java.util.StringTokenizer.nextToken(StringTokenizer.java:332)
at java.util.StringTokenizer.nextElement(StringTokenizer.java:390)
at org.mean.Mean$MeanMapper.map(Mean.java:60)
at org.mean.Mean$MeanMapper.map(Mean.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)



public void map(LongWritable key, Text value, Context context)
    throws IOException, InterruptedException, NoSuchElementException {
  initialize(context);
  StringTokenizer tokenizer = new StringTokenizer(value.toString());
  while (tokenizer.hasMoreElements()) {
    String curWord = tokenizer.nextElement().toString();
    // The line which causes this error.
    Integer curValue = Integer.parseInt(tokenizer.nextElement().toString());

    Integer sum = summation.get(curWord);
    Integer count = counter.get(curWord);

    ..
    ...
  }

  close(context);
}


What am I doing wrong?

My data looks like:

//word count
foo 20
bar  21
and so on.
The code works fine if I strip out the Hadoop part and run it as plain Java.





RE: objects as key/values

2013-07-29 Thread Devaraj k
You can write custom key/value classes by implementing the
org.apache.hadoop.io.Writable interface for your Job.

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/Writable.html
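A minimal sketch of such a value class for the mean computation (the field names are
illustrative; a class used as a key would additionally need to implement
WritableComparable):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class Pair implements Writable {
    private long sum;
    private long count;

    public Pair() {}                 // no-arg constructor needed for deserialization

    public Pair(long sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(sum);
        out.writeLong(count);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        sum = in.readLong();
        count = in.readLong();
    }

    public long getSum() { return sum; }
    public long getCount() { return count; }
}

The job would then declare job.setMapOutputValueClass(Pair.class), and the reducer
can add up the partial sums and counts to emit the mean.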

Thanks
Devaraj k

From: jamal sasha [mailto:jamalsha...@gmail.com]
Sent: 30 July 2013 10:27
To: user@hadoop.apache.org
Subject: objects as key/values

Ok.
  A very basic (stupid) question.
I am trying to compute mean using hadoop.

So my implementation is like this:

public class Mean
 public static class Pair{
  //simple class to create object
}
 public class MeanMapper
   emit(text,pair) //where pair is (local sum, count)

 public class MeanReducer
emit (text, mean)

Unfortunately this approach of creating custom class types is not working,
since in the job I have to set the output types for the mapper/reducer...
How are custom key/value pairs implemented in Hadoop?




RE: Want to Sort the values in one line using map reduce

2013-07-26 Thread Devaraj k
You are almost done getting the desired output. You need to change the reduce
function a little, like this:

public static class ReduceClass extends MapReduceBase implements
    Reducer<Text, Text, Text, Text> {
  Text v = new Text();

  public void reduce(Text key, Iterator<Text> values,
      OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    StringBuffer value = new StringBuffer();
    while (values.hasNext()) {
      value.append(values.next().toString());
      value.append(",");
    }
    v.set(value.toString());
    output.collect(key, v);
  }
}
In the above reduce function you can add a simple check to avoid the extra ',' at
the end of each value line.
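For example, one way to do that (a sketch of the loop body only):

while (values.hasNext()) {
  if (value.length() > 0) {
    value.append(",");   // separator only between values, not after the last one
  }
  value.append(values.next().toString());
}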

Thanks
Devaraj k

From: manish dunani [mailto:manishd...@gmail.com]
Sent: 27 July 2013 10:02
To: user@hadoop.apache.org
Subject: Want to Sort the values in one line using map reduce

Hi,

I have an input file and my data looks like:

date      country  city       pagePath                    visits
20120301  India    Ahmedabad  /                           1
20120302  India    Ahmedabad  /gtuadmissionhelpline-team  1
20120302  India    Mumbai     /                           1
20120302  India    Mumbai     /merit-calculator           1





 I wrote the map and reduce application to convert it into page_url by city:




package data.ga;

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;


public class pharmecy
{
    public static class MapClass extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, Text>
    {
        Text k = new Text();
        Text v = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter) throws IOException
        {
            try
            {
                String[] line = value.toString().split(",", 5);

                String city = String.valueOf(line[2]);
                String url = String.valueOf(line[3]);

                k.set(city);
                v.set(url);

                output.collect(k, v);
            }
            catch (Exception e)
            {
                System.out.println(e);
            }
        }
    }

    public static class ReduceClass extends MapReduceBase implements
            Reducer<Text, Text, Text, Text>
    {
        Text v = new Text();

        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, Text> output, Reporter reporter) throws IOException
        {
            while (values.hasNext())
            {
                String val = values.next().toString();

                v.set(val);

                output.collect(key, v);
            }
        }
    }

    public static void main(String[] args) {
        JobClient client = new JobClient();
        JobConf conf = new JobConf(data.ga.pharmecy.class);

        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(Text.class);
        // TODO: specify output types
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path("hdfs://localhost:54310/user/manish/gadatainput/pharmecydata.txt"));
        FileOutputFormat.setOutputPath(conf, new Path("hdfs://localhost:54310/user/manish/gadataoutput11"));

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        conf.setMapperClass(MapClass.class);
        conf.setReducerClass(ReduceClass.class);

        client.setConf(conf);
        try {
            JobClient.runJob(conf);
        } catch (Exception e)

RE: Context Object in Map Reduce

2013-07-26 Thread Devaraj k
If you are using the new Mapper API (org.apache.hadoop.mapreduce.Mapper) for your
Job, then you can get the Configuration object from the Context with
"context.getConfiguration()". If you are using the old Mapper API
(org.apache.hadoop.mapred.Mapper), you get the JobConf object as an argument to the
configure() method which you implement in the Mapper, as you mentioned below.

Thanks
Devaraj k

From: Tanniru Govardhan [mailto:govardhan5...@gmail.com]
Sent: 26 July 2013 21:06
To: user@hadoop.apache.org; dunaniman...@gmail.com
Subject: Re: Context Object in Map Reduce

Thanks Manish. you are correct. I figured it out somehow.
This is what i had to change.

In the main class
Configuration config = new Configuration();
config.set("filename", "Syn-100n-8t-2l-2k.vars");
JobConf conf = new JobConf(config,Hadoop_GNTS.class);

In the Map class
public void configure(JobConf job)
  {
   myfilename = job.get("filename");
  }

Thanks


On Fri, Jul 26, 2013 at 10:56 AM, manish dunani 
mailto:manishd...@gmail.com>> wrote:
If you are using OutputCollector/Reporter then there is no need to use Context.
Context works the same as OutputCollector and Reporter.

It is used to collect the mapper's <key, value> pair.


On Fri, Jul 26, 2013 at 8:03 PM, manish dunani 
mailto:manishd...@gmail.com>> wrote:
Can you please elaborate on what the code is and what the error is?

Then it will be much easier to give an answer.

Regards Manish Dunani..

On Fri, Jul 26, 2013 at 7:07 PM, Tanniru Govardhan 
mailto:govardhan5...@gmail.com>> wrote:
Hi everyone,
I am trying to pass a string variable from the Driver class to the Mapper class.
I came to know that I need to use the context variable for this purpose.
But the Mapper method I have used has a signature without a context object.
It is not allowing me to change the method signature (it reports a syntax error).
Can anyone please suggest what I can do regarding this?
I am very new to Map Reduce programming.

Here is the code:

class Map extends MapReduceBase implements Mapper
{
   String myfilename;
   private Text word = new Text();
   private IntWritable var = new IntWritable();

   public void map(LongWritable key, Text value, Context context,
       OutputCollector output, Reporter reporter) // here is the error
       throws IOException
   {
      Configuration conf = context.getConfiguration();
      myfilename = conf.get("filename");
      -- - - -- - -- -
   }

   public void compute() throws Exception
   {
      Configuration config = new Configuration();
      config.set("filename", "Syn-100n-8t-2l-2k.vars");

      JobConf conf = new JobConf(Hadoop_GNTS.class);
      -- - - - -- - -
   }

Thanks




--
MANISH DUNANI
-THANX
+91 9426881954,+91 8460656443
manishd...@gmail.com<mailto:manishd...@gmail.com>



--
MANISH DUNANI
-THANX
+91 9426881954,+91 8460656443
manishd...@gmail.com<mailto:manishd...@gmail.com>



RE: Node manager crashing when running an app requiring 100 containers on hadoop-2.1.0-beta RC0

2013-07-25 Thread Devaraj k
Hi Kishore,

It seems that the system doesn't have enough resources to launch a new thread.

Could you check whether the system can actually afford the configured number of
containers, and try increasing the native memory available on the system by reducing
the number of other processes running on it.

Thanks
Devaraj k

From: Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com]
Sent: 25 July 2013 16:09
To: user@hadoop.apache.org
Subject: Node manager crashing when running an app requiring 100 containers on 
hadoop-2.1.0-beta RC0

Hi,

  I am running an application against the hadoop-2.1.0-beta RC, and my app requires
117 containers. I have got all the containers allocated, but while starting those
containers, at around the 99th container the node manager has gone down with the
following kind of error in its log. I could also reproduce this error by running a
"sleep 200; date" command using the Distributed Shell example, in which case I got
this error at around the 66th container.


2013-07-25 06:07:17,743 FATAL 
org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process 
reaper,5,main] threw an Error.  Shutting down now...
java.lang.OutOfMemoryError: Failed to create a thread: retVal -1073741830, 
errno 11
at java.lang.Thread.startImpl(Native Method)
at java.lang.Thread.start(Thread.java:887)
        at java.lang.ProcessInputStream.<init>(UNIXProcess.java:472)
at java.lang.UNIXProcess$1$1$1.run(UNIXProcess.java:157)
at 
java.security.AccessController.doPrivileged(AccessController.java:202)
at java.lang.UNIXProcess$1$1.run(UNIXProcess.java:137)
2013-07-25 06:07:17,745 INFO org.apache.hadoop.util.ExitUtil: Halt with status 
-1 Message: HaltException

Thanks,
Kishore


RE: Join Operation with Regular Expression

2013-07-23 Thread Devaraj k
You can try writing a MapReduce job for this. In the job, you can filter the records
in the Mapper based on the WHERE-condition regex and then perform the join in the
Reducer.

Please refer to the classes present in the hadoop-datajoin module to get an idea of
how to implement the join job.
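For example (a sketch of the idea, not tied to your exact schema): since the pattern
table here is small (about 1K lines), one common approach is to ship it to every
mapper (for instance via the DistributedCache), load the patterns once in the
mapper's setup/configure method, match every incoming text record against them (this
is the filtering step), and emit (pattern id, matching record); the reducer then only
has to group and output the matches per pattern id.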

Thanks
Devaraj k

From: enes yücer [mailto:enes...@gmail.com]
Sent: 23 July 2013 13:49
To: user@hadoop.apache.org
Subject: Join Operation with Regular Expression

Hi,
I have 2 data sets: one of them contains string text, and the other table contains a
string pattern (which is searched for in the text) and an id.
I have created a makeshift solution: create two external Hive tables and do a full
join of them, and after the full join I use a regex function in the WHERE clause. But
it takes too long, because Hive does not support regex-based join conditions.

How do I do this operation in Hadoop, or have you implemented an MR job (in Hive,
Pig, Java) like this? Any advice?

thanks.


RE: Copy data from Mainframe to HDFS

2013-07-23 Thread Devaraj k
Hi Balamurali,

As per my knowledge, there is nothing in Hadoop which does exactly what you require.

You can write MapReduce jobs according to your functionality, submit them
hourly/daily/weekly or monthly, and then aggregate the results.

If you want some help regarding Hive, you can ask the same on the Hive mailing list.



Thanks
Devaraj k

From: Balamurali [mailto:balamurali...@gmail.com]
Sent: 23 July 2013 12:42
To: user
Subject: Re: Copy data from Mainframe to HDFS

Hi,

I configured hadoop-1.0.3, hbase-0.92.1 and hive-0.10.0.
I created a table in HBase, inserted records, and am processing the data using Hive.
I have to show a graph with some points (7 for 7 days, or 12 for one year). One day's
records may number anywhere from 1000 up to lakhs, and I need to show the average of
those records. Is there any built-in Hadoop mechanism to process these records fast?
Also I need to run a Hive query or job (when we run a Hive query, a job is actually
submitted) every 1 hour. Is there a scheduling mechanism in Hadoop to handle these?

Please reply.
Balamurali

On Tue, Jul 23, 2013 at 12:24 PM, Mohammad Tariq 
mailto:donta...@gmail.com>> wrote:
Hello Sandeep,

You don't have to convert the data in order to copy it into the HDFS. But you 
might have to think about the MR processing of these files because of the 
format of these files.

You could probably make use of Sqoop (http://sqoop.apache.org/).

I also came across DMX-H a few days ago while browsing. I don't know anything
about the licensing and how good it is. Just thought of sharing it with you.
You can visit their page (http://www.syncsort.com/en/Data-Integration/Home) to
see more. They also provide a VM (includes CDH) to get started quickly.

Warm Regards,
Tariq
cloudfront.blogspot.com

On Tue, Jul 23, 2013 at 11:54 AM, Sandeep Nemuri 
mailto:nhsande...@gmail.com>> wrote:
Hi ,

"How to copy datasets from Mainframe to HDFS directly?  I know that we can NDM 
files to Linux box and then we can use simple put command to copy data to HDFS. 
 But, how to copy data directly from mainframe to HDFS?  I have PS, PDS and 
VSAM datasets to copy to HDFS for analysis using MapReduce.

Also, Do we need to convert data from EBCDIC to ASCII before copy? "

--
--Regards
  Sandeep Nemuri




RE: setting mapred.task.timeout programmatically from client

2013-07-22 Thread Devaraj k
'mapred.task.timeout' is a deprecated configuration. You can use the
'mapreduce.task.timeout' property to do the same.

You can set this configuration while submitting the Job using the
org.apache.hadoop.conf.Configuration.setLong(String name, long value) API on the
conf or JobConf.
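For example (the timeout value below is arbitrary):

Configuration conf = new Configuration();
// Fail a task attempt that neither reads input, writes output, nor reports
// status for 10 minutes (600000 ms). A value of 0 disables the timeout.
conf.setLong("mapreduce.task.timeout", 600000L);
Job job = Job.getInstance(conf, "my-job");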

Thanks
Devaraj k

-Original Message-
From: Eugene Koifman [mailto:ekoif...@hortonworks.com] 
Sent: 23 July 2013 04:24
To: user@hadoop.apache.org
Subject: setting mapred.task.timeout programmatically from client

Hello,
is there a way to set mapred.task.timeout programmatically from client?

Thank you


RE: subsicrbe

2013-07-19 Thread Devaraj k
Hi Pradeep,

Please send a mail to the subscribe mail IDs; after subscription, if you have any
queries, you can reach out to the corresponding lists. You can find the subscribe
mail IDs on this page:

   http://hadoop.apache.org/mailing_lists.html


Thanks
Devaraj k

From: Pradeep Singh [mailto:hadoop.guy0...@gmail.com]
Sent: 20 July 2013 09:38
To: Hadoop Common commits mailing list; Hadoop Common issue tracking system; 
Hadoop HDFS issues mailing list; Hadoop MapReduce commits mailing; Hadoop user 
mailing list; general mailing list is for announcements and project management
Subject: subsicrbe

Regards
Pradeep Singh


RE: Incrementally adding to existing output directory

2013-07-17 Thread Devaraj k
It seems it is not taking the CustomOutputFormat for the Job. You need to set the
custom output format class using the
org.apache.hadoop.mapred.JobConf.setOutputFormat(Class<? extends OutputFormat>
theClass) API for your Job.

If we don't set an OutputFormat for the Job, it takes the default TextOutputFormat,
which internally extends FileOutputFormat; that's why you see in the below exception
that it is still using the FileOutputFormat.
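Since the stack trace shows the new API (org.apache.hadoop.mapreduce.Job), the
equivalent there is job.setOutputFormatClass(...). A minimal sketch of such a custom
format (the class name is made up; note that skipping the parent check also skips any
other validation done in FileOutputFormat.checkOutputSpecs):

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.InvalidJobConfException;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ExistingDirTextOutputFormat<K, V> extends TextOutputFormat<K, V> {
    @Override
    public void checkOutputSpecs(JobContext job) throws IOException {
        Path outDir = getOutputPath(job);
        if (outDir == null) {
            throw new InvalidJobConfException("Output directory not set.");
        }
        // Deliberately no "output directory already exists" check here.
    }
}

and in the driver:

job.setOutputFormatClass(ExistingDirTextOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path(outDir));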


Thanks
Devaraj k

From: Max Lebedev [mailto:ma...@actionx.com]
Sent: 18 July 2013 01:03
To: user@hadoop.apache.org
Subject: Re: Incrementally adding to existing output directory

Hi Devaraj,

Thank you very much for your help. I've created a CustomOutputFormat which is
almost identical to FileOutputFormat, as seen here:
http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-320/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.java
except that I've removed line 125, which throws the FileAlreadyExistsException.
However, when I try to run my code, I get this error:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: 
Output directory outDir already exists
   at 
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:887)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
...
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

In my source code, I've changed "FileOutputFormat.setOutputPath" to 
"CustomOutputFormat.setOutputPath"

Is it the case that FileOutputFormat.checkOutputSpecs is happening somewhere 
else, or have I done something wrong?
I also don't quite understand your suggestion about MultipleOutputs. Would you 
mind elaborating?

Thanks,
Max Lebedev

On Tue, Jul 16, 2013 at 9:42 PM, Devaraj k 
mailto:devara...@huawei.com>> wrote:
Hi Max,

  It can be done by customizing the output format class for your Job according to
your expectations. You could refer to the OutputFormat.checkOutputSpecs(JobContext
context) method, which checks the output specification. We can override this in your
custom OutputFormat. You can also see the MultipleOutputs class for implementation
details of how it could be done.

Thanks
Devaraj k

From: Max Lebedev [mailto:ma...@actionx.com<mailto:ma...@actionx.com>]
Sent: 16 July 2013 23:33
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: Incrementally adding to existing output directory

Hi
I'm trying to figure out how to incrementally add to an existing output 
directory using MapReduce.
I cannot specify the exact output path, as data in the input is sorted into 
categories and then written to different directories based in the contents. (in 
the examples below, token= or token=)
As an example:
When using MultipleOutput and provided that outDir does not exist yet, the 
following will work:
hadoop jar myMR.jar --input-path=inputDir/dt=2013-05-03/* --output-path=outDir
The result will be:
outDir/token=/dt=2013-05-03/
outDir/token=/dt=2013-05-03/
However, the following will fail because outDir already exists. Even though I 
am copying new inputs.
hadoop jar myMR.jar  --input-path=inputDir/dt=2013-05-04/* --output-path=outDir
will throw FileAlreadyExistsException
What I would expect is that it adds
outDir/token=/dt=2013-05-04/
outDir/token=/dt=2013-05-04/
Another possibility would be the following hack but it does not seem to be very 
elegant:
hadoop jar myMR.jar --input-path=inputDir/2013-05-04/* --output-path=tempOutDir
then copy from tempOutDir to outDir
Is there a better way to address incrementally adding to an existing hadoop 
output directory?



RE: spawn maps without any input data - hadoop streaming

2013-07-16 Thread Devaraj k
Hi Austin,

Here the number of maps for a Job depends on the splits returned by the
InputFormat.getSplits() API. We can have an input format which decides the number of
maps for a Job (by returning that many splits) according to the need.

If we use FileInputFormat, the number of splits depends on the input data for the
Job; that's why you see that the number of mappers is proportional to the Job input
size.

http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/InputFormat.html#getSplits(org.apache.hadoop.mapreduce.JobContext)
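For example (a common workaround with streaming, not the only option): create N
small dummy input files, one per desired map task, so that FileInputFormat produces
N splits and therefore N mappers; each mapper then ignores its input and just emits
the generated random data. Alternatively, a custom InputFormat whose getSplits()
returns N synthetic splits achieves the same without any input files.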

Thanks
Devaraj k

From: Austin Chungath [mailto:austi...@gmail.com]
Sent: 16 July 2013 14:40
To: user@hadoop.apache.org
Subject: spawn maps without any input data - hadoop streaming

Hi,

I am trying to generate random data using hadoop streaming & python. It's a map 
only job and I need to run a number of maps. There is no input to the map as 
it's just going to generate random data.

How do I specify the number of maps to run? ( I am confused here because, if I 
am not wrong, the number of maps spawned is related to the input data size )
Any ideas as to how this can be done?

Warm regards,
Austin


RE: Incrementally adding to existing output directory

2013-07-16 Thread Devaraj k
Hi Max,

  It can be done by customizing the output format class for your Job according to
your expectations. You could refer to the OutputFormat.checkOutputSpecs(JobContext
context) method, which checks the output specification. We can override this in your
custom OutputFormat. You can also see the MultipleOutputs class for implementation
details of how it could be done.

Thanks
Devaraj k

From: Max Lebedev [mailto:ma...@actionx.com]
Sent: 16 July 2013 23:33
To: user@hadoop.apache.org
Subject: Incrementally adding to existing output directory

Hi
I'm trying to figure out how to incrementally add to an existing output 
directory using MapReduce.
I cannot specify the exact output path, as data in the input is sorted into
categories and then written to different directories based on the contents. (In
the examples below, token= or token=)
As an example:
When using MultipleOutput and provided that outDir does not exist yet, the 
following will work:
hadoop jar myMR.jar --input-path=inputDir/dt=2013-05-03/* --output-path=outDir
The result will be:
outDir/token=/dt=2013-05-03/
outDir/token=/dt=2013-05-03/
However, the following will fail because outDir already exists. Even though I 
am copying new inputs.
hadoop jar myMR.jar  --input-path=inputDir/dt=2013-05-04/* --output-path=outDir
will throw FileAlreadyExistsException
What I would expect is that it adds
outDir/token=/dt=2013-05-04/
outDir/token=/dt=2013-05-04/
Another possibility would be the following hack but it does not seem to be very 
elegant:
hadoop jar myMR.jar --input-path=inputDir/2013-05-04/* --output-path=tempOutDir
then copy from tempOutDir to outDir
Is there a better way to address incrementally adding to an existing hadoop 
output directory?


RE: hive task fails when left semi join

2013-07-16 Thread Devaraj k
Hi,
   In the given image, I see there are some failed/killed map & reduce task
attempts. Could you check why these are failing; you can investigate further based
on the fail/kill reason.

Thanks
Devaraj k

From: kira.w...@xiaoi.com [mailto:kira.w...@xiaoi.com]
Sent: 16 July 2013 12:57
To: user@hadoop.apache.org
Subject: hive task fails when left semi join

Hello,

I am trying to filter out some records from a table in Hive.
The number of lines in this table is 4 billion+.
I do a left semi join between the above table and a small table with 1K lines.

However, after the job runs for 3 hours, it ends up in a failed status.

My questions are as follows:

1. How can I diagnose this problem and finally solve it?

2. Are there any other good methods to filter out records with the given
conditions?

The following picture is a snapshot of the failed job.
[inline image: snapshot of the failed job]

RE: Policies for placing a reducer

2013-07-16 Thread Devaraj k
Hi,

It doesn't consider where the maps ran when scheduling the reducers, because
reducers need to contact all the mappers for the map outputs. It schedules reducers
wherever slots are available.

Thanks
Devaraj k

From: Felix.徐 [mailto:ygnhz...@gmail.com]
Sent: 16 July 2013 09:25
To: user@hadoop.apache.org
Subject: Policies for placing a reducer

Hi all,

What is the policy for choosing a node for a reducer in MapReduce (Hadoop v1.2.0)?
For example,
if a cluster has 5 slaves and each slave can serve 2 maps and 2 reduces, and there is
a job that occupies 5 mappers and 3 reducers, how does the JobTracker assign reducers
to these nodes (choosing free slaves, or placing reducers close to mappers)?

Thanks.


RE: Map slots and Reduce slots

2013-07-14 Thread Devaraj k
These configurations cannot be changed dynamically. We need to configure these
values for the TaskTrackers before starting them, and they cannot be changed after
that; if we want to change them, the TTs need to be restarted. You can configure the
cluster based on the resources available and tune your Job configuration according
to your cluster configuration.

Thanks
Devaraj k

From: Shekhar Sharma [mailto:shekhar2...@gmail.com]
Sent: 15 July 2013 07:32
To: user@hadoop.apache.org
Subject: Re: Map slots and Reduce slots

Sorry for the wrong property names, I meant the same ones.
I understand the properties' functionality. Can I add slots at run time to a
particular task tracker depending on the load? As you suggested, we can determine the
slots depending on the load, and since the load can be dynamic, can I dynamically
allocate slots on a task tracker based on, let's say, the availability of resources
on the task tracker machine?

Regards,
Som Shekhar Sharma
+91-8197243810

On Mon, Jul 15, 2013 at 7:27 AM, Devaraj k 
mailto:devara...@huawei.com>> wrote:
Hi Shekar,

   I assume you are using Hadoop 1. There are no properties with the
names 'mapred.map.max.tasks' and 'mapred.reduce.max.tasks'.

We have these configurations to control the max number of map/reduce tasks run
simultaneously:
mapred.tasktracker.map.tasks.maximum - The maximum number of map tasks that
will be run simultaneously by a task tracker.
mapred.tasktracker.reduce.tasks.maximum - The maximum number of reduce tasks
that will be run simultaneously by a task tracker.

For example: if we declare mapred.tasktracker.map.tasks.maximum=3 and
mapred.tasktracker.reduce.tasks.maximum=4 for a task tracker, it means the TT has
3 map slots and 4 reduce slots.

> Let's say on a machine if i have 8GB RAM and dual core machine, how can i
> determine that what would be the optimal number of map and reducer slots for
> this machine
It purely depends on the type of tasks you are going to run and the load of the
tasks. Normally each task requires one core to execute, so the number of concurrent tasks
can be configured based on this. And the memory required for a task depends on
how much data it is going to process.
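
For illustration only, here is a mapred-site.xml sketch for such a TaskTracker; the slot counts below are assumptions for an 8GB, dual-core node running lightweight tasks, not general recommendations:

<!-- Hypothetical slot configuration for one TaskTracker (mapred-site.xml).
     2 map + 2 reduce slots roughly matches a dual-core node; adjust per workload. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>

The TaskTracker has to be restarted for new values to take effect.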


Thanks
Devaraj k

From: Shekhar Sharma 
[mailto:shekhar2...@gmail.com<mailto:shekhar2...@gmail.com>]
Sent: 14 July 2013 23:15
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: Map slots and Reduce slots

Do the properties mapred.map.max.tasks=3 and mapred.reduce.max.tasks=4 mean
that the machine has 3 map slots and 4 reduce slots?

Or is there any way I can determine the number of map slots and reduce slots
that I can allocate for a machine?

Let's say I have an 8GB RAM, dual-core machine; how can I
determine what would be the optimal number of map and reduce slots for
this machine?




Regards,
Som Shekhar Sharma
+91-8197243810



RE: Map slots and Reduce slots

2013-07-14 Thread Devaraj k
Hi Shekar,

   I assume you are using Hadoop 1. There are no properties with the
names 'mapred.map.max.tasks' and 'mapred.reduce.max.tasks'.

We have these configurations to control the max number of map/reduce tasks run
simultaneously:
mapred.tasktracker.map.tasks.maximum - The maximum number of map tasks that
will be run simultaneously by a task tracker.
mapred.tasktracker.reduce.tasks.maximum - The maximum number of reduce tasks
that will be run simultaneously by a task tracker.

For example: if we declare mapred.tasktracker.map.tasks.maximum=3 and
mapred.tasktracker.reduce.tasks.maximum=4 for a task tracker, it means the TT has
3 map slots and 4 reduce slots.

> Let's say on a machine if i have 8GB RAM and dual core machine, how can i
> determine that what would be the optimal number of map and reducer slots for
> this machine
It purely depends on the type of tasks you are going to run and the load of the
tasks. Normally each task requires one core to execute, so the number of concurrent tasks
can be configured based on this. And the memory required for a task depends on
how much data it is going to process.


Thanks
Devaraj k

From: Shekhar Sharma [mailto:shekhar2...@gmail.com]
Sent: 14 July 2013 23:15
To: user@hadoop.apache.org
Subject: Map slots and Reduce slots

Do the properties mapred.map.max.tasks=3 and mapred.reduce.max.tasks=4 mean
that the machine has 3 map slots and 4 reduce slots?

Or is there any way I can determine the number of map slots and reduce slots
that I can allocate for a machine?

Let's say I have an 8GB RAM, dual-core machine; how can I
determine what would be the optimal number of map and reduce slots for
this machine?




Regards,
Som Shekhar Sharma
+91-8197243810


RE: Taktracker in namenode failure

2013-07-12 Thread Devaraj k
I think there is a mismatch of jars coming into the classpath for the map tasks
when they run on different machines. You can find this out by giving some
unique names to your Mapper class and job submission class and then submitting the job again.

Thanks
Devaraj k

From: Ramya S [mailto:ram...@suntecgroup.com]
Sent: 12 July 2013 15:27
To: user@hadoop.apache.org
Subject: RE: Taktracker in namenode failure

Both the map output value class configured and the output value written from
the mapper are the Text class, so there is no mismatch in the value class.

 But when the same MR program is run with 2 tasktrackers (without a tasktracker on the
namenode) the exception does not occur.

The problem is only with the tasktracker running on the namenode.



Thanks & Regards

Ramya.S


From: Devaraj k [mailto:devara...@huawei.com]
Sent: Fri 7/12/2013 3:04 PM
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: RE: Taktracker in namenode failure
Could you tell us what Map Output Value class you are configuring while
submitting the Job, and what type of value is written from the Mapper? If
these two mismatch then it will throw the below error.

Thanks
Devaraj k

From: Ramya S [mailto:ram...@suntecgroup.com]
Sent: 12 July 2013 14:46
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: Taktracker in namenode failure

Hi,

Why does only the tasktracker on the namenode fail during job execution with this error?
I have attached a snapshot of the error screen to this mail.

java.io.IOException: Type mismatch in value from map: expected 
org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.IntWritable

at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1019)

at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)

at 
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)

at WordCount$TokenizerMapper.map(WordCount.java:30)

at WordCount$TokenizerMapper.map(WordCount.java:19)

at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)

at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)

at org.apache.hadoop.mapred.Child$4.run(Child.java:255)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:416)

at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)

at org.apache.hadoop.mapred.Child.main(Child.java:249)



But this same task is reassigned to another tasktracker and gets executed.
Why?


Best Regards,
Ramya


RE: Taktracker in namenode failure

2013-07-12 Thread Devaraj k
Could you tell us what Map Output Value class you are configuring while
submitting the Job, and what type of value is written from the Mapper? If
these two mismatch then it will throw the below error.
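
For reference, a hedged driver sketch showing where these classes are declared with the new API (the class names are taken from the WordCount stack trace below; the real job may differ). The declared map output value class must match what the Mapper actually writes:

// Illustrative fragment only: the declared map output classes must match what
// the Mapper emits via context.write(...), otherwise MapTask throws the
// "Type mismatch in value from map" IOException shown below.
Job job = new Job(conf, "wordcount");
job.setJarByClass(WordCount.class);
job.setMapperClass(WordCount.TokenizerMapper.class);  // emits <Text, IntWritable>
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);        // declaring Text here would reproduce the error
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);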

Thanks
Devaraj k

From: Ramya S [mailto:ram...@suntecgroup.com]
Sent: 12 July 2013 14:46
To: user@hadoop.apache.org
Subject: Taktracker in namenode failure

Hi,

Why does only the tasktracker on the namenode fail during job execution with this error?
I have attached a snapshot of the error screen to this mail.

java.io.IOException: Type mismatch in value from map: expected 
org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.IntWritable

at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1019)

at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)

at 
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)

at WordCount$TokenizerMapper.map(WordCount.java:30)

at WordCount$TokenizerMapper.map(WordCount.java:19)

at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)

at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)

at org.apache.hadoop.mapred.Child$4.run(Child.java:255)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:416)

at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)

at org.apache.hadoop.mapred.Child.main(Child.java:249)



But this same task is reassigned to another tasktracker and gets executed.
Why?


Best Regards,
Ramya


RE: unsubscribe

2013-07-12 Thread Devaraj k
You need to send a mail to user-unsubscr...@hadoop.apache.org to unsubscribe.

http://hadoop.apache.org/mailing_lists.html#User

Thanks
Devaraj k


-Original Message-
From: Margusja [mailto:mar...@roo.ee] 
Sent: 12 July 2013 14:26
To: user@hadoop.apache.org
Subject: unsubscribe




RE: Failed to run wordcount on YARN

2013-07-12 Thread Devaraj k
Hi Raymond, 

In Hadoop 2.0.5, the new-API FileInputFormat doesn't support
reading files recursively in the input dir. It supports only an input
dir that directly contains files. If the input dir has any child dirs then it throws the below error.

This has been added in trunk with this JIRA 
https://issues.apache.org/jira/browse/MAPREDUCE-3193.

You can give the Job an input dir which doesn't have nested dirs, or you can
make use of the old FileInputFormat API to read files recursively in the sub
dirs.
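
For illustration, two hedged options (please verify against your exact build):

// Option 1: trunk builds that already contain MAPREDUCE-3193 (not 2.0.5-alpha)
// expose a recursive switch on the new-API FileInputFormat:
FileInputFormat.setInputDirRecursive(job, true);
FileInputFormat.addInputPath(job, new Path("/tmp"));

// Option 2: on 2.0.5-alpha, add only directories that directly contain files,
// e.g. skip /tmp/hadoop-yarn and list the leaf dirs explicitly (the paths below
// are hypothetical):
FileInputFormat.addInputPath(job, new Path("/tmp/input-a"));
FileInputFormat.addInputPath(job, new Path("/tmp/input-b"));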

Thanks
Devaraj k

-Original Message-
From: Liu, Raymond [mailto:raymond@intel.com] 
Sent: 12 July 2013 12:57
To: user@hadoop.apache.org
Subject: Failed to run wordcount on YARN

Hi 

I just started to try out Hadoop 2.0; I use the 2.0.5-alpha package

and followed

http://hadoop.apache.org/docs/r2.0.5-alpha/hadoop-project-dist/hadoop-common/ClusterSetup.html

to set up a cluster in non-secure mode. HDFS works fine with the client tools.

However, when I run the wordcount example, there are errors:

./bin/hadoop jar 
./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar wordcount 
/tmp /out


13/07/12 15:05:53 INFO mapreduce.Job: Task Id : 
attempt_1373609123233_0004_m_04_0, Status : FAILED
Error: java.io.FileNotFoundException: Path is not a file: /tmp/hadoop-yarn
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:42)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1317)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1276)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1252)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1225)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:403)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:239)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40728)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
at 
org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:986)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:974)
at 
org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:157)
at 
org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:124)
at org.apache.hadoop.hdfs.DFSInputStream.(DFSInputStream.java:117)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1131)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:244)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:77)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:713)
at 
org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:89)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:519)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)

I check the HDFS and found 

RE: Staging directory ENOTDIR error.

2013-07-11 Thread Devaraj k
Hi Jay,

   Here the client is trying to create the staging directory in the local file system,
when it actually should be created in HDFS.

Could you check whether you have configured the "fs.defaultFS" configuration in the
client to point to HDFS?
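
For reference, a minimal client-side core-site.xml sketch; the namenode host and port below are placeholders, not taken from your setup (on Hadoop 1.x clusters the older key fs.default.name is typically used instead):

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode-host:8020</value>
</property>

With this in place the staging directory should then be created in HDFS rather than on the local file system.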


Thanks
Devaraj k

From: Jay Vyas [mailto:jayunit...@gmail.com]
Sent: 12 July 2013 04:12
To: common-u...@hadoop.apache.org
Subject: Staging directory ENOTDIR error.

Hi, I'm getting an ungoogleable exception; I've never seen this before.
This is on a Hadoop 1.1 cluster... It appears that it's permissions related...
Any thoughts as to how this could crop up?
I assume it's a bug in my filesystem, but I'm not sure.

13/07/11 18:39:43 ERROR security.UserGroupInformation: 
PriviledgedActionException as:root cause:ENOTDIR: Not a directory
ENOTDIR: Not a directory
at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
at org.apache.hadoop.fs.FileUtil.execSetPermission(FileUtil.java:699)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:654)
at 
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
at 
org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at 
org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)


--
Jay Vyas
http://jayunit100.blogspot.com


RE: CompositeInputFormat

2013-07-11 Thread Devaraj k
Hi Andrew,

You could make use of the Hadoop data join classes to perform the join, or you can
refer to these classes to get a better idea of how to implement a join.

http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-tools/hadoop-datajoin

Thanks
Devaraj k

From: Botelho, Andrew [mailto:andrew.bote...@emc.com]
Sent: 12 July 2013 03:33
To: user@hadoop.apache.org
Subject: RE: CompositeInputFormat

Sorry I should've specified that I need an example of CompositeInputFormat that 
uses the new API.
The example linked below uses old API objects like JobConf.

Any known examples of CompositeInputFormat using the new API?

Thanks in advance,

Andrew

From: Jay Vyas [mailto:jayunit...@gmail.com]
Sent: Thursday, July 11, 2013 5:10 PM
To: common-u...@hadoop.apache.org<mailto:common-u...@hadoop.apache.org>
Subject: Re: CompositeInputFormat

Map Side joins will use the CompositeInputFormat.  They will only really be 
worth doing if one data set is small, and the other is large.
This is a good example : 
http://www.congiu.com/joins-in-hadoop-using-compositeinputformat/
the trick is to google for CompositeInputFormat.compose()  :)

On Thu, Jul 11, 2013 at 5:02 PM, Botelho, Andrew 
mailto:andrew.bote...@emc.com>> wrote:
Hi,

I want to perform a JOIN on two sets of data with Hadoop.  I read that the 
class CompositeInputFormat can be used to perform joins on data, but I can't 
find any examples of how to do it.
Could someone help me out? It would be much appreciated. :)

Thanks in advance,

Andrew



--
Jay Vyas
http://jayunit100.blogspot.com


RE: yarn Failed to bind to: 0.0.0.0/0.0.0.0:8080

2013-07-10 Thread Devaraj k
Hi,

If you are using a release which doesn't have the patch
https://issues.apache.org/jira/browse/MAPREDUCE-5036, then port 8080 will be
used by the Node Manager shuffle handler service.

You can change this default port '8080' to some other value using the
configuration "mapreduce.shuffle.port" in every Node Manager's yarn-site.xml file.

Thanks
Devaraj k

From: ch huang [mailto:justlo...@gmail.com]
Sent: 11 July 2013 07:46
To: user@hadoop.apache.org
Subject: yarn Failed to bind to: 0.0.0.0/0.0.0.0:8080

I have 3 NMs. On one of the NM boxes, port 8080 is already occupied by
tomcat, so I want to change the 8080 port to 8090 on all NMs. The problem is that
I do not know which option in YARN controls the 8080 port. Can anyone help?


RE: NoClassDefFoundError: org/apache/hadoop/yarn/service/CompositeService

2013-07-10 Thread Devaraj k
Hi Libo,

MRAppMaster is not able to load the YARN-related jar files.

 Is this the classpath used by MRAppMaster or by some other process?

"/opt/hadoop/hadoop-2.0.0-cdh4.3.0/conf:/opt/hadoop/hadoop-2.0.0-cdh4.3.0/share/hadoop/common/lib/*:/opt/hadoop/hadoop-2.0.0-cdh4.3.0/share/hadoop/common/*:/contrib/capacity-scheduler/*.jar:/opt/hadoop/hadoop-2.0.0-cdh4.3.0/share/hadoop/hdfs:/opt/hadoop/hadoop-2.0.0-cdh4.3.0/share/hadoop/hdfs/lib/*:/opt/hadoop/hadoop-2.0.0-cdh4.3.0/share/hadoop/hdfs/*:/opt/hadoop/hadoop-2.0.0-cdh4.3.0/share/hadoop/yarn/lib/*:/opt/hadoop/hadoop-2.0.0-cdh4.3.0/share/hadoop/yarn/*:/opt/hadoop/hadoop-2.0.0-cdh4.3.0/share/hadoop/mapreduce2/lib/*:/opt/hadoop/hadoop-2.0.0-cdh4.3.0/share/hadoop/mapreduce/*"

If this is not the MRAppMaster process classpath, can you try to get the
classpath of the MRAppMaster java process?


Thanks
Devaraj k
From: Yu, Libo [mailto:libo...@citi.com]
Sent: 11 July 2013 01:30
To: 'user@hadoop.apache.org'
Subject: NoClassDefFoundError: org/apache/hadoop/yarn/service/CompositeService

Hi,

I tried to run the wordcount example with yarn. Here is the command line:
hadoop jar share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.0.0-cdh4.3.0.jar 
wordcount /user/lyu/wordcount/input /user/lyu/wordcount/output

But I got this exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/service/CompositeService
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.yarn.service.CompositeService
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 12 more
Could not find the main class: org.apache.hadoop.mapreduce.v2.app.MRAppMaster.  
Program will exit

Here is the content of the CLASSPATH used by java:

/opt/hadoop/hadoop-2.0.0-cdh4.3.0/conf:/opt/hadoop/hadoop-2.0.0-cdh4.3.0/share/hadoop/common/lib/*:/opt/hadoop/hadoop-2.0.0-cdh4.3.0/share/hadoop/common/*:/contrib/capacity-scheduler/*.jar:/opt/hadoop/hadoop-2.0.0-cdh4.3.0/share/hadoop/hdfs:/opt/hadoop/hadoop-2.0.0-cdh4.3.0/share/hadoop/hdfs/lib/*:/opt/hadoop/hadoop-2.0.0-cdh4.3.0/share/hadoop/hdfs/*:/opt/hadoop/hadoop-2.0.0-cdh4.3.0/share/hadoop/yarn/lib/*:/opt/hadoop/hadoop-2.0.0-cdh4.3.0/share/hadoop/yarn/*:/opt/hadoop/hadoop-2.0.0-cdh4.3.0/share/hadoop/mapreduce2/lib/*:/opt/hadoop/hadoop-2.0.0-cdh4.3.0/share/hadoop/mapreduce/*

Here is the console output:

13/07/09 20:43:07 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
13/07/09 20:43:07 INFO service.AbstractService: 
Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/07/09 20:43:07 INFO service.AbstractService: 
Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
13/07/09 20:43:08 INFO input.FileInputFormat: Total input paths to process : 0
13/07/09 20:43:08 INFO mapreduce.JobSubmitter: number of splits:0
13/07/09 20:43:08 WARN conf.Configuration: mapred.jar is deprecated. Instead, 
use mapreduce.job.jar
13/07/09 20:43:08 WARN conf.Configuration: mapred.output.value.class is 
deprecated. Instead, use mapreduce.job.output.value.class
13/07/09 20:43:08 WARN conf.Configuration: mapreduce.combine.class is 
deprecated. Instead, use mapreduce.job.combine.class
13/07/09 20:43:08 WARN conf.Configuration: mapreduce.map.class is deprecated. 
Instead, use mapreduce.job.map.class
13/07/09 20:43:08 WARN conf.Configuration: mapred.job.name is deprecated. 
Instead, use mapreduce.job.name
13/07/09 20:43:08 WARN conf.Configuration: mapreduce.reduce.class is 
deprecated. Instead, use mapreduce.job.reduce.class
13/07/09 20:43:08 WARN conf.Configuration: mapred.input.dir is deprecated. 
Instead, use mapreduce.input.fileinputformat.inputdir
13/07/09 20:43:08 WARN conf.Configuration: mapred.output.di

RE: ConnectionException in container, happens only sometimes

2013-07-10 Thread Devaraj k
>1. I assume this is the task (container) that tries to establish connection, 
>but what it wants to connect to?
It is trying to connect to the MRAppMaster, which hands it the actual task to execute.

>1. I assume this is the task (container) that tries to establish connection, 
>but what it wants to connect to?
It seems the Container is not getting the correct MRAppMaster address for some
reason, or the AM is crashing before handing the task to the Container. Probably it is
caused by an invalid host mapping.  Can you check that the host mapping is proper
on both machines, and also check the AM log from that time for any clue?
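
As an illustration of the kind of mapping to check (the addresses below are placeholders): the syslog shows slave-2-host resolving to 127.0.0.1, so an /etc/hosts along these lines on every node would avoid handing out the loopback address:

# /etc/hosts sketch - hostnames must resolve to real cluster addresses,
# not to 127.0.0.1
127.0.0.1      localhost
192.168.1.11   master-host      # placeholder IP
192.168.1.12   slave-1-host     # placeholder IP
192.168.1.13   slave-2-host     # placeholder IP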

Thanks
Devaraj k

From: Andrei [mailto:faithlessfri...@gmail.com]
Sent: 10 July 2013 17:32
To: user@hadoop.apache.org
Subject: ConnectionException in container, happens only sometimes

Hi,

I'm running CDH4.3 installation of Hadoop with the following simple setup:

master-host: runs NameNode, ResourceManager and JobHistoryServer
slave-1-host and slave-2-host: DataNodes and NodeManagers.

When I run a simple MapReduce job (using either the streaming API or the Pi example from
the distribution) on the client, I see that some tasks fail:

13/07/10 14:40:10 INFO mapreduce.Job:  map 60% reduce 0%
13/07/10 14:40:14 INFO mapreduce.Job: Task Id : 
attempt_1373454026937_0005_m_03_0, Status : FAILED
13/07/10 14:40:14 INFO mapreduce.Job: Task Id : 
attempt_1373454026937_0005_m_05_0, Status : FAILED
...
13/07/10 14:40:23 INFO mapreduce.Job:  map 60% reduce 20%
...

Every time a different set of tasks/attempts fails. In some cases the number of
failed attempts becomes critical and the whole job fails; in other cases the job
finishes successfully. I can't see any pattern, but I noticed the
following.

Let's say, ApplicationMaster runs on _slave-1-host_. In this case on 
_slave-2-host_ there will be corresponding syslog with the following contents:

...
2013-07-10 11:06:10,986 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: slave-2-host/127.0.0.1:11812<http://127.0.0.1:11812>. 
Already tried 0 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:06:11,989 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: slave-2-host/127.0.0.1:11812<http://127.0.0.1:11812>. 
Already tried 1 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
...
2013-07-10 11:06:20,013 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: slave-2-host/127.0.0.1:11812<http://127.0.0.1:11812>. 
Already tried 9 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:06:20,019 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : java.net.ConnectException: Call From 
slave-2-host/127.0.0.1<http://127.0.0.1> to slave-2-host:11812 failed on 
connection exception: java.net.ConnectException: Connection refused; For more 
details see:  http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:782)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:729)
at org.apache.hadoop.ipc.Client.call(Client.java:1229)
at 
org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:225)
at com.sun.proxy.$Proxy6.getTask(Unknown Source)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:131)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:528)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:492)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:499)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:593)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:241)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1278)
at org.apache.hadoop.ipc.Client.call(Client.java:1196)
... 3 more


Notice several things:

1. This exception always happens on a different host than the one the ApplicationMaster
runs on.
2. It always tries to connect to localhost, not another host in the cluster.
3. The port number (11812 in this case) is always different.

My questions are:

1. I assume this is the 

RE: cannot submit a job via java client in hadoop- 2.0.5-alpha

2013-07-10 Thread Devaraj k
'yarn.nodemanager.address' is not required to submit the Job; it is
required only on the NM side.


Thanks
Devaraj k

From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: 10 July 2013 16:22
To: user@hadoop.apache.org
Subject: Re: cannot submit a job via java client in hadoop- 2.0.5-alpha

you didn't set yarn.nodemanager.address in your yarn-site.xml



On Wed, Jul 10, 2013 at 4:33 PM, Francis.Hu 
mailto:francis...@reachjunction.com>> wrote:
Hi,All

I have a hadoop-2.0.5-alpha cluster with 3 data nodes. I have the Resource
Manager and all data nodes started and can access the web UI of the Resource Manager.
I wrote a java client (the TestJob class below) to submit a job, but the job is
never submitted successfully. It throws the exception below every time.
My configurations are attached.  Can anyone help me? Thanks.

-my-java client
public class TestJob {

public void execute() {

Configuration conf1 = new Configuration();
conf1.addResource("resources/core-site.xml");
conf1.addResource("resources/hdfs-site.xml");
conf1.addResource("resources/yarn-site.xml");
conf1.addResource("resources/mapred-site.xml");
JobConf conf = new JobConf(conf1);

conf.setJar("/home/francis/hadoop-jobs/MapReduceJob.jar");
conf.setJobName("Test");

conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);

conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);

conf.setMapperClass(DisplayRequestMapper.class);
conf.setReducerClass(DisplayRequestReducer.class);

FileInputFormat.setInputPaths(conf,new 
Path("/home/francis/hadoop-jobs/2013070907.FNODE.2.txt"));
FileOutputFormat.setOutputPath(conf, new 
Path("/home/francis/hadoop-jobs/result/"));

try {
JobClient client = new JobClient(conf);
RunningJob job = client.submitJob(conf);
job.waitForCompletion();
} catch (IOException e) {
e.printStackTrace();
}
}
}

--Exception

jvm 1| java.io.IOException: Cannot initialize Cluster. Please check your 
configuration for mapreduce.framework.name<http://mapreduce.framework.name> and 
the correspond server addresses.
jvm 1|  at 
org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:119)
jvm 1|  at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:81)
jvm 1|  at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:74)
jvm 1|  at org.apache.hadoop.mapred.JobClient.init(JobClient.java:482)
jvm 1|  at org.apache.hadoop.mapred.JobClient.(JobClient.java:461)
jvm 1|  at com.rh.elastic.hadoop.job.TestJob.execute(TestJob.java:59)


Thanks,
Francis.Hu




RE: cannot submit a job via java client in hadoop- 2.0.5-alpha

2013-07-10 Thread Devaraj k
Hi Francis,

Could you check whether those configuration files are getting
loaded or not? There is a chance that these configuration files are not
getting loaded into the Configuration object because of an invalid path.

conf1.addResource("resources/mapred-site.xml");
   // Can you try printing the values of the properties 'yarn.resourcemanager.address' &
'mapreduce.framework.name' and check that they match the config
files?
JobConf conf = new JobConf(conf1);
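
For example, a quick debugging sketch (illustrative only) placed right after the addResource() calls would confirm whether the files were actually picked up:

// Debugging sketch: verify the client really loaded the config files.
System.out.println("mapreduce.framework.name = "
    + conf1.get("mapreduce.framework.name"));
System.out.println("yarn.resourcemanager.address = "
    + conf1.get("yarn.resourcemanager.address"));
// If these print null or the defaults, the relative resource paths above are
// not being resolved from the classpath/working directory.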



Thanks
Devaraj k

From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: 10 July 2013 16:22
To: user@hadoop.apache.org
Subject: Re: cannot submit a job via java client in hadoop- 2.0.5-alpha

you didn't set yarn.nodemanager.address in your yarn-site.xml



On Wed, Jul 10, 2013 at 4:33 PM, Francis.Hu 
mailto:francis...@reachjunction.com>> wrote:
Hi,All

I have a hadoop-2.0.5-alpha cluster with 3 data nodes. I have the Resource
Manager and all data nodes started and can access the web UI of the Resource Manager.
I wrote a java client (the TestJob class below) to submit a job, but the job is
never submitted successfully. It throws the exception below every time.
My configurations are attached.  Can anyone help me? Thanks.

-my-java client
public class TestJob {

public void execute() {

Configuration conf1 = new Configuration();
conf1.addResource("resources/core-site.xml");
conf1.addResource("resources/hdfs-site.xml");
conf1.addResource("resources/yarn-site.xml");
conf1.addResource("resources/mapred-site.xml");
JobConf conf = new JobConf(conf1);

conf.setJar("/home/francis/hadoop-jobs/MapReduceJob.jar");
conf.setJobName("Test");

conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);

conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);

conf.setMapperClass(DisplayRequestMapper.class);
conf.setReducerClass(DisplayRequestReducer.class);

FileInputFormat.setInputPaths(conf,new 
Path("/home/francis/hadoop-jobs/2013070907.FNODE.2.txt"));
FileOutputFormat.setOutputPath(conf, new 
Path("/home/francis/hadoop-jobs/result/"));

try {
JobClient client = new JobClient(conf);
RunningJob job = client.submitJob(conf);
job.waitForCompletion();
} catch (IOException e) {
e.printStackTrace();
}
}
}

--Exception

jvm 1| java.io.IOException: Cannot initialize Cluster. Please check your 
configuration for mapreduce.framework.name<http://mapreduce.framework.name> and 
the correspond server addresses.
jvm 1|  at 
org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:119)
jvm 1|  at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:81)
jvm 1|  at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:74)
jvm 1|  at org.apache.hadoop.mapred.JobClient.init(JobClient.java:482)
jvm 1|  at org.apache.hadoop.mapred.JobClient.(JobClient.java:461)
jvm 1|  at com.rh.elastic.hadoop.job.TestJob.execute(TestJob.java:59)


Thanks,
Francis.Hu




RE: stop-dfs.sh does not work

2013-07-09 Thread Devaraj k
Hi,

Are you trying to stop the DFS with the same user or a different user?

Could you check whether these processes are running or not, using 'jps' or 'ps'?

Thanks
Devaraj k

From: YouPeng Yang [mailto:yypvsxf19870...@gmail.com]
Sent: 10 July 2013 11:01
To: user@hadoop.apache.org
Subject: stop-dfs.sh does not work

Hi users.

I start my HDFS using start-dfs.sh, and all the nodes start successfully.
However, stop-dfs.sh does not work when I want to stop the HDFS.
It shows: no namenode to stop
   no datanode to stop.

I have to stop it with the command: kill -9 <pid>.


So I wonder why stop-dfs.sh no longer works.


Best regards


RE: can not start yarn

2013-07-09 Thread Devaraj k
Hi,

   Here the NM is failing to connect to the Resource Manager.

Have you started the Resource Manager successfully? Do you see any problem
while starting the Resource Manager in the RM log?

If you have started the Resource Manager on a different machine than the NM,
you need to set the configuration
"yarn.resourcemanager.resource-tracker.address" for the NM with the RM resource tracker
address.
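
For illustration, a yarn-site.xml sketch for the NM side; the RM hostname below is a placeholder:

<!-- Hypothetical example: point the NM at the RM's resource tracker
     (default port 8031) instead of the 0.0.0.0 default it is retrying above. -->
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>rm-host:8031</value>
</property>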


Thanks
Devaraj k

From: ch huang [mailto:justlo...@gmail.com]
Sent: 10 July 2013 08:36
To: user@hadoop.apache.org
Subject: can not start yarn

I am testing MapReduce v2, and I get an error when I start the NM.

Here is the NM log content:

2013-07-10 11:02:35,909 INFO org.apache.hadoop.yarn.service.AbstractService: 
Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is started.
2013-07-10 11:02:35,909 INFO org.apache.hadoop.yarn.service.AbstractService: 
Service:Dispatcher is started.
2013-07-10 11:02:35,930 INFO 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting to 
ResourceManager at /0.0.0.0:8031<http://0.0.0.0:8031>
2013-07-10 11:02:37,209 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: 0.0.0.0/0.0.0.0:8031<http://0.0.0.0/0.0.0.0:8031>. Already tried 0 
time(s); retry policy is RetryUpToMaximumCountWithFi
xedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:38,210 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: 0.0.0.0/0.0.0.0:8031<http://0.0.0.0/0.0.0.0:8031>. Already tried 1 
time(s); retry policy is RetryUpToMaximumCountWithFi
xedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:39,211 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: 0.0.0.0/0.0.0.0:8031<http://0.0.0.0/0.0.0.0:8031>. Already tried 2 
time(s); retry policy is RetryUpToMaximumCountWithFi
xedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:40,212 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: 0.0.0.0/0.0.0.0:8031<http://0.0.0.0/0.0.0.0:8031>. Already tried 3 
time(s); retry policy is RetryUpToMaximumCountWithFi
xedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:41,213 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: 0.0.0.0/0.0.0.0:8031<http://0.0.0.0/0.0.0.0:8031>. Already tried 4 
time(s); retry policy is RetryUpToMaximumCountWithFi
xedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:42,215 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: 0.0.0.0/0.0.0.0:8031<http://0.0.0.0/0.0.0.0:8031>. Already tried 5 
time(s); retry policy is RetryUpToMaximumCountWithFi
xedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:43,216 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: 0.0.0.0/0.0.0.0:8031<http://0.0.0.0/0.0.0.0:8031>. Already tried 6 
time(s); retry policy is RetryUpToMaximumCountWithFi
xedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:44,217 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: 0.0.0.0/0.0.0.0:8031<http://0.0.0.0/0.0.0.0:8031>. Already tried 7 
time(s); retry policy is RetryUpToMaximumCountWithFi
xedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:45,218 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: 0.0.0.0/0.0.0.0:8031<http://0.0.0.0/0.0.0.0:8031>. Already tried 8 
time(s); retry policy is RetryUpToMaximumCountWithFi
xedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:46,219 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: 0.0.0.0/0.0.0.0:8031<http://0.0.0.0/0.0.0.0:8031>. Already tried 9 
time(s); retry policy is RetryUpToMaximumCountWithFi
xedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-10 11:02:46,226 ERROR org.apache.hadoop.yarn.service.CompositeService: 
Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager
org.apache.avro.AvroRuntimeException: 
java.lang.reflect.UndeclaredThrowableException
at 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:141)
at 
org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:196)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:329)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:351)
Caused by: java.lang.reflect.UndeclaredThrowableException
at 
org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
at 
org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:61)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:190)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.

RE: Which InputFormat to use?

2013-07-04 Thread Devaraj k
Hi Ahmed,

Hadoop 0.20.0 included the new MapReduce API, sometimes referred
to as the "context objects" API. It is designed to make the API easier to evolve in
the future. There are some differences between the new & old APIs:

> The new API favours abstract classes rather than interfaces, since abstract
> classes are easier to evolve.
> The new API uses context objects like MapContext & ReduceContext to connect the
> user code to the framework.
> The old API has a special JobConf object for job configuration; in the new API,
> job configuration is done using Configuration.

You can find the new-API classes in the org.apache.hadoop.mapreduce package and
its sub packages (e.g. org.apache.hadoop.mapreduce.lib.input.* for the input
formats), and the old-API ones in the org.apache.hadoop.mapred package and its
sub packages.

The new API is type-incompatible with the old one; jobs need to be rewritten to
make use of these advantages.

Based on these points you can select which API to use.
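
To make the difference concrete, here is a minimal new-API skeleton (the class name is a placeholder) of the kind of InputFormat being discussed; the old-API equivalent would instead implement org.apache.hadoop.mapred.InputFormat and return an InputSplit[] from getSplits() and a RecordReader from getRecordReader():

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Sketch of a custom InputFormat against the new API: it extends the abstract
// class org.apache.hadoop.mapreduce.InputFormat rather than implementing the
// old org.apache.hadoop.mapred.InputFormat interface.
public class MyInputFormat extends InputFormat<LongWritable, Text> {

    @Override
    public List<InputSplit> getSplits(JobContext context)
            throws IOException, InterruptedException {
        // Build the splits from context.getConfiguration() here.
        throw new UnsupportedOperationException("sketch only");
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        // Return a RecordReader that turns the split into key/value records.
        throw new UnsupportedOperationException("sketch only");
    }
}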

Thanks
Devaraj k

From: Ahmed Eldawy [mailto:aseld...@gmail.com]
Sent: 05 July 2013 00:00
To: user@hadoop.apache.org
Subject: Which InputFormat to use?

Hi I'm developing a new set of InputFormats that are used for a project I'm 
doing. I found that there are two ways to create  a new InputFormat.
1- Extend the abstract class org.apache.hadoop.mapreduce.InputFormat
2- Implement the interface org.apache.hadoop.mapred.InputFormat
I don't know why there are two versions which are incompatible. I found out 
that for each one, there is a whole set of interfaces for different classes 
such as InputSplit, RecordReader and MapReduce job. Unfortunately, each set of 
classes is not compatible with the other one. This means that I have to choose 
one of the interfaces and go with it till the end. I have two questions 
basically.
1- Which of these two interfaces should I go with? I didn't find any
deprecation in either of them, so they both seem legitimate. Is there any plan to
retire one of them?
2- I already have some classes implemented against one of the formats; is it worth
refactoring these classes to use the other interface, in case I used the old
format?
Thanks in advance for your help.


Best regards,
Ahmed Eldawy


RE: Decomssion datanode - no response

2013-07-04 Thread Devaraj k
Also, could you check whether the client is connecting to the NameNode, or whether there is any
failure in connecting to the NN?

Thanks
Devaraj k

From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: 05 July 2013 09:15
To: user@hadoop.apache.org
Subject: Re: Decomssion datanode - no response

I added dfs.hosts.exclude before the NN started,

and I updated /usr/local/hadoop/conf/dfs_exclude with the new hosts, but it
doesn't decommission them.

On Fri, Jul 5, 2013 at 11:39 AM, Devaraj k 
mailto:devara...@huawei.com>> wrote:
When did you add this configuration in NN conf?
  
<property>
  <name>dfs.hosts.exclude</name>
  <value>/usr/local/hadoop/conf/dfs_exclude</value>
</property>

If you have added this configuration after starting the NN, it won't take effect
and you need to restart the NN.

If you have added this config with the exclude file before the NN start, you can
update the file with the new hosts and issue the refreshNodes command, and then
the newly added DNs will be decommissioned.

Thanks
Devaraj k

From: Azuryy Yu [mailto:azury...@gmail.com<mailto:azury...@gmail.com>]
Sent: 05 July 2013 08:48
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: Re: Decomssion datanode - no response

Thanks Devaraj,

There are no any releated logs in the NN log and DN log.

On Fri, Jul 5, 2013 at 11:14 AM, Devaraj k 
mailto:devara...@huawei.com>> wrote:
Do you see any log related to this in Name Node logs when you issue 
refreshNodes dfsadmin command?

Thanks
Devaraj k

From: Azuryy Yu [mailto:azury...@gmail.com<mailto:azury...@gmail.com>]
Sent: 05 July 2013 08:12
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: Decomssion datanode - no response

Hi,
I am using hadoop-2.0.5-alpha, and I added 5 datanodes into dfs_exclude,

hdfs-site.xml:
  
<property>
  <name>dfs.hosts.exclude</name>
  <value>/usr/local/hadoop/conf/dfs_exclude</value>
</property>

then:
hdfs dfsadmin -refreshNodes

but there are no decommissioning nodes shown on the web UI, and there are no related logs
in the datanode log. What's wrong?




RE: Decomssion datanode - no response

2013-07-04 Thread Devaraj k
When did you add this configuration in NN conf?
  
<property>
  <name>dfs.hosts.exclude</name>
  <value>/usr/local/hadoop/conf/dfs_exclude</value>
</property>

If you have added this configuration after starting the NN, it won't take effect
and you need to restart the NN.

If you have added this config with the exclude file before the NN start, you can
update the file with the new hosts and issue the refreshNodes command, and then
the newly added DNs will be decommissioned.
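
For illustration, the exclude file is just a plain list of hosts, one per line (the hostnames below are placeholders), and the refresh is then issued against the NN; the entries should match the names the datanodes registered with, as shown on the NN web UI:

$ cat /usr/local/hadoop/conf/dfs_exclude
datanode-03.example.com
datanode-04.example.com
$ hdfs dfsadmin -refreshNodes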

Thanks
Devaraj k

From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: 05 July 2013 08:48
To: user@hadoop.apache.org
Subject: Re: Decomssion datanode - no response

Thanks Devaraj,

There are no any releated logs in the NN log and DN log.

On Fri, Jul 5, 2013 at 11:14 AM, Devaraj k 
mailto:devara...@huawei.com>> wrote:
Do you see any log related to this in Name Node logs when you issue 
refreshNodes dfsadmin command?

Thanks
Devaraj k

From: Azuryy Yu [mailto:azury...@gmail.com<mailto:azury...@gmail.com>]
Sent: 05 July 2013 08:12
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: Decomssion datanode - no response

Hi,
I am using hadoop-2.0.5-alpha, and I added 5 datanodes into dfs_exclude,

hdfs-site.xml:
  
<property>
  <name>dfs.hosts.exclude</name>
  <value>/usr/local/hadoop/conf/dfs_exclude</value>
</property>

then:
hdfs dfsadmin -refreshNodes

but there are no decommissioning nodes shown on the web UI, and there are no related logs
in the datanode log. What's wrong?



RE: Requesting containers on a specific host

2013-07-04 Thread Devaraj k
Hi Kishore,

The hadoop-2.1.0-beta release is in the voting process now.

You can try it out from the hadoop-2.1.0-beta RC
http://people.apache.org/~acmurthy/hadoop-2.1.0-beta-rc0/ or you could check
the same with a trunk build.

Thanks
Devaraj k

From: Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com]
Sent: 04 July 2013 21:33
To: user@hadoop.apache.org
Subject: Re: Requesting containers on a specific host

Thanks Arun, it seems to be available with 2.1.0-beta; when will that be
released? Or if I want it now, could I get it from trunk?

-Kishore

On Thu, Jul 4, 2013 at 5:58 PM, Arun C Murthy 
mailto:a...@hortonworks.com>> wrote:
To guarantee nodes on a specific container you need to use the whitelist 
feature we added recently:

https://issues.apache.org/jira/browse/YARN-398
Arun

On Jul 4, 2013, at 3:14 AM, Krishna Kishore Bonagiri 
mailto:write2kish...@gmail.com>> wrote:


I could get containers on specific nodes using addContainerRequest() on 
AMRMClient. But there are issues with it. I have two nodes, node1 and node2 in 
my cluster. And, my Application Master is trying to get 3 containers on node1, 
and 3 containers on node2 in that order.

While requesting on node1, it sometimes gives me containers on node2, and
vice versa. When I get a container on a different node than the one I need, I
release it and make a fresh request. I have to keep doing that over and over to get
a container on the node I need.

 Though the node I am requesting has enough resources, why does it keep giving
me containers on the other node? How can I make sure I get a container on the
node I want?

Note: I am using the default scheduler, i.e. Capacity Scheduler.

Thanks,
Kishore

On Fri, Jun 21, 2013 at 7:25 PM, Arun C Murthy 
mailto:a...@hortonworks.com>> wrote:
Check if the hostname you are setting is the same in the RM logs...

On Jun 21, 2013, at 2:15 AM, Krishna Kishore Bonagiri 
mailto:write2kish...@gmail.com>> wrote:


Hi,
  I am trying to get a container on a specific host using the setHostName() call on
ResourceRequest, but nothing ever gets allocated, whereas it works
fine when I change the node name to "*". I am working on a single-node cluster,
and I am giving the name of the single node I have in my cluster.

  Is there any specific format that I need to use for setHostName()? Why is it
not working?

Thanks,
Kishore

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/




RE: Decomssion datanode - no response

2013-07-04 Thread Devaraj k
Do you see any log related to this in Name Node logs when you issue 
refreshNodes dfsadmin command?

Thanks
Devaraj k

From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: 05 July 2013 08:12
To: user@hadoop.apache.org
Subject: Decomssion datanode - no response

Hi,
I am using hadoop-2.0.5-alpha, and I added 5 datanodes into dfs_exclude,

hdfs-site.xml:
  
<property>
  <name>dfs.hosts.exclude</name>
  <value>/usr/local/hadoop/conf/dfs_exclude</value>
</property>

then:
hdfs dfsadmin -refreshNodes

but there are no decommissioning nodes shown on the web UI, and there are no related logs
in the datanode log. What's wrong?


RE: Subscribe

2013-07-03 Thread Devaraj k
Hi Steven,

   For subscribing to this list, you need to send mail to 
user-subscr...@hadoop.apache.org<mailto:user-subscr...@hadoop.apache.org>.

Please find the all mailing lists here 
http://hadoop.apache.org/mailing_lists.html

Thanks
Devaraj k

From: Steven Fuller [mailto:sful...@cloudera.com]
Sent: 03 July 2013 23:06
To: user@hadoop.apache.org
Subject: Subscribe




RE: temporary folders for YARN tasks

2013-07-01 Thread Devaraj k
You can make use of this configuration to do the same.


<property>
  <description>List of directories to store localized files in. An
    application's localized file directory will be found in:
    ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
    Individual containers' work directories, called container_${contid}, will
    be subdirectories of this.
  </description>
  <name>yarn.nodemanager.local-dirs</name>
  <value>${hadoop.tmp.dir}/nm-local-dir</value>
</property>
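
As a sketch of how a task can use this from the container side (illustrative only, based on the description above that each container gets its own work directory under these local-dirs): the container's current working directory is already one of those per-container directories, so temporary files can simply be created relative to it, and they are removed when the container's files are cleaned up. Depending on the YARN version, the full list of local dirs may also be exported to the container environment for spreading I/O across volumes.

import java.io.File;
import java.io.IOException;

// Illustrative sketch: a task writing scratch data into its container work
// directory, which lives under yarn.nodemanager.local-dirs as described above.
public class ScratchFileExample {
    public static void main(String[] args) throws IOException {
        File workDir = new File(System.getProperty("user.dir")); // container work dir
        File scratch = File.createTempFile("scratch-", ".tmp", workDir);
        System.out.println("writing temporary data to " + scratch.getAbsolutePath());
        // ... write and read intermediate data here ...
    }
}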

Thanks
Devaraj k

From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: 02 July 2013 02:08
To: user@hadoop.apache.org
Subject: temporary folders for YARN tasks

When a YARN app and its tasks want to write temporary files, how do they know
where to write the files?
I am assuming that each task has some temporary space available, and I hope it 
is available across multiple disk volumes for parallel performance.
Are those files cleaned up automatically after task exit?
If I want to give lifetime control of the files to an auxiliary service (along 
the lines of MR shuffle passing files to the aux service), how would I do that, 
and would that entail different file locations?
Thanks
John




RE: intermediate results files

2013-07-01 Thread Devaraj k
If you are 100% sure that all the data nodes are available and healthy for
that period of time, you can choose a replication factor of 1, or anything less than 3.

Thanks
Devaraj k

From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: 02 July 2013 04:40
To: user@hadoop.apache.org
Subject: RE: intermediate results files

I've seen some benchmarks where replication=1 runs at about 50MB/sec and 
replication=3 runs at about 33MB/sec, but I can't seem to find that now.
John

From: Mohammad Tariq [mailto:donta...@gmail.com]
Sent: Monday, July 01, 2013 5:03 PM
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: Re: intermediate results files

Hello John,

  IMHO, it doesn't matter. Your job will write the result just once.
Replica creation is handled at the HDFS layer, so it has nothing to do with your
job. Your job will still be writing at the same speed.

Warm Regards,
Tariq
cloudfront.blogspot.com<http://cloudfront.blogspot.com>

On Tue, Jul 2, 2013 at 4:16 AM, John Lilley 
mailto:john.lil...@redpoint.net>> wrote:
If my reducers are going to create results that are temporary in nature 
(consumed by the next processing stage) is it recommended to use a replication 
factor <3 to improve performance?
Thanks
john




RE: YARN tasks and child processes

2013-07-01 Thread Devaraj k
It is possible for a YARN task to persist data; you can choose wherever
you want to persist it.
If you choose to persist it in HDFS, you need to take care of deleting the data after
using it.  If you choose to write to a local dir, you may write the data into the
NM local dirs (i.e. the 'yarn.nodemanager.local-dirs' configuration), under
the app id & container id, and this will be cleaned up after the app
completes.  You need to make use of the persisted data before completing the
application.


Thanks
Devaraj k

From: John Lilley [mailto:john.lil...@redpoint.net]
Sent: 02 July 2013 04:44
To: user@hadoop.apache.org
Subject: YARN tasks and child processes

Is it possible for a child process of a YARN task to persist after the task is 
complete?  I am looking at an alternative to a YARN auxiliary process that may 
be simpler to implement, if I can have a task spawn a process that persists for 
some time after the task finishes.
Thanks,
John



RE: region server can not start

2013-06-26 Thread Devaraj k
Can you ask this HBase question in the HBase user mailing list?

Thanks
Devaraj k

From: ch huang [mailto:justlo...@gmail.com]
Sent: 26 June 2013 14:30
To: user@hadoop.apache.org
Subject: region server can not start

I changed the ZooKeeper port from 2181 to 2281, and now the region server cannot start.

2013-06-26 16:57:00,003 INFO org.apache.zookeeper.ZooKeeper: Initiating client 
connection, connectString=CH22:2181 sessionTimeout=18 
watcher=regionserver:60020
2013-06-26 16:57:00,030 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server CH22/192.168.10.22:2181<http://192.168.10.22:2181>
2013-06-26 16:57:00,039 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for 
server null, unexpected error, closing socket connection and attempting 
reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
2013-06-26 16:57:00,254 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: 
Installed shutdown hook thread: Shutdownhook:regionserver60020
2013-06-26 16:57:01,765 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server CH22/192.168.10.22:2181<http://192.168.10.22:2181>
2013-06-26 16:57:01,767 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for 
server null, unexpected error, closing socket connection and attempting 
reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
2013-06-26 16:57:03,505 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server CH22/192.168.10.22:2181<http://192.168.10.22:2181>
2013-06-26 16:57:03,506 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for 
server null, unexpected error, closing socket connection and attempting 
reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
2013-06-26 16:57:05,323 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server CH22/192.168.10.22:2181<http://192.168.10.22:2181>
2013-06-26 16:57:05,324 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for 
server null, unexpected error, closing socket connection and attempting 
reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
2013-06-26 16:57:06,770 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server CH22/192.168.10.22:2181<http://192.168.10.22:2181>
2013-06-26 16:57:06,771 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for 
server null, unexpected error, closing socket connection and attempting 
reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
2013-06-26 16:57:08,824 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
connection to server CH22/192.168.10.22:2181<http://192.168.10.22:2181>
2013-06-26 16:57:08,825 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for 
server null, unexpected error, closing socket connection and attempting 
reconnect
java.net.ConnectException: Connection refused


RE: master node abnormal ,help

2013-06-26 Thread Devaraj k
Can you ask this HBase question in the HBase user mailing list?


Thanks
Devaraj k

From: ch huang [mailto:justlo...@gmail.com]
Sent: 26 June 2013 14:52
To: user@hadoop.apache.org
Subject: master node abnormal ,help

When I start the master node, it does not work. Can anyone help?

2013-06-26 17:17:52,552 INFO 
org.apache.hadoop.hbase.master.ActiveMasterManager: Master=CH22:6
2013-06-26 17:17:52,859 DEBUG org.apache.hadoop.hbase.util.FSUtils: Created 
version file at hdfs://CH22:9000/hbaseroot set its version at:7
2013-06-26 17:17:52,863 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer 
Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File 
/hbaseroot/hbase.version could only be replicated to 0 nodes, instead of 1
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1533)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:667)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)

at org.apache.hadoop.ipc.Client.call(Client.java:1107)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at $Proxy6.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy6.addBlock(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3647)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3514)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2300(DFSClient.java:2720)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2915)


RE: eclipse connect problem in CDH3u4 Protocol ora.apache.hadoop.hdfs.protocal.ClientProtocol version mismatch

2013-06-26 Thread Devaraj k
You need to update the jars on the client side to match the jars which the server uses.

Thanks
Devaraj k

From: ch huang [mailto:justlo...@gmail.com]
Sent: 26 June 2013 14:04
To: user@hadoop.apache.org
Subject: eclipse connect problem in CDH3u4 Protocol 
ora.apache.hadoop.hdfs.protocal.ClientProtocol version mismatch

I already tested Eclipse connecting to Apache HDFS and it is OK,
but when I connect to CDH3u4 HDFS, I get an error (using the same libraries):
Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch
(client=61, server=63)


RE:

2013-06-26 Thread Devaraj k
Could you check the logs for the Hadoop processes to see whether they started 
successfully or hit any problem while starting?

Thanks
Devaraj k

From: ch huang [mailto:justlo...@gmail.com]
Sent: 26 June 2013 12:38
To: user@hadoop.apache.org
Subject:

Hi, I built a new Hadoop cluster, but I cannot access HDFS. Why? I use CDH3u4 
on RedHat 6.2.

# hadoop fs -put /opt/test 
hdfs://192.168.10.22:9000/user/test<http://192.168.10.22:9000/user/test>
13/06/26 15:00:47 INFO ipc.Client: Retrying connect to server: 
/192.168.10.22:9000<http://192.168.10.22:9000>. Already tried 0 time(s).
13/06/26 15:00:48 INFO ipc.Client: Retrying connect to server: 
/192.168.10.22:9000<http://192.168.10.22:9000>. Already tried 1 time(s).
13/06/26 15:00:49 INFO ipc.Client: Retrying connect to server: 
/192.168.10.22:9000<http://192.168.10.22:9000>. Already tried 2 time(s).
13/06/26 15:00:50 INFO ipc.Client: Retrying connect to server: 
/192.168.10.22:9000<http://192.168.10.22:9000>. Already tried 3 time(s).
13/06/26 15:00:51 INFO ipc.Client: Retrying connect to server: 
/192.168.10.22:9000<http://192.168.10.22:9000>. Already tried 4 time(s).
13/06/26 15:00:52 INFO ipc.Client: Retrying connect to server: 
/192.168.10.22:9000<http://192.168.10.22:9000>. Already tried 5 time(s).
13/06/26 15:00:53 INFO ipc.Client: Retrying connect to server: 
/192.168.10.22:9000<http://192.168.10.22:9000>. Already tried 6 time(s).
13/06/26 15:00:54 INFO ipc.Client: Retrying connect to server: 
/192.168.10.22:9000<http://192.168.10.22:9000>. Already tried 7 time(s).
13/06/26 15:00:55 INFO ipc.Client: Retrying connect to server: 
/192.168.10.22:9000<http://192.168.10.22:9000>. Already tried 8 time(s).
13/06/26 15:00:56 INFO ipc.Client: Retrying connect to server: 
/192.168.10.22:9000<http://192.168.10.22:9000>. Already tried 9 time(s).
put: Call to /192.168.10.22:9000<http://192.168.10.22:9000> failed on 
connection exception: java.net.ConnectException: Connection refused


RE: Yarn HDFS and Yarn Exceptions when processing "larger" datasets.

2013-06-25 Thread Devaraj k
Hi,

   Could you check the network usage in the cluster when this problem occurs? 
It is probably caused by high network usage.

Thanks
Devaraj k

From: blah blah [mailto:tmp5...@gmail.com]
Sent: 26 June 2013 05:39
To: user@hadoop.apache.org
Subject: Yarn HDFS and Yarn Exceptions when processing "larger" datasets.

Hi All
First, let me apologize for the poor thread title, but I have no idea how to express 
the problem in one sentence.
I have implemented new Application Master with the use of Yarn. I am using old 
Yarn development version. Revision 1437315, from 2013-01-23 (SNAPSHOT 3.0.0). I 
can not update to current trunk version, as prototype deadline is soon, and I 
don't have time to include Yarn API changes.
Currently I execute experiments in pseudo-distributed mode, I use guava version 
14.0-rc1. I have a problem with Yarn's and HDFS Exceptions for "larger" 
datasets. My AM works fine and I can execute it without a problem for a debug 
dataset (1MB size). But when I increase the size of input to 6.8 MB, I am 
getting the following exceptions:
AM_Exceptions_Stack

Exception in thread "Thread-3" java.lang.reflect.UndeclaredThrowableException
at 
org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
at 
org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.allocate(AMRMProtocolPBClientImpl.java:77)
at 
org.apache.hadoop.yarn.client.AMRMClientImpl.allocate(AMRMClientImpl.java:194)
at 
org.tudelft.ludograph.app.AppMasterContainerRequester.sendContainerAskToRM(AppMasterContainerRequester.java:219)
at 
org.tudelft.ludograph.app.AppMasterContainerRequester.run(AppMasterContainerRequester.java:315)
at java.lang.Thread.run(Thread.java:662)
Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on 
local exception: java.io.IOException: Response is null.; Host Details : local 
host is: "linux-ljc5.site/127.0.0.1<http://127.0.0.1>"; destination host is: 
"0.0.0.0":8030;
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:212)
at $Proxy10.allocate(Unknown Source)
at 
org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.allocate(AMRMProtocolPBClientImpl.java:75)
... 4 more
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: 
Response is null.; Host Details : local host is: 
"linux-ljc5.site/127.0.0.1<http://127.0.0.1>"; destination host is: 
"0.0.0.0":8030;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:760)
at org.apache.hadoop.ipc.Client.call(Client.java:1240)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
... 6 more
Caused by: java.io.IOException: Response is null.
at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:950)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)
Container_Exception

Exception in thread 
"org.apache.hadoop.hdfs.SocketCache@6da0d866<mailto:org.apache.hadoop.hdfs.SocketCache@6da0d866>"
 java.lang.NoSuchMethodError: 
com.google.common.collect.LinkedListMultimap.values()Ljava/util/List;
at org.apache.hadoop.hdfs.SocketCache.clear(SocketCache.java:257)
at org.apache.hadoop.hdfs.SocketCache.access$100(SocketCache.java:45)
at org.apache.hadoop.hdfs.SocketCache$1.run(SocketCache.java:126)
at java.lang.Thread.run(Thread.java:662)

As I said this problem does not occur for the 1MB input. For the 6MB input 
nothing is changed except the input dataset. Now a little bit of what am I 
doing, to give you the context of the problem. My AM starts N (debug 4) 
containers and each container reads its input data part. When this process is 
finished I am exchanging parts of input between containers (exchanging IDs of 
input structures, to provide means for communication between data structures). 
During the process of exchanging IDs these exceptions occur. I start Netty 
Server/Client on each container and I use ports 12000-12099 as mean of 
communicating these IDs.
Any help will be greatly appreciated. Sorry for any typos and if the 
explanation is not clear just ask for any details you are interested in. 
Currently it is after 2 AM I hope this will be a valid excuse.
regards
tmp


RE: Job end notification does not always work (Hadoop 2.x)

2013-06-25 Thread Devaraj k
I agree that we need the HS for getting status/counters. I meant that the job can 
finish without the HS as well.

Thanks
Devaraj k

From: Alejandro Abdelnur [mailto:t...@cloudera.com]
Sent: 25 June 2013 18:05
To: common-u...@hadoop.apache.org
Subject: Re: Job end notification does not always work (Hadoop 2.x)

Devaraj,

If you don't run the HS, once your jobs finish you cannot retrieve 
status/counters from them, either from the Java API or the Web UI. So I'd say that for 
any practical usage, you need it.

thx

On Mon, Jun 24, 2013 at 8:42 PM, Devaraj k 
mailto:devara...@huawei.com>> wrote:
It is not mandatory to have running HS in the cluster. Still the user can 
submit the job without HS in the cluster, and user may expect the Job/App End 
Notification.

Thanks
Devaraj k

From: Alejandro Abdelnur [mailto:t...@cloudera.com<mailto:t...@cloudera.com>]
Sent: 24 June 2013 21:42
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Cc: user@hadoop.apache.org<mailto:user@hadoop.apache.org>

Subject: Re: Job end notification does not always work (Hadoop 2.x)

if we ought to do this in a yarn service it
should be the RM or the HS. the RM is, IMO, the natural fit. the HS, would be a 
good choice if we are concerned about the extra work this would cause in the 
RM. the problem with the current HS is that it is MR specific, we should 
generalize it for diff AM types.

thx

Alejandro
(phone typing)

On Jun 23, 2013, at 23:28, Devaraj k 
mailto:devara...@huawei.com>> wrote:
Even if we handle all the failure cases in AM for Job End Notification, we may 
miss cases like abrupt kill of AM when it is in last retry. If we choose NM to 
give the notification, again RM needs to identify which NM should give the 
end-notification as we don't have any direct protocol between AM and NM.

I feel it would be better to move End-Notification responsibility to RM as Yarn 
Service because it ensures 100% notification and also useful for other types of 
applications as well.


Thanks
Devaraj K

From: Ravi Prakash [mailto:ravi...@ymail.com]
Sent: 23 June 2013 19:01
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: Re: Job end notification does not always work (Hadoop 2.x)

Hi Alejandro,

Thanks for your reply! I was thinking more along the lines Prashant suggested 
i.e. a failure during init() should still trigger an attempt to notify (by the 
AM). But now that you mention it, maybe we would be better of including this as 
a YARN feature after all (specially with all the new AMs being written). We 
could let the NM of the AM handle the notification burden, so that the RM 
doesn't get unduly taxed. Thoughts?

Thanks
Ravi



From: Alejandro Abdelnur mailto:t...@cloudera.com>>
To: "common-u...@hadoop.apache.org<mailto:common-u...@hadoop.apache.org>" 
mailto:user@hadoop.apache.org>>
Sent: Saturday, June 22, 2013 7:37 PM
Subject: Re: Job end notification does not always work (Hadoop 2.x)

If the AM fails before doing the job end notification, at any stage of the 
execution for whatever reason, the job end notification will never be deliver. 
There is not way to fix this unless the notification is done by a Yarn service. 
The 2 'candidate' services for doing this would be the RM and the HS. The job 
notification URL is in the job conf. The RM never sees the job conf, that rules 
out the RM out unless we add, at AM registration time the possibility to 
specify a callback URL. The HS has access to the job conf, but the HS is 
currently a 'passive' service.

thx

On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy 
mailto:a...@hortonworks.com>> wrote:
Prashanth,

 Please file a jira.

 One thing to be aware of - AMs get restarted a certain number of times for 
fault-tolerance - which means we can't just assume that failure of a single AM 
is equivalent to failure of the job.

 Only the ResourceManager is in the appropriate position to judge failure of AM 
v/s failure-of-job.

hth,
Arun

On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi 
mailto:prash1...@gmail.com>> wrote:


Thanks Ravi.

Well, in this case its a no-effort :) A failure of AM init should be considered 
as failure of the job? I looked at the code and best-effort makes sense with 
respect to retry logic etc. You make a good point that there would be no 
notification in case AM OOMs, but I do feel AM init failure should send a 
notification by other means.

On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash 
mailto:ravi...@ymail.com>> wrote:
Hi Prashant,

I would tend to agree with you. Although job-end notification is only a 
"best-effort" mechanism (i.e. we cannot always guarantee notification for 
example when the AM OOMs), I agree with you that we can do more. If you feel 
strongly about this, please create a JIRA and possibly upload a patch.

Thanks
Ravi



From: Prashant Kommireddi mailto:prash1...@gmail.com>

RE: Error:java heap size

2013-06-25 Thread Devaraj k
As you described below, the MR job uses a 6 GB file. How many tasks are there 
in this job, and how much input does each task get?

Is there any chance the tasks hold a lot of data in memory? Could you check 
your map function to see why it is not able to run with 2 GB of memory?

Could you also check the failing tasks' log files; they will give you a better idea 
of what is causing the problem.

Thanks
Devaraj k

From: Ramya S [mailto:ram...@suntecgroup.com]
Sent: 25 June 2013 15:50
To: user@hadoop.apache.org
Subject: RE: Error:java heap size

Hi,

I have set the properties in mapred-site.xml as follows:

 
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx2048M</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2048M</value>
</property>

But I am still getting the same error in the AM logs, although the map phase 
progresses a bit further.

I have found the property "mapreduce.task.io.sort.mb" set to the value 512. 
Is there any significance in changing this value to resolve the error?

Thanks,
Ramya

____
From: Devaraj k [mailto:devara...@huawei.com]
Sent: Tue 6/25/2013 3:08 PM
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: RE: Error:java heap size
Hi Ramya,

We need to change the –Xmx value for your Job tasks according to the memory 
allocating for map/reduce containers.

You can pass the –Xmx value for map and reduce yarn child’s using 
configurations "mapreduce.map.java.opts" and "mapreduce.reduce.java.opts".

If you are allocating 2GB for map container, you can probably pass the same 
value as –Xmx for the  mapreduce.map.java.opts and same way for reducer as well.


Thanks
Devaraj k

From: Ramya S [mailto:ram...@suntecgroup.com]
Sent: 25 June 2013 14:39
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: RE: Error:java heap size


Hi,

Error is in AM log, which is as follows:

  *   FATAL [IPC Server handler 10 on 49363] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
attempt_1372143291407_0003_m_01_0 - exited : Java heap space

  *INFO [IPC Server handler 10 on 49363] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from 
attempt_1372143291407_0003_m_01_0: Error: Java heap space

  *   INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report 
from attempt_1372143291407_0003_m_01_0: Error: Java heap space
Thanks,
Ramya
____
From: Devaraj k [mailto:devara...@huawei.com]
Sent: Tue 6/25/2013 2:09 PM
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: RE: Error:java heap size
Hi Ramya,

Where did you get the java heap size error?

Could you see the error in client side/RM/AM log? What is the detailed error?

Thanks
Devaraj k

From: Ramya S [mailto:ram...@suntecgroup.com]
Sent: 25 June 2013 13:10
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: Error:java heap size

Hi,

I am using hadoop-2.0.0-cdh4.3.0 version (YARN) and when i tried to run a MR 
job(6gb file) i got yhe following error:

ERROR: java heap size

Plese give me a solution to solve this...

Ramya


RE: Error:java heap size

2013-06-25 Thread Devaraj k
Hi Ramya,

We need to change the –Xmx value for your job tasks according to the memory 
allocated for the map/reduce containers.

You can pass the –Xmx value for the map and reduce YARN children using the 
configurations "mapreduce.map.java.opts" and "mapreduce.reduce.java.opts".

If you are allocating 2 GB for the map container, you can probably pass the same 
value as –Xmx in mapreduce.map.java.opts, and do the same for the reducer.
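For illustration, here is a minimal sketch (not part of the original reply) of setting 
these values from a Java job driver instead of mapred-site.xml; the class name, job 
name, and the 2 GB figures are just examples.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MemoryTuningDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Container sizes requested from YARN, in MB.
    conf.setInt("mapreduce.map.memory.mb", 2048);
    conf.setInt("mapreduce.reduce.memory.mb", 2048);
    // JVM heap for the map/reduce child processes.
    conf.set("mapreduce.map.java.opts", "-Xmx2048M");
    conf.set("mapreduce.reduce.java.opts", "-Xmx2048M");
    Job job = Job.getInstance(conf, "memory-tuning-example");
    // ... set mapper, reducer, input and output paths here ...
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}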


Thanks
Devaraj k

From: Ramya S [mailto:ram...@suntecgroup.com]
Sent: 25 June 2013 14:39
To: user@hadoop.apache.org
Subject: RE: Error:java heap size


Hi,

Error is in AM log, which is as follows:

  *   FATAL [IPC Server handler 10 on 49363] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
attempt_1372143291407_0003_m_01_0 - exited : Java heap space

  *INFO [IPC Server handler 10 on 49363] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from 
attempt_1372143291407_0003_m_01_0: Error: Java heap space

  *   INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report 
from attempt_1372143291407_0003_m_01_0: Error: Java heap space
Thanks,
Ramya
________
From: Devaraj k [mailto:devara...@huawei.com]
Sent: Tue 6/25/2013 2:09 PM
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: RE: Error:java heap size
Hi Ramya,

Where did you get the java heap size error?

Could you see the error in client side/RM/AM log? What is the detailed error?

Thanks
Devaraj k

From: Ramya S [mailto:ram...@suntecgroup.com]
Sent: 25 June 2013 13:10
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: Error:java heap size

Hi,

I am using hadoop-2.0.0-cdh4.3.0 version (YARN) and when i tried to run a MR 
job(6gb file) i got yhe following error:

ERROR: java heap size

Plese give me a solution to solve this...

Ramya


RE: Error:java heap size

2013-06-25 Thread Devaraj k
Hi Ramya,

Where do you see the Java heap size error?

Do you see the error on the client side, or in the RM/AM log? What is the detailed error?

Thanks
Devaraj k

From: Ramya S [mailto:ram...@suntecgroup.com]
Sent: 25 June 2013 13:10
To: user@hadoop.apache.org
Subject: Error:java heap size

Hi,

I am using hadoop-2.0.0-cdh4.3.0 version (YARN) and when i tried to run a MR 
job(6gb file) i got yhe following error:

ERROR: java heap size

Plese give me a solution to solve this...

Ramya


RE: After importing into Eclipse

2013-06-24 Thread Devaraj k
It is a good start Lokesh.

Can you go through this page http://wiki.apache.org/hadoop/HowToContribute for 
the steps and guidelines to contribute.


Thanks
Devaraj k

From: Lokesh Basu [mailto:lokesh.b...@gmail.com]
Sent: 25 June 2013 11:03
To: user@hadoop.apache.org
Subject: After importing into Eclipse

Hi all,

I built the hdfs-trunk through Maven and imported the project into Eclipse 
through the Eclipse Maven plugin. The project view then shows the following 
list, which is different from the contents of my hadoop-trunk folder:

hadoop-common-project
hadoop-hdfs-project
hadoop-main
hadoop-mapreduce
hadoop-mapreduce-client
hadoop-pipes
hadoop-project
hadoop-project-dist
hadoop-tools
hadoop-yarn
hadoop-yarn-applications
hadoop-yarn-project
hadoop-yarn-server
hadoop-annotations
hadoop-archives
hadoop-assemblies
hadoop-auth
hadoop-auth-examples
hadoop-client
hadoop-common
hadoop-datajoin
hadoop-dist
hadoop-distcp
hadoop-extras
hadoop-gridmix
hadoop-hdfs
hadoop-hdfs-bkjournal
hadoop-hdfs-httpfs
hadoop-mapreduce-client-app
hadoop-mapreduce-client-common
hadoop-mapreduce-client-core
hadoop-mapreduce-client-hs
hadoop-mapreduce-client-hs-plugins
hadoop-mapreduce-client-jobclient
hadoop-mapreduce-client-shuffle
hadoop-mapreduce-examples
hadoop-maven-plugins
hadoop-minicluster
hadoop-nfs
hadoop-rumen
hadoop-streaming
hadoop-tools-dist
hadoop-yarn-api
hadoop-yarn-applications-distributedshell
hadoop-yarn-applications-unmanaged-am-launcher
hadoop-yarn-client
hadoop-yarn-common
hadoop-yarn-server-common
hadoop-yarn-server-nodemanager
hadoop-yarn-server-resourcemanager
hadoop-yarn-server-tests
hadoop-yarn-server-web-proxy
hadoop-yarn-site



I want to solve this bug [https://issues.apache.org/jira/browse/HDFS-336], which is 
marked for newcomers, and I want to get involved.

I would like to know how to proceed from this point.

Can anyone help me with this?


--
Lokesh Basu


RE: "could only be replicated to 0 nodes instead of minReplication" exception during job execution

2013-06-24 Thread Devaraj k
Could you check this page to see whether any of these possible causes applies to your cluster:

http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo

Thanks
Devaraj k

From: Yuzhang Han [mailto:yuzhanghan1...@gmail.com]
Sent: 25 June 2013 09:34
To: user@hadoop.apache.org
Subject: Re: "could only be replicated to 0 nodes instead of minReplication" 
exception during job execution

Thank you, Omkar.
I didn't see any other errors in the datanode and namenode logs. My namenode 50070 
interface shows:
Configured Capacity : 393.72 GB
DFS Used            : 60.86 GB
Non DFS Used        : 137.51 GB
DFS Remaining       : 195.35 GB
DFS Used%           : 15.46%
DFS Remaining%      : 49.62%
Block Pool Used     : 60.86 GB
Block Pool Used%    : 15.46%
DataNodes usages    : Min 14.55%, Median 16.37%, Max 16.37%, stdev 0.91%



It doesn't imply insufficient disk space, does it? Can you think of any other 
possible cause of the exceptions?

On Mon, Jun 24, 2013 at 6:17 PM, Omkar Joshi 
mailto:ojo...@hortonworks.com>> wrote:
Hi,

I see there are 2 datanodes, and for some reason the namenode is not able to create 
even a single replica for the requested blocks. Are you sure the systems on which 
these datanodes are running have sufficient disk space? Do you see any other 
errors in the datanode/namenode logs?

What must be happening is that, as file creation in HDFS fails, the reduce attempt 
is marked as failed and restarted. Keep checking the namenode state 
when it reaches 67%.

Thanks,
Omkar Joshi
Hortonworks Inc.<http://www.hortonworks.com>

On Mon, Jun 24, 2013 at 3:01 PM, Yuzhang Han 
mailto:yuzhanghan1...@gmail.com>> wrote:
Hello,
I am using YARN. I get some exceptions at my namenode and datanode. They are 
thrown when my Reduce progress gets 67%. Then, reduce phase is restarted from 
0% several times, but always restarts at this point. Can someone tell me what I 
should do? Many thanks!

Namenode log:

2013-06-24 19:08:50,345 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap 
updated: 10.224.2.190:50010<http://10.224.2.190:50010> is added to 
blk_654446797771285606_5062{blockUCState=UNDER_CONSTRUCTION, 
primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[10.224.2.190:50010|RBW]]} size 0

2013-06-24 19:08:50,349 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Not able to 
place enough replicas, still in need of 1 to reach 1

For more information, please enable DEBUG log level on 
org.apache.commons.logging.impl.Log4JLogger

2013-06-24 19:08:50,350 ERROR org.apache.hadoop.security.UserGroupInformation: 
PriviledgedActionException as:ubuntu (auth:SIMPLE) cause:java.io.IOException: 
File 
/output/_temporary/1/_temporary/attempt_1372090853102_0001_r_02_0/part-2
 could only be replicated to 0 nodes instead of minReplication (=1).  There are 
2 datanode(s) running and no node(s) are excluded in this operation.

2013-06-24 19:08:50,353 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 
on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 
10.224.2.190:49375<http://10.224.2.190:49375>: error: java.io.IOException: File 
/output/_temporary/1/_temporary/attempt_1372090853102_0001_r_02_0/part-2
 could only be replicated to 0 nodes instead of minReplication (=1).  There are 
2 datanode(s) running and no node(s) are excluded in this operation.

java.io.IOException: File 
/output/_temporary/1/_temporary/attempt_1372090853102_0001_r_02_0/part-2
 could only be replicated to 0 nodes instead of minReplication (=1).  There are 
2 datanode(s) running and no node(s) are excluded in this operation.

at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1339)

at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2155)

at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:491)

at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:351)

at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40744)

at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:416)

at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

2013-06-24 19:08:50,413 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap 
updated: 10.224.

RE: Job end notification does not always work (Hadoop 2.x)

2013-06-24 Thread Devaraj k
It is not mandatory to have the HS running in the cluster. The user can still 
submit a job without the HS in the cluster, and the user may expect the job/app end 
notification.

Thanks
Devaraj k

From: Alejandro Abdelnur [mailto:t...@cloudera.com]
Sent: 24 June 2013 21:42
To: user@hadoop.apache.org
Cc: user@hadoop.apache.org
Subject: Re: Job end notification does not always work (Hadoop 2.x)

if we ought to do this in a yarn service it
should be the RM or the HS. the RM is, IMO, the natural fit. the HS, would be a 
good choice if we are concerned about the extra work this would cause in the 
RM. the problem with the current HS is that it is MR specific, we should 
generalize it for diff AM types.

thx

Alejandro
(phone typing)

On Jun 23, 2013, at 23:28, Devaraj k 
mailto:devara...@huawei.com>> wrote:
Even if we handle all the failure cases in AM for Job End Notification, we may 
miss cases like abrupt kill of AM when it is in last retry. If we choose NM to 
give the notification, again RM needs to identify which NM should give the 
end-notification as we don't have any direct protocol between AM and NM.

I feel it would be better to move End-Notification responsibility to RM as Yarn 
Service because it ensures 100% notification and also useful for other types of 
applications as well.


Thanks
Devaraj K

From: Ravi Prakash [mailto:ravi...@ymail.com]
Sent: 23 June 2013 19:01
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Subject: Re: Job end notification does not always work (Hadoop 2.x)

Hi Alejandro,

Thanks for your reply! I was thinking more along the lines Prashant suggested 
i.e. a failure during init() should still trigger an attempt to notify (by the 
AM). But now that you mention it, maybe we would be better of including this as 
a YARN feature after all (specially with all the new AMs being written). We 
could let the NM of the AM handle the notification burden, so that the RM 
doesn't get unduly taxed. Thoughts?

Thanks
Ravi



From: Alejandro Abdelnur mailto:t...@cloudera.com>>
To: "common-u...@hadoop.apache.org<mailto:common-u...@hadoop.apache.org>" 
mailto:user@hadoop.apache.org>>
Sent: Saturday, June 22, 2013 7:37 PM
Subject: Re: Job end notification does not always work (Hadoop 2.x)

If the AM fails before doing the job end notification, at any stage of the 
execution for whatever reason, the job end notification will never be deliver. 
There is not way to fix this unless the notification is done by a Yarn service. 
The 2 'candidate' services for doing this would be the RM and the HS. The job 
notification URL is in the job conf. The RM never sees the job conf, that rules 
out the RM out unless we add, at AM registration time the possibility to 
specify a callback URL. The HS has access to the job conf, but the HS is 
currently a 'passive' service.

thx

On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy 
mailto:a...@hortonworks.com>> wrote:
Prashanth,

 Please file a jira.

 One thing to be aware of - AMs get restarted a certain number of times for 
fault-tolerance - which means we can't just assume that failure of a single AM 
is equivalent to failure of the job.

 Only the ResourceManager is in the appropriate position to judge failure of AM 
v/s failure-of-job.

hth,
Arun

On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi 
mailto:prash1...@gmail.com>> wrote:



Thanks Ravi.

Well, in this case its a no-effort :) A failure of AM init should be considered 
as failure of the job? I looked at the code and best-effort makes sense with 
respect to retry logic etc. You make a good point that there would be no 
notification in case AM OOMs, but I do feel AM init failure should send a 
notification by other means.

On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash 
mailto:ravi...@ymail.com>> wrote:
Hi Prashant,

I would tend to agree with you. Although job-end notification is only a 
"best-effort" mechanism (i.e. we cannot always guarantee notification for 
example when the AM OOMs), I agree with you that we can do more. If you feel 
strongly about this, please create a JIRA and possibly upload a patch.

Thanks
Ravi



From: Prashant Kommireddi mailto:prash1...@gmail.com>>
To: "user@hadoop.apache.org<mailto:user@hadoop.apache.org>" 
mailto:user@hadoop.apache.org>>
Sent: Thursday, June 20, 2013 9:45 PM
Subject: Job end notification does not always work (Hadoop 2.x)

Hello,
I came across an issue that occurs with the job notification callbacks in MR2. 
It works fine if the Application master has started, but does not send a 
callback if the initializing of AM fails.
Here is the code from MRAppMaster.java

.
...

  // set job classloader if configured

  MRApps.setJobClassLoader(conf);

  initAndStartAppMaster(appMaster, conf, jobUserName);

} catch (Throwable t) {

  LOG.fatal(

RE: Job end notification does not always work (Hadoop 2.x)

2013-06-23 Thread Devaraj k
Even if we handle all the failure cases in the AM for job end notification, we may 
still miss cases such as an abrupt kill of the AM when it is in its last retry. If we 
choose the NM to give the notification, the RM again needs to identify which NM should 
send the end notification, as we don't have any direct protocol between the AM and the NM.

I feel it would be better to move the end-notification responsibility to the RM as a 
YARN service, because that ensures 100% notification and is also useful for other 
types of applications.
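
For context, a minimal sketch (not part of the original mails) of how a job registers 
the end-notification callback today, using the MR2 property names; the callback URL 
and retry values are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class NotifyingJobDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The AM substitutes $jobId and $jobStatus before calling the URL.
    conf.set("mapreduce.job.end-notification.url",
        "http://callback.example.com/notify/$jobId/$jobStatus");
    // Best-effort retries, performed by the AM itself.
    conf.setInt("mapreduce.job.end-notification.retry.attempts", 3);
    conf.setInt("mapreduce.job.end-notification.retry.interval", 1000);
    Job job = Job.getInstance(conf, "job-with-end-notification");
    // ... job setup here ...
    job.waitForCompletion(true);
  }
}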


Thanks
Devaraj K

From: Ravi Prakash [mailto:ravi...@ymail.com]
Sent: 23 June 2013 19:01
To: user@hadoop.apache.org
Subject: Re: Job end notification does not always work (Hadoop 2.x)

Hi Alejandro,

Thanks for your reply! I was thinking more along the lines Prashant suggested 
i.e. a failure during init() should still trigger an attempt to notify (by the 
AM). But now that you mention it, maybe we would be better of including this as 
a YARN feature after all (specially with all the new AMs being written). We 
could let the NM of the AM handle the notification burden, so that the RM 
doesn't get unduly taxed. Thoughts?

Thanks
Ravi



From: Alejandro Abdelnur mailto:t...@cloudera.com>>
To: "common-u...@hadoop.apache.org<mailto:common-u...@hadoop.apache.org>" 
mailto:user@hadoop.apache.org>>
Sent: Saturday, June 22, 2013 7:37 PM
Subject: Re: Job end notification does not always work (Hadoop 2.x)

If the AM fails before doing the job end notification, at any stage of the 
execution for whatever reason, the job end notification will never be deliver. 
There is not way to fix this unless the notification is done by a Yarn service. 
The 2 'candidate' services for doing this would be the RM and the HS. The job 
notification URL is in the job conf. The RM never sees the job conf, that rules 
out the RM out unless we add, at AM registration time the possibility to 
specify a callback URL. The HS has access to the job conf, but the HS is 
currently a 'passive' service.

thx

On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy 
mailto:a...@hortonworks.com>> wrote:
Prashanth,

 Please file a jira.

 One thing to be aware of - AMs get restarted a certain number of times for 
fault-tolerance - which means we can't just assume that failure of a single AM 
is equivalent to failure of the job.

 Only the ResourceManager is in the appropriate position to judge failure of AM 
v/s failure-of-job.

hth,
Arun

On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi 
mailto:prash1...@gmail.com>> wrote:


Thanks Ravi.

Well, in this case its a no-effort :) A failure of AM init should be considered 
as failure of the job? I looked at the code and best-effort makes sense with 
respect to retry logic etc. You make a good point that there would be no 
notification in case AM OOMs, but I do feel AM init failure should send a 
notification by other means.

On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash 
mailto:ravi...@ymail.com>> wrote:
Hi Prashant,

I would tend to agree with you. Although job-end notification is only a 
"best-effort" mechanism (i.e. we cannot always guarantee notification for 
example when the AM OOMs), I agree with you that we can do more. If you feel 
strongly about this, please create a JIRA and possibly upload a patch.

Thanks
Ravi



From: Prashant Kommireddi mailto:prash1...@gmail.com>>
To: "user@hadoop.apache.org<mailto:user@hadoop.apache.org>" 
mailto:user@hadoop.apache.org>>
Sent: Thursday, June 20, 2013 9:45 PM
Subject: Job end notification does not always work (Hadoop 2.x)

Hello,
I came across an issue that occurs with the job notification callbacks in MR2. 
It works fine if the Application master has started, but does not send a 
callback if the initializing of AM fails.
Here is the code from MRAppMaster.java

.
...

  // set job classloader if configured

  MRApps.setJobClassLoader(conf);

  initAndStartAppMaster(appMaster, conf, jobUserName);

} catch (Throwable t) {

  LOG.fatal("Error starting MRAppMaster", t);

  System.exit(1);

}

  }

protected static void initAndStartAppMaster(final MRAppMaster appMaster,

  final YarnConfiguration conf, String jobUserName) throws IOException,

  InterruptedException {

UserGroupInformation.setConfiguration(conf);

UserGroupInformation appMasterUgi = UserGroupInformation

.createRemoteUser(jobUserName);

appMasterUgi.doAs(new PrivilegedExceptionAction() {

  @Override

  public Object run() throws Exception {

appMaster.init(conf);

appMaster.start();

if(appMaster.errorHappenedShutDown) {

  throw new IOException("Was asked to shut down.");

}

return null;

  }

});

  }
appMaster.init(conf) does not dispatch JobFinishEventHandler which is 
responsible for 

RE: MapReduce job not running - i think i keep all correct configuration.

2013-06-23 Thread Devaraj k
Do you see any problem in the JT or TT logs that explains why the job is not running?


Thanks
Devaraj K

From: Pavan Kumar Polineni [mailto:smartsunny...@gmail.com]
Sent: 23 June 2013 19:20
To: user@hadoop.apache.org; Ravi Prakash
Subject: Re: MapReduce job not running - i think i keep all correct 
configuration.

Hi ravi,

After checking the config, in one mapred-site.xml I had kept a replication factor of 1 
instead of 2. After changing this I restarted all the daemons, but the 
problem still exists. Can you come to GTalk? I can explain more. Thanks.

On Sun, Jun 23, 2013 at 7:04 PM, Ravi Prakash 
mailto:ravi...@ymail.com>> wrote:
Hi Pavan,

I assure you this configuration works. The problem is very likely in your 
configuration files. Please look them over once again. Also did you restart 
your daemons after changing the configuration? Some configurations necessarily 
require a restart.

Ravi.



From: Pavan Kumar Polineni 
mailto:smartsunny...@gmail.com>>
To: user@hadoop.apache.org<mailto:user@hadoop.apache.org>
Sent: Sunday, June 23, 2013 6:20 AM
Subject: MapReduce job not running - i think i keep all correct configuration.


Hi all,

First, I had a machine with all the daemons running on it. After that I 
added two data nodes. In this case the MR job worked fine.

Now I have changed the first machine to be just the namenode, by stopping all the 
daemons except the NN daemon, and I changed one data node to run (SNN, JT, DN, TT); 
all of them are working. I kept the other data node as it was.

I changed the configurations to link up the NN and JT.

From here, when I try to run an MR job, it does not run.

Please help me. Thanks

--
 Pavan Kumar Polineni




--
 Pavan Kumar Polineni


RE: How Yarn execute MRv1 job?

2013-06-18 Thread Devaraj k
Hi Sam,
  Please find the answers for your queries.

>- Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job has 
>special execution process(map > shuffle > reduce) in Hadoop 1.x, and how Yarn 
>execute a MRv1 job? still include some special MR steps in Hadoop 1.x, like 
>map, sort, merge, combine and shuffle?

In YARN, the central concept is an application. An MR job is one kind of application, 
which makes use of the MRAppMaster (i.e. the ApplicationMaster for that application). 
If we want to run different kinds of applications, we should have an ApplicationMaster 
for each kind of application.

>- Do the MRv1 parameters still work for Yarn? Like mapreduce.task.io.sort.mb 
>and mapreduce.map.sort.spill.percent?
These configurations still work for MR Job in Yarn.

>- What's the general process for ApplicationMaster of Yarn to execute a job?
The MRAppMaster (ApplicationMaster for an MR job) drives the job life cycle, which 
includes getting the containers for maps and reducers, launching the containers 
through the NM, tracking the task status till completion, and managing failed tasks.

>2. In Hadoop 1.x, we can set the map/reduce slots by setting 
>'mapred.tasktracker.map.tasks.maximum' and 
>'mapred.tasktracker.reduce.tasks.maximum'
>- For Yarn, above tow parameter do not work any more, as yarn uses container 
>instead, right?
Correct, these params don't work in YARN. In YARN, scheduling is completely based on 
resources (memory, CPU). The ApplicationMaster requests resources from the RM to 
complete the tasks of that application.

>- For Yarn, we can set the whole physical mem for a NodeManager using 
>'yarn.nodemanager.resource.memory-mb'. But how to set the default size of 
>physical mem of a container?
ApplicationMaster is responsible for getting the containers from RM by sending 
the resource requests. For MR Job, you can use "mapreduce.map.memory.mb" and 
"mapreduce.reduce.memory.mb" configurations for specifying the map & reduce 
container memory sizes.

>- How to set the maximum size of physical mem of a container? By the parameter 
>of 'mapred.child.java.opts'?
It can be set based on the resources requested for that container.
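
To make the resource-request side concrete, here is a hypothetical sketch of an AM 
asking the RM for a single 2 GB container, assuming the stable YARN client API 
(org.apache.hadoop.yarn.client.api.AMRMClient); package names differ slightly in 
older 2.0.x-alpha builds.

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ContainerRequestSketch {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(conf);
    rmClient.start();
    rmClient.registerApplicationMaster("", 0, "");
    // Ask for one container with 2048 MB of memory and 1 vcore.
    Resource capability = Resource.newInstance(2048, 1);
    ContainerRequest request =
        new ContainerRequest(capability, null, null, Priority.newInstance(0));
    rmClient.addContainerRequest(request);
    // The AM then heartbeats with rmClient.allocate(progress) to receive
    // the allocated containers and launches them through the NM.
  }
}

For an MR job you normally never write this yourself; the MRAppMaster does it based 
on mapreduce.map.memory.mb and mapreduce.reduce.memory.mb.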


Thanks
Devaraj K
From: sam liu [mailto:samliuhad...@gmail.com]
Sent: 19 June 2013 08:16
To: user@hadoop.apache.org
Subject: How Yarn execute MRv1 job?

Hi,

1.In Hadoop 1.x, a job will be executed by map task and reduce task together, 
with a typical process(map > shuffle > reduce). In Yarn, as I know, a MRv1 job 
will be executed only by ApplicationMaster.
- Yarn could run multiple kinds of jobs(MR, MPI, ...), but, MRv1 job has 
special execution process(map > shuffle > reduce) in Hadoop 1.x, and how Yarn 
execute a MRv1 job? still include some special MR steps in Hadoop 1.x, like 
map, sort, merge, combine and shuffle?
- Do the MRv1 parameters still work for Yarn? Like mapreduce.task.io.sort.mb 
and mapreduce.map.sort.spill.percent?
- What's the general process for ApplicationMaster of Yarn to execute a job?

2. In Hadoop 1.x, we can set the map/reduce slots by setting 
'mapred.tasktracker.map.tasks.maximum' and 
'mapred.tasktracker.reduce.tasks.maximum'
- For Yarn, above tow parameter do not work any more, as yarn uses container 
instead, right?
- For Yarn, we can set the whole physical mem for a NodeManager using 
'yarn.nodemanager.resource.memory-mb'. But how to set the default size of 
physical mem of a container?
- How to set the maximum size of physical mem of a container? By the parameter 
of 'mapred.child.java.opts'?
Thanks!


RE: Debugging YARN AM

2013-06-17 Thread Devaraj k
Hi Curtis,

 "yarn.app.mapreduce.am.command-opts" configuration is specific to 
the MRAppMaster; it is not applicable to the DistributedShell AM.

If you want to dump out debug information, you can make use of the debug 
option of the DistributedShell application. If you want to debug by connecting 
remotely, you need to update the DS application code accordingly.
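
As a hypothetical illustration (this is not the actual DistributedShell code), if you 
patch the DS client, or write your own, the JDWP options can simply be prepended to 
the java command that launches the AM class when the launch context is built.

import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.util.Records;

public class DebuggableAmLaunch {
  // Builds an AM launch context whose java command carries the JDWP agent
  // options, so a debugger can attach to the AM container on port 8000.
  static ContainerLaunchContext buildDebuggableAmContext() {
    ContainerLaunchContext amContainer =
        Records.newRecord(ContainerLaunchContext.class);
    String debugOpts =
        "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000";
    // <LOG_DIR> is expanded by the NM to the container's log directory.
    amContainer.setCommands(Collections.singletonList(
        "${JAVA_HOME}/bin/java " + debugOpts
        + " org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster"
        + " 1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr"));
    return amContainer;
  }
}

You can then attach Eclipse to port 8000 on whichever node the AM container is 
launched; suspend=y makes the AM wait for the debugger.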

Thanks
Devaraj K

From: Curtis Ullerich [mailto:curtisuller...@gmail.com]
Sent: 18 June 2013 08:19
To: user@hadoop.apache.org
Subject: Debugging YARN AM

Hi all,

I can successfully debug the MapReduce ApplicationMaster in standalone mode by 
launching the pi estimator example with this command:

hadoop jar hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar pi 
"-Dyarn.app.mapreduce.am.command-opts=-Xdebug 
-Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000" 10 10

and then attaching a debugger to port 8000 using Eclipse. This doesn't work 
with the DistributedShell example, presumably because it's not configurable 
with yarn.app.mapreduce.am.command as it's not MapReduce. Looking in 
yarn-default.xml, I don't see an equivalent parameter. For learning purposes, 
how can I debug the DistributedShell example (and other AMs)?

Thanks!

Curtis


RE: Environment variable representing classpath for AM launch

2013-06-17 Thread Devaraj k
Hi Rahul,

You can make use of the below configuration to set up the 
launch context for your application master.

  

  
<property>
  <description>CLASSPATH for YARN applications. A comma-separated list
  of CLASSPATH entries</description>
  <name>yarn.application.classpath</name>
  <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*</value>
</property>

Thanks
Devaraj K
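
For illustration, a hypothetical sketch (not part of the original reply) of how a 
client can turn this property into the CLASSPATH environment variable of the AM 
container, assuming the YarnConfiguration constants for it.

import java.io.File;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class AmClasspathSetup {
  // Builds the CLASSPATH value for the AM from yarn.application.classpath.
  static ContainerLaunchContext withClasspath(YarnConfiguration conf) {
    StringBuilder classPathEnv = new StringBuilder("./*");
    for (String entry : conf.getStrings(
        YarnConfiguration.YARN_APPLICATION_CLASSPATH,
        YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH)) {
      classPathEnv.append(File.pathSeparatorChar).append(entry.trim());
    }
    Map<String, String> env = new HashMap<String, String>();
    env.put("CLASSPATH", classPathEnv.toString());
    ContainerLaunchContext amContainer =
        Records.newRecord(ContainerLaunchContext.class);
    amContainer.setEnvironment(env);
    return amContainer;
  }
}

This mirrors the pattern used by the bundled DistributedShell client.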

From: Rahul Bhattacharjee [mailto:rahul.rec@gmail.com]
Sent: 18 June 2013 10:16
To: user@hadoop.apache.org
Subject: Environment variable representing classpath for AM launch

Hi,
Is there any environment variable (available on all nodes of a YARN cluster) 
which represents a Java classpath containing all the core jars of YARN? I was 
thinking of using that variable to set up the environment in which to run the 
ApplicationMaster.
Thanks,
Rahul


Re: Unsubscribe

2013-06-17 Thread Devaraj k
Hi,

   You need to send a mail to user-unsubscr...@hadoop.apache.org for
unsubscribing from this list.

http://hadoop.apache.org/mailing_lists.html#User


Thanks
Devaraj K
On 6/17/13, Manny Vazquez  wrote:
>
>
> Manuel Vazquez, BI Administrator, IS Engineering
> Apollo Group | Apollo Tech
> 4035 S Riverpoint Parkway | MS: CF-L205| Phoenix, AZ 85040
> Office: 602-557-6979 | Cell: 602-317-1690
> email: manny.vazq...@apollogrp.edu<mailto:manny.vazq...@apollogrp.edu>
>
> P Please consider the environment before printing this email.
> http://www.apollogrp.edu<http://www.apollogrp.edu/>
>
>
> 
> This message is private and confidential. If you have received it in error,
> please notify the sender and remove it from your system.
>
>


RE: Assigning the same partition number to the mapper output

2013-06-16 Thread Devaraj k
If you are using TextOutputFormat for your job, the getRecordWriter() method (i.e. 
RecordWriter 
org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TaskAttemptContext
 job) throws IOException, InterruptedException) uses 
FileOutputFormat.getDefaultWorkFile() to generate the file names. It uses the 
$output/_temporary/$taskid/part-[mr]-$id format to generate the output path name 
for a task in the temp dir.

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html#getDefaultWorkFile(org.apache.hadoop.mapreduce.TaskAttemptContext,
 java.lang.String)


If you want to change the output path name for the task, you need to override this 
method in your job's output format accordingly; a sketch follows below.

If you want to change only the base output name of the output file (the default value 
is "part"), you can use the "mapreduce.output.basename" configuration.



Thanks&Regards
  Devaraj K
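
As a hypothetical sketch of the override approach (the class name and the "data-" 
prefix are just examples; the naming rule is whatever you need):

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class PartitionNamedTextOutputFormat<K, V> extends TextOutputFormat<K, V> {
  // Overrides only the default part-m-NNNNN naming; everything else (work
  // directory, committer behaviour) stays the same as TextOutputFormat.
  @Override
  public Path getDefaultWorkFile(TaskAttemptContext context, String extension)
      throws IOException {
    FileOutputCommitter committer =
        (FileOutputCommitter) getOutputCommitter(context);
    int partition = context.getTaskAttemptID().getTaskID().getId();
    return new Path(committer.getWorkPath(),
        String.format("data-m-%05d%s", partition, extension));
  }
}

The job would then use job.setOutputFormatClass(PartitionNamedTextOutputFormat.class). 
Alternatively, setting "mapreduce.output.basename" in the job configuration changes 
only the file prefix.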

From: Maysam Hossein Yabandeh [mailto:myaban...@qf.org.qa]
Sent: 12 June 2013 21:16
To: user@hadoop.apache.org
Cc: Maysam Hossein Yabandeh
Subject: Assigning the same partition number to the mapper output

Hi,

I was wondering if it is possible in hadoop to assign the same partition 
numbers to the map outputs. I am running a map-only job (with zero reducers) 
and hadoop shuffles the partitions in the output: i.e. input/part-m-X is 
processed by task number Y and hence generates output/part-m-Y (where X != 
Y).

Thanks
Maysam

CONFIDENTIALITY NOTICE:
This email and any attachments transmitted with it are confidential and 
intended for the use of individual or entity to which it is addressed. If you 
have received this email in error, please delete it immediately and inform the 
sender. Unless you are the intended recipient, you may not use, disclose, copy 
or distribute this email or any attachments included. The contents of the 
emails including any attachments may be subjected to copyrights law, In such 
case the contents may not be copied, adapted, distributed or transmitted 
without the consent of the copyright owner.


RE: JobTracker UI shows only one node instead of 2

2013-06-13 Thread Devaraj k
Can you check the JobTracker and TaskTracker log files to see whether there was any 
problem while starting the TaskTracker or any problem while connecting to the 
JobTracker.

Thanks
Devaraj

From: Vikas Jadhav [mailto:vikascjadha...@gmail.com]
Sent: 13 June 2013 12:22
To: user@hadoop.apache.org
Subject: JobTracker UI shows only one node instead of 2

I have set up a Hadoop cluster on two nodes, but the JobTracker UI cluster summary 
shows only one node.

The Namenode shows 2 live nodes, but data is always put on the master node, 
never on the slave node.

On the master node, jps shows all processes are running.

On the slave node, jps shows the tasktracker and datanode are running.

I have also checked /etc/hosts; it is correct.
What should I do?

--


  Regards,
   Vikas


RE: Hadoop in Pseudo-Distributed

2012-08-12 Thread Devaraj k
Can you go through this issue: https://issues.apache.org/jira/browse/HADOOP-7489. 
It discusses this problem and provides some workarounds.





Thanks

Devaraj


From: Subho Banerjee [subs.z...@gmail.com]
Sent: Monday, August 13, 2012 10:47 AM
To: user@hadoop.apache.org
Subject: Hadoop in Pseudo-Distributed


Hello,

I am running hadoop v1.0.3 in Mac OS X 10.8 with Java_1.6.0_33-b03-424


When running hadoop on pseudo-distributed mode, the map seems to work, but it 
cannot compute the reduce.

12/08/13 08:58:12 INFO mapred.JobClient: Running job: job_201208130857_0001
12/08/13 08:58:13 INFO mapred.JobClient: map 0% reduce 0%
12/08/13 08:58:27 INFO mapred.JobClient: map 20% reduce 0%
12/08/13 08:58:33 INFO mapred.JobClient: map 30% reduce 0%
12/08/13 08:58:36 INFO mapred.JobClient: map 40% reduce 0%
12/08/13 08:58:39 INFO mapred.JobClient: map 50% reduce 0%
12/08/13 08:58:42 INFO mapred.JobClient: map 60% reduce 0%
12/08/13 08:58:45 INFO mapred.JobClient: map 70% reduce 0%
12/08/13 08:58:48 INFO mapred.JobClient: map 80% reduce 0%
12/08/13 08:58:51 INFO mapred.JobClient: map 90% reduce 0%
12/08/13 08:58:54 INFO mapred.JobClient: map 100% reduce 0%
12/08/13 08:59:14 INFO mapred.JobClient: Task Id : 
attempt_201208130857_0001_m_00_0, Status : FAILED
Too many fetch-failures
12/08/13 08:59:14 WARN mapred.JobClient: Error reading task outputServer 
returned HTTP response code: 403 for URL: 
http://10.1.66.17:50060/tasklog?plaintext=true&attemptid=attempt_201208130857_0001_m_00_0&filter=stdout
12/08/13 08:59:14 WARN mapred.JobClient: Error reading task outputServer 
returned HTTP response code: 403 for URL: 
http://10.1.66.17:50060/tasklog?plaintext=true&attemptid=attempt_201208130857_0001_m_00_0&filter=stderr
12/08/13 08:59:18 INFO mapred.JobClient: map 89% reduce 0%
12/08/13 08:59:21 INFO mapred.JobClient: map 100% reduce 0%
12/08/13 09:00:14 INFO mapred.JobClient: Task Id : 
attempt_201208130857_0001_m_01_0, Status : FAILED
Too many fetch-failures

Here is what I get when I try to see the tasklog using the links given in the 
output

http://10.1.66.17:50060/tasklog?plaintext=true&attemptid=attempt_201208130857_0001_m_00_0&filter=stderr
 --->
2012-08-13 08:58:39.189 java[74092:1203] Unable to load realm info from 
SCDynamicStore

http://10.1.66.17:50060/tasklog?plaintext=true&attemptid=attempt_201208130857_0001_m_00_0&filter=stdout
 --->

I have changed my hadoop-env.sh according to Mathew Buckett in 
https://issues.apache.org/jira/browse/HADOOP-7489

Also this error of Unable to load realm info from SCDynamicStore does not show 
up when I do 'hadoop namenode -format' or 'start-all.sh'

I am also attaching a zipped copy of my logs


Cheers,

Subho.