Failed to run wordcount on YARN

2013-07-12 Thread Liu, Raymond
Hi 

I just started trying out Hadoop 2.0, using the 2.0.5-alpha package,

and followed

http://hadoop.apache.org/docs/r2.0.5-alpha/hadoop-project-dist/hadoop-common/ClusterSetup.html

to set up a cluster in non-secure mode. HDFS works fine with the client tools.

But when I run the wordcount example, I get these errors:

./bin/hadoop jar 
./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar wordcount 
/tmp /out


13/07/12 15:05:53 INFO mapreduce.Job: Task Id : 
attempt_1373609123233_0004_m_04_0, Status : FAILED
Error: java.io.FileNotFoundException: Path is not a file: /tmp/hadoop-yarn
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:42)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1317)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1276)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1252)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1225)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:403)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:239)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40728)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
at 
org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:986)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:974)
at 
org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:157)
at 
org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:124)
at org.apache.hadoop.hdfs.DFSInputStream.(DFSInputStream.java:117)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1131)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:244)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:77)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:713)
at 
org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:89)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:519)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)

I checked HDFS and found that /tmp/hadoop-yarn is there, and this dir's owner is the
same as the job user.

And just to rule it out, I also created /tmp/hadoop-yarn on the local fs. Neither
helped.

Any idea what might be the problem? Thx!


Best Regards,
Raymond Liu



how to add JournalNodes

2013-07-12 Thread lei liu
I use QJM for HA in Hadoop 2.0. There are currently three JournalNodes in the HDFS
cluster, and I want to add two new JournalNodes to the cluster. How do I do that?
Do I need to restart the HDFS cluster?


Thanks,

LiuLei


Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-12 Thread Hao Ren

On 11/07/2013 20:47, Balaji Narayanan (பாலாஜி நாராயணன்) wrote:

multiple copy jobs to hdfs


Thank you for your reply and the link.

I read the link before, but I didn't find any examples of copying
files from FTP to HDFS.


There are about 20-40 files in my directory. I just want to move or copy
that directory to HDFS on Amazon EC2.


Actually, I am new to Hadoop. I would like to know how to do multiple
copy jobs to HDFS without distcp.


Thank you again.

--
Hao Ren
ClaraVista
www.claravista.fr


Re: how to add JournalNodes

2013-07-12 Thread Harsh J
You need to restart your NameNodes to get them to use the new QJM
5-host-set configs, and I think you can do that without downtime if
you're already in HA mode by restarting one NN at a time.

To add new JNs first though, you will currently have to rsync their
directory from a good JN to get them into the cluster (i.e. rsync the
data from a good one before you start the new JNs). They will not
auto-join otherwise.

On Fri, Jul 12, 2013 at 12:57 PM, lei liu  wrote:
> I use QJM for HA in hadoop2.0,  now  there are three JournalNodes in HDFS
> cluster, I want to add two new JournalNodes to HDFS cluster, how to do it?
> Do I need to restart HDFS cluster?
>
>
> Thanks,
>
> LiuLei



-- 
Harsh J


RE: Failed to run wordcount on YARN

2013-07-12 Thread Devaraj k
Hi Raymond, 

In Hadoop 2.0.5, the new-API FileInputFormat doesn't support
reading files recursively in the input dir. It supports only an input dir that
contains files directly. If the input dir has any child dirs, it throws the error below.

Recursive reading has been added in trunk with this JIRA:
https://issues.apache.org/jira/browse/MAPREDUCE-3193.

You can give the Job an input dir which doesn't have nested dirs, or you can
make use of the old FileInputFormat API to read files recursively in the
subdirs.
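
As a quick way to confirm this, here is a hedged sketch (the input path is a
placeholder, not taken from the original report) that lists the intended input
directory and flags any child directories that would trip the new-API
FileInputFormat in 2.0.5-alpha:

// Hedged sketch: flag child directories under the job input path, since a
// nested directory such as /tmp/hadoop-yarn triggers "Path is not a file".
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class InputDirCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    for (FileStatus status : fs.listStatus(new Path("/tmp"))) {
      if (status.isDirectory()) {
        System.out.println("Nested directory (would break the job input): "
            + status.getPath());
      }
    }
  }
}

Pointing the wordcount example at a directory that holds only files avoids the
exception.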

Thanks
Devaraj k

-Original Message-
From: Liu, Raymond [mailto:raymond@intel.com] 
Sent: 12 July 2013 12:57
To: user@hadoop.apache.org
Subject: Failed to run wordcount on YARN

Hi 

I just start to try out hadoop2.0, I use the 2.0.5-alpha package

And follow 

http://hadoop.apache.org/docs/r2.0.5-alpha/hadoop-project-dist/hadoop-common/ClusterSetup.html

to setup a cluster in non-security mode. HDFS works fine with client tools.

While when I run wordcount example, there are errors :

./bin/hadoop jar 
./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar wordcount 
/tmp /out


13/07/12 15:05:53 INFO mapreduce.Job: Task Id : 
attempt_1373609123233_0004_m_04_0, Status : FAILED
Error: java.io.FileNotFoundException: Path is not a file: /tmp/hadoop-yarn
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:42)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1317)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1276)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1252)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1225)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:403)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:239)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40728)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
at 
org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:986)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:974)
at 
org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:157)
at 
org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:124)
at org.apache.hadoop.hdfs.DFSInputStream.(DFSInputStream.java:117)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1131)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:244)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:77)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:713)
at 
org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:89)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:519)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)

I check the HDFS and found /tmp/hadoop-yarn is there , th

unsubscribe

2013-07-12 Thread Margusja




RE: unsubscribe

2013-07-12 Thread Devaraj k
You need to send a mail to user-unsubscr...@hadoop.apache.org to unsubscribe.

http://hadoop.apache.org/mailing_lists.html#User

Thanks
Devaraj k


-Original Message-
From: Margusja [mailto:mar...@roo.ee] 
Sent: 12 July 2013 14:26
To: user@hadoop.apache.org
Subject: unsubscribe




RE: Taktracker in namenode failure

2013-07-12 Thread Devaraj k
Could you tell us which Map Output Value class you are configuring while
submitting the Job, and what type of value the Mapper is writing? If
these two mismatch, then it will throw the below error.
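
For illustration only, a hedged sketch of a consistent pairing (class names
are made up, not taken from your job): the value type the Mapper writes must
match what job.setMapOutputValueClass declares.

// Hypothetical sketch: the map output value class declared on the Job must
// match the type passed to context.write(), otherwise the map output buffer
// throws "Type mismatch in value from map".
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenCountDriver {

  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        word.set(token);
        context.write(word, ONE);            // writes IntWritable values
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "token count");
    job.setJarByClass(TokenCountDriver.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class); // must match context.write
    // input/output paths and a reducer would be set before submitting
  }
}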

Thanks
Devaraj k

From: Ramya S [mailto:ram...@suntecgroup.com]
Sent: 12 July 2013 14:46
To: user@hadoop.apache.org
Subject: Taktracker in namenode failure

Hi,

Why does only the tasktracker on the namenode fail during job execution with this error?
I have attached a snapshot of the error screen to this mail.

java.io.IOException: Type mismatch in value from map: expected 
org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.IntWritable

at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1019)

at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)

at 
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)

at WordCount$TokenizerMapper.map(WordCount.java:30)

at WordCount$TokenizerMapper.map(WordCount.java:19)

at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)

at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)

at org.apache.hadoop.mapred.Child$4.run(Child.java:255)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:416)

at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)

at org.apache.hadoop.mapred.Child.main(Child.java:249)



But this same task is reassigned to another tasktracker and gets executed.
Why?


Best Regards,
Ramya


how to get hadoop HDFS path?

2013-07-12 Thread ch huang
I want to set an HDFS path and add the path into HBase. Here is my code:

 Path path = new Path("hdfs:192.168.10.22:9000/alex/test.jar");
  System.out.println(":
"+path.toString()+"|"+TestMyCo.class.getCanonicalName()+"|"+Coprocessor.PRIORITY_USER);

  htd.setValue("COPROCESSOR$1", path.toString()+"|"
+ TestMyCo.class.getCanonicalName()+"|"+Coprocessor.PRIORITY_USER);

and the actual value that I find is:

hbase(main):012:0> describe 'mytest'
DESCRIPTION
ENABLED
 {NAME => 'mytest', COPROCESSOR$1 => 'hdfs:/
192.168.10.22:9000/alex/test.jar|TestMyCo|1073741823', FAMILIES => [{N true
 AME => 'myfl', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
REPLICATION_SCOPE => '0', VERSIONS => '3', C
 OMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '6553
 6', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0930 seconds


RE: Taktracker in namenode failure

2013-07-12 Thread Ramya S
Both the configured map output value class and the value written from
the mapper are Text. So there is no mismatch in the value class.

But when the same MR program is run with 2 tasktrackers (without the tasktracker on the
namenode), the exception does not occur.

The problem is only with the tasktracker running on the namenode.
 
 
 
Thanks & Regards
 
Ramya.S



From: Devaraj k [mailto:devara...@huawei.com]
Sent: Fri 7/12/2013 3:04 PM
To: user@hadoop.apache.org
Subject: RE: Taktracker in namenode failure



Could you tell, what is the Map Output Value class you are configuring while 
submitting Job and what is the type of the value writing from the Mapper. If 
both of these mismatches then it will trow the below error.

 

Thanks

Devaraj k

 

From: Ramya S [mailto:ram...@suntecgroup.com] 
Sent: 12 July 2013 14:46
To: user@hadoop.apache.org
Subject: Taktracker in namenode failure

 

Hi,

 

Why only tasktracker in namenode faill during  job execution with error.

I have attached the snapshot of error screen with this mail

java.io.IOException: Type mismatch in value from map: expected 
org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.IntWritable
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1019)
at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
at 
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at WordCount$TokenizerMapper.map(WordCount.java:30)
at WordCount$TokenizerMapper.map(WordCount.java:19)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
 
but  this same task is reassigned to another tasktracker and getting executed. 
why?

 

 

Best Regards,

Ramya


RE: Taktracker in namenode failure

2013-07-12 Thread Devaraj k
I think there is a mismatch of jars coming into the classpath for the map tasks
when they run on different machines. You can find this out by giving your
Mapper class and job-submit class unique names and then submitting the Job.

Thanks
Devaraj k

From: Ramya S [mailto:ram...@suntecgroup.com]
Sent: 12 July 2013 15:27
To: user@hadoop.apache.org
Subject: RE: Taktracker in namenode failure

Both the map output value  class configured and the output value  written from 
the mapper is Text class. So there is no mismatch in the value class.

 But when the same MR program is run with 2 tasktrackers(without tasktracker in 
namenode) exception is not occuring.

The problem is only with the tasktracker running in the namenode.



Thanks & Regards

Ramya.S


From: Devaraj k [mailto:devara...@huawei.com]
Sent: Fri 7/12/2013 3:04 PM
To: user@hadoop.apache.org
Subject: RE: Taktracker in namenode failure
Could you tell, what is the Map Output Value class you are configuring while 
submitting Job and what is the type of the value writing from the Mapper. If 
both of these mismatches then it will trow the below error.

Thanks
Devaraj k

From: Ramya S [mailto:ram...@suntecgroup.com]
Sent: 12 July 2013 14:46
To: user@hadoop.apache.org
Subject: Taktracker in namenode failure

Hi,

Why only tasktracker in namenode faill during  job execution with error.
I have attached the snapshot of error screen with this mail

java.io.IOException: Type mismatch in value from map: expected 
org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.IntWritable

at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1019)

at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)

at 
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)

at WordCount$TokenizerMapper.map(WordCount.java:30)

at WordCount$TokenizerMapper.map(WordCount.java:19)

at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)

at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)

at org.apache.hadoop.mapred.Child$4.run(Child.java:255)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:416)

at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)

at org.apache.hadoop.mapred.Child.main(Child.java:249)



but  this same task is reassigned to another tasktracker and getting executed. 
why?


Best Regards,
Ramya


Re: unsubscribe

2013-07-12 Thread Manal Helal
How do I keep my subscription but receive no notifications for threads I
didn't initiate or subscribe to?

thanks,


On 12 July 2013 11:11, Devaraj k  wrote:

> You need to send a mail to user-unsubscr...@hadoop.apache.org for
> unsubscribe.
>
> http://hadoop.apache.org/mailing_lists.html#User
>
> Thanks
> Devaraj k
>
>
> -Original Message-
> From: Margusja [mailto:mar...@roo.ee]
> Sent: 12 July 2013 14:26
> To: user@hadoop.apache.org
> Subject: unsubscribe
>
>
>


-- 
Kind Regards,

Manal Helal


Re: unsubscribe

2013-07-12 Thread Nitin Pawar
Manal,

It's a user-based mailing list, not a topic-based one, so it's a boolean:
if you are subscribed, you will get all the mails; otherwise you will not
get any.


On Fri, Jul 12, 2013 at 4:24 PM, Manal Helal  wrote:

> how do I keep my subscription, but receive no notifications of threads I
> didn't initiate or subscribe to,
>
> thanks,
>
>
> On 12 July 2013 11:11, Devaraj k  wrote:
>
>> You need to send a mail to user-unsubscr...@hadoop.apache.org for
>> unsubscribe.
>>
>> http://hadoop.apache.org/mailing_lists.html#User
>>
>> Thanks
>> Devaraj k
>>
>>
>> -Original Message-
>> From: Margusja [mailto:mar...@roo.ee]
>> Sent: 12 July 2013 14:26
>> To: user@hadoop.apache.org
>> Subject: unsubscribe
>>
>>
>>
>
>
> --
> Kind Regards,
>
> Manal Helal
>



-- 
Nitin Pawar


Re: Staging directory ENOTDIR error.

2013-07-12 Thread Ram
Hi Jay,
which hadoop command did you run?

Hi,



From,
Ramesh.




On Fri, Jul 12, 2013 at 7:54 AM, Devaraj k  wrote:

>  Hi Jay,
>
>
>Here client is trying to create a staging directory in local file
> system,  which actually should create in HDFS.
>
>
> Could you check whether do you have configured “fs.defaultFS”
> configuration in client with the HDFS.
>
> 
>
>
> Thanks
>
> Devaraj k
>
>
> *From:* Jay Vyas [mailto:jayunit...@gmail.com]
> *Sent:* 12 July 2013 04:12
> *To:* common-u...@hadoop.apache.org
> *Subject:* Staging directory ENOTDIR error.
>
>
> Hi , I'm getting an ungoogleable exception, never seen this before. 
>
> This is on a hadoop 1.1. cluster... It appears that its permissions
> related... 
>
> Any thoughts as to how this could crop up?
>
> I assume its a bug in my filesystem, but not sure.
>
>
> 13/07/11 18:39:43 ERROR security.UserGroupInformation:
> PriviledgedActionException as:root cause:ENOTDIR: Not a directory
> ENOTDIR: Not a directory
> at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
> at org.apache.hadoop.fs.FileUtil.execSetPermission(FileUtil.java:699)
> at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:654)
> at
> org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
> at
> org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
> at
> org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
> at
> org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
>
> 
>
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com 
>


Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-12 Thread Ram
Hi,
   Please configure the following in core-site.xml and try.
   Use hadoop fs -ls file:///  -- to display local file system files.
   Use hadoop fs -ls ftp://   -- to display FTP files; if
it lists the files, go for distcp.

reference from
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml

  fs.ftp.host        (default: 0.0.0.0)  FTP filesystem connects to this server
  fs.ftp.host.port   (default: 21)       FTP filesystem connects to fs.ftp.host on this port

Also try setting the filesystem implementation property, referenced from the
Hadoop Definitive Guide's Hadoop filesystems table (Java implementation under
org.apache.hadoop):

  Filesystem: FTP
  URI scheme: ftp
  Java implementation: fs.ftp.FTPFileSystem
  Description: A filesystem backed by an FTP server.
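
For reference, a minimal, hedged Java sketch of the same check (host name,
credentials and directory below are placeholders, not real values):

// Hypothetical sketch: verify that Hadoop can list an FTP source before
// trying distcp against it.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FtpListCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.ftp.host", "ftp.example.com");   // placeholder FTP server
    conf.set("fs.ftp.host.port", "21");

    // Credentials can go into the URI; ftp:// is served by fs.ftp.FTPFileSystem.
    FileSystem ftp = FileSystem.get(
        URI.create("ftp://user:password@ftp.example.com/"), conf);

    for (FileStatus status : ftp.listStatus(new Path("/incoming"))) {
      System.out.println(status.getPath());
    }
  }
}

If this listing works, distcp should be able to read the same ftp:// source.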


Hi,



From,
Ramesh.




On Fri, Jul 12, 2013 at 1:04 PM, Hao Ren  wrote:

> Le 11/07/2013 20:47, Balaji Narayanan (பாலாஜி நாராயணன்) a écrit :
>
>> multiple copy jobs to hdfs
>>
>
> Thank you for your reply and the link.
>
> I read the link before, but I didn't find any examples about copying file
> from ftp to hdfs.
>
> There are about 20-40 file in my directory. I just want to move or copy
> that directory to hdfs on Amazon EC2.
>
> Actually, I am new to hadoop. I would like to know how to do multiple copy
> jobs to hdfs without distcp.
>
> Thank you again.
>
>
> --
> Hao Ren
> ClaraVista
> www.claravista.fr
>


Re: Taktracker in namenode failure

2013-07-12 Thread Ram
Hi,
The problem is with the jar file only. To check, run any other MR job or the
sample wordcount job on the namenode's tasktracker. If it runs, there is no problem
with the namenode's tasktracker; if it does not run, there may be a problem with
that tasktracker's configuration, so compare it with another node's tasktracker
configuration (i.e., the mapred configuration).

Hi,



From,
Ramesh.




On Fri, Jul 12, 2013 at 3:37 PM, Devaraj k  wrote:

>  I think, there is mismatch of jar’s coming in the classpath for the map
> tasks when it runs in different machines. You can find out this, by giving
> some unique name for your Mapper class, Job Submit class and then submit
> the Job.
>
>
> Thanks
>
> Devaraj k
>
>
> *From:* Ramya S [mailto:ram...@suntecgroup.com]
> *Sent:* 12 July 2013 15:27
> *To:* user@hadoop.apache.org
> *Subject:* RE: Taktracker in namenode failure
>
>
> Both the map output value  class configured and the output value  written
> from the mapper is Text class. So there is no mismatch in the value class.
> 
>
>  
>
>  But when the same MR program is run with 2 tasktrackers(without
> tasktracker in namenode) exception is not occuring.
>
>  
>
> The problem is only with the tasktracker running in the namenode.
>
>  
>
>  
>
>  
>
> *Thanks & Regards*
>
>  
>
> *Ramya.S*
>
>  --
>
> *From:* Devaraj k [mailto:devara...@huawei.com ]
> *Sent:* Fri 7/12/2013 3:04 PM
> *To:* user@hadoop.apache.org
> *Subject:* RE: Taktracker in namenode failure
>
> Could you tell, what is the Map Output Value class you are configuring
> while submitting Job and what is the type of the value writing from the
> Mapper. If both of these mismatches then it will trow the below error.
>
>  
>
> Thanks
>
> Devaraj k
>
>  
>
> *From:* Ramya S [mailto:ram...@suntecgroup.com ]
> *Sent:* 12 July 2013 14:46
> *To:* user@hadoop.apache.org
> *Subject:* Taktracker in namenode failure
>
>  
>
> Hi,
>
>  
>
> Why only tasktracker in namenode faill during  job execution with error.
>
> I have attached the snapshot of error screen with this mail
>
> java.io.IOException: Type mismatch in value from map: expected 
> org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.IntWritable
>
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1019)
>
> at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
>
> at 
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>
> at WordCount$TokenizerMapper.map(WordCount.java:30)
>
> at WordCount$TokenizerMapper.map(WordCount.java:19)
>
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:416)
>
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
>  
>
> but  this same task is reassigned to another tasktracker and getting 
> executed. why?
>
>   
>
>  
>
> *Best Regards,*
>
> *Ramya*
>


Re: How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?

2013-07-12 Thread Shahab Yunus
I think they are cumulative but per task.

Physical memory bytes
(PHYSICAL_MEMORY_BYTES)
The physical memory being used by a task in bytes, as reported by
/proc/meminfo.
Virtual memory bytes
(VIRTUAL_MEMORY_BYTES)
The virtual memory being used by a task in bytes, as reported by
/proc/meminfo.

This is from the Definitive Guide book. Page 260.
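
If it helps, a hedged sketch (assuming a client where
org.apache.hadoop.mapreduce.TaskCounter is available) of reading these
counters from a completed job:

// Hypothetical sketch: print the memory counters of a finished job. The Job
// object is assumed to come from an earlier, completed submission.
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class MemoryCounters {
  public static void print(Job job) throws Exception {
    Counters counters = job.getCounters();
    long pmem = counters.findCounter(TaskCounter.PHYSICAL_MEMORY_BYTES).getValue();
    long vmem = counters.findCounter(TaskCounter.VIRTUAL_MEMORY_BYTES).getValue();
    // At the job level these are summed over tasks; each task contributes
    // its most recent sample (see the follow-ups in this thread).
    System.out.println("PHYSICAL_MEMORY_BYTES = " + pmem);
    System.out.println("VIRTUAL_MEMORY_BYTES  = " + vmem);
  }
}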

Regards,
Shhab


On Thu, Jul 11, 2013 at 12:47 PM, hadoop qi  wrote:

> Hello,
>
> I am wondering how memory counters  'PHYSICAL_MEMORY_BYTES'  and
> 'VIRTUAL_MEMORY_BYTES'  are calculated? They are peaks of memory usage or
> cumulative usage?
>
> Thanks for help,
>


Hadoop property precedence

2013-07-12 Thread Shalish VJ
Hi,
 
 
Suppose the block size set in the configuration file on the client side is 64MB,
the block size set in the configuration file on the namenode side is 128MB, and the
block size set in the configuration file on the datanode side is something else.
Please advise: if the client is writing a file to HDFS, which property would
take effect?
 
Thanks,
Shalish.

Re: Hadoop property precedence

2013-07-12 Thread Raj K Singh
I think it will take the 128MB block size given in the namenode config
file, but I am not sure and have not tried it.
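
As a hedged aside (path and sizes below are placeholders), the client can also
pin the block size per file at create time, which sidesteps the question of
which configuration file wins for that particular write:

// Hypothetical sketch: create a file with an explicit 128 MB block size from
// the client, independent of the dfs block size configured on any node.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ExplicitBlockSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    long blockSize = 128L * 1024 * 1024;                        // 128 MB
    int bufferSize = conf.getInt("io.file.buffer.size", 4096);
    short replication = fs.getDefaultReplication();

    FSDataOutputStream out = fs.create(
        new Path("/tmp/blocksize-demo.txt"), true, bufferSize, replication, blockSize);
    out.writeUTF("block size chosen by the client at create time");
    out.close();
  }
}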


Raj K Singh
http://www.rajkrrsingh.blogspot.com
Mobile  Tel: +91 (0)9899821370


On Fri, Jul 12, 2013 at 10:20 PM, Shalish VJ  wrote:

> Hi,
>
>
> Suppose block size set in configuration file at client side is 64MB,
> block size set in configuration file at name node side is 128MB and block
> size set in configuration file at datanode side is something else.
> Please advice, If the client is writing a file to hdfs,which property
> would be executed.
>
> Thanks,
> Shalish.
>


UNSUBSCRIBE

2013-07-12 Thread Brent Nikolaus



Re: How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?

2013-07-12 Thread Vinod Kumar Vavilapalli
They are running metrics. While the task is running, they will tell you how
much pmem/vmem it is using at that point in time. Obviously, at the end of the job,
it will be the last snapshot.

Thanks,
+Vinod

On Jul 12, 2013, at 6:47 AM, Shahab Yunus wrote:

> I think they are cumulative but per task.
> 
> Physical memory bytes
> (PHYSICAL_MEMORY_BYTES)
> The physical memory being used by a task in bytes, as reported by 
> /proc/meminfo.
> Virtual memory bytes
> (VIRTUAL_MEMORY_BYTES)
> The virtual memory being used by a task in bytes, as reported by 
> /proc/meminfo.
> 
> This is from the Definitive Guide book. Page 260.
> 
> Regards,
> Shhab
> 
> 
> On Thu, Jul 11, 2013 at 12:47 PM, hadoop qi  wrote:
> Hello, 
> 
> I am wondering how memory counters  'PHYSICAL_MEMORY_BYTES'  and 
> 'VIRTUAL_MEMORY_BYTES'  are calculated? They are peaks of memory usage or 
> cumulative usage? 
> 
> Thanks for help, 
> 



Re: UNSUBSCRIBE

2013-07-12 Thread sure bhands
Please send an email to user-unsubscr...@hadoop.apache.org to unsubscribe.

Thanks,
Surendra


On Fri, Jul 12, 2013 at 10:24 AM, Brent Nikolaus wrote:

>
>


Re: Staging directory ENOTDIR error.

2013-07-12 Thread Jay Vyas
This was a very odd error - it turns out that I had created a file called
"tmp" in my fs root directory, which meant that
when the jobs tried to write to the tmp directory, they ran into the
not-a-dir exception.

In any case, I think the error reporting in NativeIO class should be
revised.
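
A hedged sanity check along those lines (the path is the usual default;
adjust it for your setup):

// Hypothetical sketch: make sure the local path used for staging is a real
// directory and not a plain file named "tmp".
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;

public class StagingDirCheck {
  public static void main(String[] args) throws Exception {
    LocalFileSystem local = FileSystem.getLocal(new Configuration());
    Path tmp = new Path("/tmp");         // assumption: staging lives under /tmp
    FileStatus status = local.getFileStatus(tmp);
    if (!status.isDir()) {
      System.err.println(tmp + " exists but is not a directory; "
          + "job submission will fail with ENOTDIR");
    }
  }
}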

On Thu, Jul 11, 2013 at 10:24 PM, Devaraj k  wrote:

>  Hi Jay,
>
>
>Here client is trying to create a staging directory in local file
> system,  which actually should create in HDFS.
>
>
> Could you check whether do you have configured “fs.defaultFS”
> configuration in client with the HDFS.
>
> 
>
>
> Thanks
>
> Devaraj k
>
>
> *From:* Jay Vyas [mailto:jayunit...@gmail.com]
> *Sent:* 12 July 2013 04:12
> *To:* common-u...@hadoop.apache.org
> *Subject:* Staging directory ENOTDIR error.
>
>
> Hi , I'm getting an ungoogleable exception, never seen this before. 
>
> This is on a hadoop 1.1. cluster... It appears that its permissions
> related... 
>
> Any thoughts as to how this could crop up?
>
> I assume its a bug in my filesystem, but not sure.
>
>
> 13/07/11 18:39:43 ERROR security.UserGroupInformation:
> PriviledgedActionException as:root cause:ENOTDIR: Not a directory
> ENOTDIR: Not a directory
> at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
> at org.apache.hadoop.fs.FileUtil.execSetPermission(FileUtil.java:699)
> at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:654)
> at
> org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
> at
> org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
> at
> org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
> at
> org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
>
> 
>
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com 
>



-- 
Jay Vyas
http://jayunit100.blogspot.com


Re: how to get hadoop HDFS path?

2013-07-12 Thread deepak rosario tharigopla
You can get the HDFS file system as follows:
Configuration conf = new Configuration();
conf.addResource(new
Path("/home/dpancras/TradeStation/CassandraPigHadoop/WebContent/WEB-INF/core-site.xml"));
conf.addResource(new
Path("/home/dpancras/TradeStation/CassandraPigHadoop/WebContent/WEB-INF/hdfs-site.xml"));
FileSystem fs = FileSystem.get(conf);


On Fri, Jul 12, 2013 at 4:40 AM, ch huang  wrote:

> i want set hdfs path ,AND add the path into hbase,here is my code
>
>  Path path = new Path("hdfs:192.168.10.22:9000/alex/test.jar");
>   System.out.println(":
> "+path.toString()+"|"+TestMyCo.class.getCanonicalName()+"|"+Coprocessor.PRIORITY_USER);
>
>   htd.setValue("COPROCESSOR$1", path.toString()+"|"
> + TestMyCo.class.getCanonicalName()+"|"+Coprocessor.PRIORITY_USER);
>
> and the real value which i find is
>
> hbase(main):012:0> describe 'mytest'
> DESCRIPTION
> ENABLED
>  {NAME => 'mytest', COPROCESSOR$1 => 'hdfs:/
> 192.168.10.22:9000/alex/test.jar|TestMyCo|1073741823',
> FAMILIES => [{N true
>  AME => 'myfl', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
> REPLICATION_SCOPE => '0', VERSIONS => '3', C
>  OMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
> KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '6553
>  6', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE =>
> 'true'}]}
> 1 row(s) in 0.0930 seconds
>



-- 
Thanks & Regards
Deepak Rosario Pancras
*Achiever/Responsibility/Arranger/Maximizer/Harmony*


Re: how to get hadoop HDFS path?

2013-07-12 Thread deepak rosario tharigopla
Configuration conf = new Configuration();
conf.addResource(new
Path("/home/dpancras/TradeStation/CassandraPigHadoop/WebContent/WEB-INF/core-site.xml"));
conf.addResource(new
Path("/home/dpancras/TradeStation/CassandraPigHadoop/WebContent/WEB-INF/hdfs-site.xml"));
FileSystem fs = FileSystem.get(conf);
 Path path = new Path("/alex/test.jar");  // Use a relative path here
  System.out.println(": "+path.toString()+"|"+TestMyCo.class.
getCanonicalName()+"|"+Coprocessor.PRIORITY_USER);

  htd.setValue("COPROCESSOR$1", path.toString()+"|"
+ TestMyCo.class.getCanonicalName()+"|"+Coprocessor.PRIORITY_USER);
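
Another hedged option, building on the same Configuration: let the FileSystem
qualify a relative path so the stored value carries a well-formed hdfs:// URI
(the authority comes from fs.default.name / fs.defaultFS):

// Hypothetical sketch: qualify a relative path against the configured default
// filesystem instead of hand-building "hdfs:..." strings.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QualifyPath {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();  // reads core-site.xml and hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);

    Path qualified = fs.makeQualified(new Path("/alex/test.jar"));
    // prints e.g. hdfs://192.168.10.22:9000/alex/test.jar
    System.out.println(qualified);
  }
}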


On Fri, Jul 12, 2013 at 2:00 PM, deepak rosario tharigopla <
rozartharigo...@gmail.com> wrote:

> You can get the hdfs file system as follows
> Configuration conf = new Configuration();
> conf.addResource(new
> Path("/home/dpancras/TradeStation/CassandraPigHadoop/WebContent/WEB-INF/core-site.xml"));
> conf.addResource(new
> Path("/home/dpancras/TradeStation/CassandraPigHadoop/WebContent/WEB-INF/hdfs-site.xml"));
> FileSystem fs = FileSystem.get(conf);
>
>
> On Fri, Jul 12, 2013 at 4:40 AM, ch huang  wrote:
>
>> i want set hdfs path ,AND add the path into hbase,here is my code
>>
>>  Path path = new Path("hdfs:192.168.10.22:9000/alex/test.jar");
>>   System.out.println(":
>> "+path.toString()+"|"+TestMyCo.class.getCanonicalName()+"|"+Coprocessor.PRIORITY_USER);
>>
>>   htd.setValue("COPROCESSOR$1", path.toString()+"|"
>> + TestMyCo.class.getCanonicalName()+"|"+Coprocessor.PRIORITY_USER);
>>
>> and the real value which i find is
>>
>> hbase(main):012:0> describe 'mytest'
>> DESCRIPTION
>> ENABLED
>>  {NAME => 'mytest', COPROCESSOR$1 => 'hdfs:/
>> 192.168.10.22:9000/alex/test.jar|TestMyCo|1073741823',
>> FAMILIES => [{N true
>>  AME => 'myfl', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
>> REPLICATION_SCOPE => '0', VERSIONS => '3', C
>>  OMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
>> KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '6553
>>  6', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE =>
>> 'true'}]}
>> 1 row(s) in 0.0930 seconds
>>
>
>
>
> --
> Thanks & Regards
> Deepak Rosario Pancras
> *Achiever/Responsibility/Arranger/Maximizer/Harmony*
>



-- 
Thanks & Regards
Deepak Rosario Pancras
*Achiever/Responsibility/Arranger/Maximizer/Harmony*


Re: How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?

2013-07-12 Thread hadoop qi
Thanks for the response. So they represent the total physical memory
(virtual memory) that has been allocated to the job (e.g., from heap and stack)
during its entire lifetime? I am still confused about how to get a cumulative
number from /proc/meminfo. I think from /proc/meminfo we can only get the
memory usage of a process at a particular point in time (it looks like a
snapshot of the status of the process). If these numbers were added, the sum
would be much more than the memory allocated to the program.


On Fri, Jul 12, 2013 at 6:47 AM, Shahab Yunus wrote:

> I think they are cumulative but per task.
>
> Physical memory bytes
> (PHYSICAL_MEMORY_BYTES)
> The physical memory being used by a task in bytes, as reported by
> /proc/meminfo.
> Virtual memory bytes
> (VIRTUAL_MEMORY_BYTES)
> The virtual memory being used by a task in bytes, as reported by
> /proc/meminfo.
>
> This is from the Definitive Guide book. Page 260.
>
> Regards,
> Shhab
>
>
> On Thu, Jul 11, 2013 at 12:47 PM, hadoop qi  wrote:
>
>> Hello,
>>
>> I am wondering how memory counters  'PHYSICAL_MEMORY_BYTES'  and
>> 'VIRTUAL_MEMORY_BYTES'  are calculated? They are peaks of memory usage or
>> cumulative usage?
>>
>> Thanks for help,
>>
>
>


Re: How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?

2013-07-12 Thread Shahab Yunus
As Vinod Kumar Vavilapalli said, they are indeed snapshots at a point in time. So
they are neither the peak usage over the whole duration of the job, nor a
cumulative aggregate that increases over time.

Regards,
Shahab


On Fri, Jul 12, 2013 at 4:47 PM, hadoop qi  wrote:

> Thanks for the response. So they represent the total physical memory
> (virtual memory) has been allocated to the job (e.g., from heap and stack)
> during its entire life time? I am still confused how to get the cumulative
> number from /proc/meminfo. I think from /proc/meminfo we can only get the
> memory usage of a  process in a particular time point (looked like a
> snapshot of the status of the process). If these numbers are added, the sum
> would be much more than memory allocated to the program.
>
>
> On Fri, Jul 12, 2013 at 6:47 AM, Shahab Yunus wrote:
>
>> I think they are cumulative but per task.
>>
>> Physical memory bytes
>> (PHYSICAL_MEMORY_BYTES)
>> The physical memory being used by a task in bytes, as reported by
>> /proc/meminfo.
>> Virtual memory bytes
>> (VIRTUAL_MEMORY_BYTES)
>> The virtual memory being used by a task in bytes, as reported by
>> /proc/meminfo.
>>
>> This is from the Definitive Guide book. Page 260.
>>
>> Regards,
>> Shhab
>>
>>
>> On Thu, Jul 11, 2013 at 12:47 PM, hadoop qi  wrote:
>>
>>> Hello,
>>>
>>> I am wondering how memory counters  'PHYSICAL_MEMORY_BYTES'  and
>>> 'VIRTUAL_MEMORY_BYTES'  are calculated? They are peaks of memory usage or
>>> cumulative usage?
>>>
>>> Thanks for help,
>>>
>>
>>
>


Running hadoop for processing sources in full sky maps

2013-07-12 Thread andrea zonca
Hi,

I have a few tens of full sky maps, in binary format (FITS), of about 600MB each.

For each sky map I already have a catalog of the position of few
thousand sources, i.e. stars, galaxies, radio sources.

For each source I would like to:

open the full sky map
extract the relevant section, typically 20MB or less
run some statistics on them
aggregate the outputs to a catalog

I would like to run hadoop, possibly using python via the streaming
interface, to process them in parallel.

I think the input to the mapper should be each record of the catalogs,
then the python mapper can open the full sky map, do the processing
and print the output to stdout.

Is this a reasonable approach?
If so, I need to be able to configure hadoop so that a full sky map is
copied locally to the nodes that are processing one of its sources.
How can I achieve that?
Also, what is the best way to feed the input data to Hadoop? For each
source I have a reference to the full sky map, plus its latitude and longitude.

Thanks,
I posted this question on StackOverflow:
http://stackoverflow.com/questions/17617654/running-hadoop-for-processing-sources-in-full-sky-maps

Regards,
Andrea Zonca


How to control of the output of "/stacks"

2013-07-12 Thread Shinichi Yamashita
Hi,

I can see the node's stack trace when I access "/stacks" in the Web UI.
The stack trace is also written to the node's log file.
Because this bloats the log file and makes it hard to read, I don't want
it written to the log file.
Is there a way to solve this problem?

Regards,
Shinichi Yamashita


Re: How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?

2013-07-12 Thread hadoop qi
If a program terminates normally, I would assume its memory usage is close to
zero, or at least very small, because the program releases most of the memory it was
allocated. That would mean all jobs in Hadoop had very small memory usage,
because only the last point was measured. Actually, this is not the case: we
can see some programs with several GB of memory usage. Does that mean that at the end
of the program it still holds several GB of memory? I am still confused.

Regards,
Qi


On Fri, Jul 12, 2013 at 1:51 PM, Shahab Yunus wrote:

> As Vinod Kumar Vavilapalli they are indeed snapshots in point and time. So
> they are neither the peak usage from the whole duration of the job, nor
> cumulative aggregate that increases over time.
>
> Regards,
> Shahab
>
>
> On Fri, Jul 12, 2013 at 4:47 PM, hadoop qi  wrote:
>
>> Thanks for the response. So they represent the total physical memory
>> (virtual memory) has been allocated to the job (e.g., from heap and stack)
>> during its entire life time? I am still confused how to get the cumulative
>> number from /proc/meminfo. I think from /proc/meminfo we can only get the
>> memory usage of a  process in a particular time point (looked like a
>> snapshot of the status of the process). If these numbers are added, the sum
>> would be much more than memory allocated to the program.
>>
>>
>> On Fri, Jul 12, 2013 at 6:47 AM, Shahab Yunus wrote:
>>
>>> I think they are cumulative but per task.
>>>
>>> Physical memory bytes
>>> (PHYSICAL_MEMORY_BYTES)
>>> The physical memory being used by a task in bytes, as reported by
>>> /proc/meminfo.
>>> Virtual memory bytes
>>> (VIRTUAL_MEMORY_BYTES)
>>> The virtual memory being used by a task in bytes, as reported by
>>> /proc/meminfo.
>>>
>>> This is from the Definitive Guide book. Page 260.
>>>
>>> Regards,
>>> Shhab
>>>
>>>
>>> On Thu, Jul 11, 2013 at 12:47 PM, hadoop qi wrote:
>>>
 Hello,

 I am wondering how memory counters  'PHYSICAL_MEMORY_BYTES'  and
 'VIRTUAL_MEMORY_BYTES'  are calculated? They are peaks of memory usage or
 cumulative usage?

 Thanks for help,

>>>
>>>
>>
>


Re: How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?

2013-07-12 Thread Vinod Kumar Vavilapalli

No. Every so often, 3 seconds IIRC, it captures pmem and vmem, which correspond
to the usage of the process and its children at *that* specific point in time.
Cumulative = cumulative across the process and its children.

Thanks,
+Vinod

On Jul 12, 2013, at 1:47 PM, hadoop qi wrote:

> Thanks for the response. So they represent the total physical memory (virtual 
> memory) has been allocated to the job (e.g., from heap and stack) during its 
> entire life time? I am still confused how to get the cumulative number from 
> /proc/meminfo. I think from /proc/meminfo we can only get the memory usage of 
> a  process in a particular time point (looked like a snapshot of the status 
> of the process). If these numbers are added, the sum would be much more than 
> memory allocated to the program. 
> 
> 
> On Fri, Jul 12, 2013 at 6:47 AM, Shahab Yunus  wrote:
> I think they are cumulative but per task.
> 
> Physical memory bytes
> (PHYSICAL_MEMORY_BYTES)
> The physical memory being used by a task in bytes, as reported by 
> /proc/meminfo.
> Virtual memory bytes
> (VIRTUAL_MEMORY_BYTES)
> The virtual memory being used by a task in bytes, as reported by 
> /proc/meminfo.
> 
> This is from the Definitive Guide book. Page 260.
> 
> Regards,
> Shhab
> 
> 
> On Thu, Jul 11, 2013 at 12:47 PM, hadoop qi  wrote:
> Hello, 
> 
> I am wondering how memory counters  'PHYSICAL_MEMORY_BYTES'  and 
> 'VIRTUAL_MEMORY_BYTES'  are calculated? They are peaks of memory usage or 
> cumulative usage? 
> 
> Thanks for help, 
> 
> 



Re: How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?

2013-07-12 Thread Vinod Kumar Vavilapalli

The snapshotting stops well before the actual JVM exits - when a map/reduce
task is deemed done, the metric collection stops. The JVM exits a little later.

Thanks,
+Vinod

On Jul 12, 2013, at 3:57 PM, hadoop qi wrote:

> If a program terminates normally, i would assume its memory usage close to 
> zero or at least very small, because the program release most memory it was 
> allocated. That would mean all jobs in Hadoop had very small memory usage, 
> because only the last point was measured. Actually, this is not case. We can 
> see some programs with several G memory usage. Is that means at the end of 
> the program it still hold several G memory? I am still confused. 
> 
> Regards,
> Qi
> 
> 
> On Fri, Jul 12, 2013 at 1:51 PM, Shahab Yunus  wrote:
> As Vinod Kumar Vavilapalli they are indeed snapshots in point and time. So 
> they are neither the peak usage from the whole duration of the job, nor 
> cumulative aggregate that increases over time.
> 
> Regards,
> Shahab
> 
> 
> On Fri, Jul 12, 2013 at 4:47 PM, hadoop qi  wrote:
> Thanks for the response. So they represent the total physical memory (virtual 
> memory) has been allocated to the job (e.g., from heap and stack) during its 
> entire life time? I am still confused how to get the cumulative number from 
> /proc/meminfo. I think from /proc/meminfo we can only get the memory usage of 
> a  process in a particular time point (looked like a snapshot of the status 
> of the process). If these numbers are added, the sum would be much more than 
> memory allocated to the program. 
> 
> 
> On Fri, Jul 12, 2013 at 6:47 AM, Shahab Yunus  wrote:
> I think they are cumulative but per task.
> 
> Physical memory bytes
> (PHYSICAL_MEMORY_BYTES)
> The physical memory being used by a task in bytes, as reported by 
> /proc/meminfo.
> Virtual memory bytes
> (VIRTUAL_MEMORY_BYTES)
> The virtual memory being used by a task in bytes, as reported by 
> /proc/meminfo.
> 
> This is from the Definitive Guide book. Page 260.
> 
> Regards,
> Shhab
> 
> 
> On Thu, Jul 11, 2013 at 12:47 PM, hadoop qi  wrote:
> Hello, 
> 
> I am wondering how memory counters  'PHYSICAL_MEMORY_BYTES'  and 
> 'VIRTUAL_MEMORY_BYTES'  are calculated? They are peaks of memory usage or 
> cumulative usage? 
> 
> Thanks for help, 
> 
> 
> 
> 



Maven artifacts for 0.23.9

2013-07-12 Thread Eugene Dzhurinsky
Hello!

Where is it possible to get Maven artifacts for the most recent Hadoop release?

Thanks!
-- 
Eugene N Dzhurinsky




Re: How to control of the output of "/stacks"

2013-07-12 Thread Harsh J
The logging has sometimes been useful in debugging (i.e. if the stack
on the UI went uncaptured, the log helps). It is currently not
specifically toggleable. I suppose it is OK to set it to DEBUG,
though. Can you file a JIRA for that, please?

The only way you can disable it right now is by (brute-forcibly)
adding the below to the daemon's log4j.properties:

log4j.logger.org.apache.hadoop.http.HttpServer=WARN

Which may not be so ideal as we may miss other INFO from HttpServer generically.

On Sat, Jul 13, 2013 at 3:24 AM, Shinichi Yamashita
 wrote:
> Hi,
>
> I can see the stack trace of the node when I access "/stacks" of Web UI.
> And stack trace is output in the log file of the node, too.
> Because the expansion of the log file and hard to see it, I don't want
> to output it in a log file.
> Is there the method to solve this problem?
>
> Regards,
> Shinichi Yamashita



-- 
Harsh J


[no subject]

2013-07-12 Thread Anit Alexander
Hello,

I am encountering a problem in a CDH4 environment.
I can successfully run the map reduce job on the Hadoop cluster, but when I
migrated the same map reduce job to my CDH4 environment it produces an error
stating that it cannot read the next block (each block is 64 MB). Why is
that?

Hadoop environment: hadoop 1.0.3
java version 1.6

chd4 environment: CDH4.2.0
java version 1.6

Regards,
Anit Alexander


Re:

2013-07-12 Thread Suresh Srinivas
Please use the CDH mailing list. This is the Apache Hadoop mailing list.

Sent from phone

On Jul 12, 2013, at 7:51 PM, Anit Alexander  wrote:

> Hello,
> 
> I am encountering a problem in cdh4 environment. 
> I can successfully run the map reduce job in the hadoop cluster. But when i 
> migrated the same map reduce to my cdh4 environment it creates an error 
> stating that it cannot read the next block(each block is 64 mb). Why is that 
> so?
> 
> Hadoop environment: hadoop 1.0.3
> java version 1.6
> 
> chd4 environment: CDH4.2.0
> java version 1.6
> 
> Regards,
> Anit Alexander