Failed to run wordcount on YARN

2013-07-12 Thread Liu, Raymond
Hi,

I have just started trying out Hadoop 2.0, using the 2.0.5-alpha package,

and followed

http://hadoop.apache.org/docs/r2.0.5-alpha/hadoop-project-dist/hadoop-common/ClusterSetup.html

to set up a cluster in non-secure mode. HDFS works fine with the client tools.

But when I run the wordcount example, I get errors:

./bin/hadoop jar 
./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar wordcount 
/tmp /out


13/07/12 15:05:53 INFO mapreduce.Job: Task Id : 
attempt_1373609123233_0004_m_04_0, Status : FAILED
Error: java.io.FileNotFoundException: Path is not a file: /tmp/hadoop-yarn
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:42)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1317)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1276)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1252)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1225)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:403)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:239)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40728)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
at 
org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:986)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:974)
at 
org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:157)
at 
org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:124)
at org.apache.hadoop.hdfs.DFSInputStream.init(DFSInputStream.java:117)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1131)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:244)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:77)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:713)
at 
org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:89)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:519)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)

I checked HDFS and found that /tmp/hadoop-yarn is there; this dir's owner is the
same as the job user.

And to make sure, I also created /tmp/hadoop-yarn on the local fs. None of it
helps.

Any idea what might be the problem? Thanks!


Best Regards,
Raymond Liu
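
The failing path, /tmp/hadoop-yarn, is the framework's job staging area and is a
directory under the /tmp input path. A likely cause, though not confirmed in this
thread, is that the non-recursive FileInputFormat used by the wordcount example hands
that subdirectory to a map task, whose record reader then fails with "Path is not a
file". A minimal diagnostic sketch in Java follows; the class name and argument
handling are illustrative only, not part of the original report:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckInputDir {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // list the intended job input directory, e.g. /tmp, and flag subdirectories
    for (FileStatus s : fs.listStatus(new Path(args[0]))) {
      System.out.println((s.isDirectory() ? "DIR (will break the example job): " : "file: ")
          + s.getPath());
    }
  }
}

Pointing the example at a directory that contains only plain files (or moving the
input out of /tmp) should avoid the error.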



Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-12 Thread Hao Ren

On 11/07/2013 20:47, Balaji Narayanan (பாலாஜி நாராயணன்) wrote:

multiple copy jobs to hdfs


Thank you for your reply and the link.

I had read the link before, but I didn't find any examples of copying
files from ftp to hdfs.


There are about 20-40 files in my directory. I just want to move or copy
that directory to hdfs on Amazon EC2.


Actually, I am new to hadoop. I would like to know how to do multiple 
copy jobs to hdfs without distcp.


Thank you again.

--
Hao Ren
ClaraVista
www.claravista.fr


Re: how to add JournalNodes

2013-07-12 Thread Harsh J
You need to restart your NameNodes to get them to use the new QJM
5-host-set configs, and I think you can do that without downtime if
you're already in HA mode by restarting one NN at a time.

To add new JNs first though, you will currently have to rsync their
directory from a good JN to get them into the cluster (i.e. rsync the
data from a good one before you start the new JNs). They will not
auto-join otherwise.
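
For reference, a minimal sketch of what the expanded NameNode-side setting could look
like after the rsync step. The hostnames, port and nameservice ID below are
placeholders, and the property normally lives in hdfs-site.xml rather than being set
in code:

import org.apache.hadoop.conf.Configuration;

public class ShowSharedEditsDir {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // all five JournalNodes listed in one qjournal URI; restart the NameNodes one at a time afterwards
    conf.set("dfs.namenode.shared.edits.dir",
        "qjournal://jn1:8485;jn2:8485;jn3:8485;jn4:8485;jn5:8485/mycluster");
    System.out.println(conf.get("dfs.namenode.shared.edits.dir"));
  }
}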

On Fri, Jul 12, 2013 at 12:57 PM, lei liu liulei...@gmail.com wrote:
 I use QJM for HA in hadoop2.0,  now  there are three JournalNodes in HDFS
 cluster, I want to add two new JournalNodes to HDFS cluster, how to do it?
 Do I need to restart HDFS cluster?


 Thanks,

 LiuLei



-- 
Harsh J


unsubscribe

2013-07-12 Thread Margusja




RE: Taktracker in namenode failure

2013-07-12 Thread Ramya S
Both the configured map output value class and the value written from
the mapper are Text, so there is no mismatch in the value class.

But when the same MR program is run with 2 tasktrackers (without a tasktracker on
the namenode), the exception does not occur.

The problem occurs only with the tasktracker running on the namenode.


Thanks & Regards

Ramya.S



From: Devaraj k [mailto:devara...@huawei.com]
Sent: Fri 7/12/2013 3:04 PM
To: user@hadoop.apache.org
Subject: RE: Taktracker in namenode failure



Could you tell us which Map Output Value class you are configuring while
submitting the Job, and what type of value is written from the Mapper? If
these two do not match, it will throw the below error.


Thanks

Devaraj k

 

From: Ramya S [mailto:ram...@suntecgroup.com] 
Sent: 12 July 2013 14:46
To: user@hadoop.apache.org
Subject: Taktracker in namenode failure

 

Hi,

Why does only the tasktracker on the namenode fail during job execution with the error below?

I have attached a snapshot of the error screen to this mail.

java.io.IOException: Type mismatch in value from map: expected 
org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.IntWritable
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1019)
at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
at 
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at WordCount$TokenizerMapper.map(WordCount.java:30)
at WordCount$TokenizerMapper.map(WordCount.java:19)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

But the same task is reassigned to another tasktracker and executes successfully.
Why?

 

 

Best Regards,

Ramya


RE: Taktracker in namenode failure

2013-07-12 Thread Devaraj k
I think there is a mismatch in the jars coming into the classpath for the map tasks
when they run on different machines. You can find this out by giving a
unique name to your Mapper class and job submission class and then submitting the job.

Thanks
Devaraj k
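
For illustration, a minimal job (class names are hypothetical and not taken from the
attached WordCount) showing the pairing asked about earlier in this thread: the value
type the mapper writes must match what the driver declares via setMapOutputValueClass,
otherwise the collector throws exactly the "Type mismatch in value from map" error
quoted below.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ValueTypeCheck {
  public static class TokenMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final Text one = new Text("1");
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      for (String w : value.toString().split("\\s+")) {
        ctx.write(new Text(w), one);              // the mapper emits Text values...
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "value-type-check");
    job.setJarByClass(ValueTypeCheck.class);
    job.setMapperClass(TokenMapper.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);       // ...so Text must be declared here;
                                                  // IntWritable would reproduce the error
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}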

From: Ramya S [mailto:ram...@suntecgroup.com]
Sent: 12 July 2013 15:27
To: user@hadoop.apache.org
Subject: RE: Taktracker in namenode failure

Both the map output value class configured and the output value written from
the mapper is Text class. So there is no mismatch in the value class.

 But when the same MR program is run with 2 tasktrackers (without a tasktracker in
the namenode) the exception is not occurring.

The problem is only with the tasktracker running in the namenode.



Thanks & Regards

Ramya.S


From: Devaraj k [mailto:devara...@huawei.com]
Sent: Fri 7/12/2013 3:04 PM
To: user@hadoop.apache.org
Subject: RE: Taktracker in namenode failure
Could you tell us which Map Output Value class you are configuring while
submitting the Job, and what type of value is written from the Mapper? If
these two do not match, it will throw the below error.

Thanks
Devaraj k

From: Ramya S [mailto:ram...@suntecgroup.com]
Sent: 12 July 2013 14:46
To: user@hadoop.apache.org
Subject: Taktracker in namenode failure

Hi,

Why does only the tasktracker on the namenode fail during job execution with the error below?
I have attached a snapshot of the error screen to this mail.

java.io.IOException: Type mismatch in value from map: expected 
org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.IntWritable

at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1019)

at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)

at 
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)

at WordCount$TokenizerMapper.map(WordCount.java:30)

at WordCount$TokenizerMapper.map(WordCount.java:19)

at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)

at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)

at org.apache.hadoop.mapred.Child$4.run(Child.java:255)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:416)

at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)

at org.apache.hadoop.mapred.Child.main(Child.java:249)



But the same task is reassigned to another tasktracker and executes successfully.
Why?


Best Regards,
Ramya


Re: Staging directory ENOTDIR error.

2013-07-12 Thread Ram
Hi Jay,
which hadoop command did you run?

Hi,



From,
Ramesh.




On Fri, Jul 12, 2013 at 7:54 AM, Devaraj k devara...@huawei.com wrote:

  Hi Jay,

 Here the client is trying to create a staging directory in the local file
 system, which it actually should create in HDFS.

 Could you check whether you have configured “fs.defaultFS” in the
 client to point to the HDFS cluster?

 Thanks

 Devaraj k

 *From:* Jay Vyas [mailto:jayunit...@gmail.com]
 *Sent:* 12 July 2013 04:12
 *To:* common-u...@hadoop.apache.org
 *Subject:* Staging directory ENOTDIR error.

 Hi, I'm getting an un-googleable exception that I have never seen before.

 This is on a hadoop 1.1 cluster... It appears to be permissions
 related...

 Any thoughts as to how this could crop up?

 I assume it's a bug in my filesystem, but I'm not sure.


 13/07/11 18:39:43 ERROR security.UserGroupInformation:
 PriviledgedActionException as:root cause:ENOTDIR: Not a directory
 ENOTDIR: Not a directory
 at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
 at org.apache.hadoop.fs.FileUtil.execSetPermission(FileUtil.java:699)
 at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:654)
 at
 org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
 at
 org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
 at
 org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
 at
 org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)

 


 --
 Jay Vyas
 http://jayunit100.blogspot.com 



Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-12 Thread Ram
Hi,
   Please configure the following in core-site.xml and try:
   Use "hadoop fs -ls file:///" to list local file system files.
   Use "hadoop fs -ls ftp://<your ftp location>" to list ftp files; if
it lists the files, go ahead with distcp.

Reference from
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml


  fs.ftp.host       (default 0.0.0.0) - FTP filesystem connects to this server
  fs.ftp.host.port  (default 21)      - FTP filesystem connects to fs.ftp.host on this port

Also try setting these properties.

Reference from the Hadoop Definitive Guide, Hadoop filesystems table
(all under org.apache.hadoop):

  Filesystem: FTP   URI scheme: ftp   Java implementation: fs.ftp.FTPFileSystem
  Description: A filesystem backed by an FTP server.


Hi,



From,
Ramesh.
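
For what it is worth, a minimal sketch of doing such a copy without distcp, using
Hadoop's built-in FTPFileSystem. The FTP URI, credentials and paths are placeholders,
and for larger file counts distcp or several parallel copies would still be
preferable:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class FtpToHdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // ftp://user:password@ftp.example.com and the paths below are placeholders
    FileSystem ftp = FileSystem.get(URI.create("ftp://user:password@ftp.example.com/"), conf);
    FileSystem hdfs = FileSystem.get(conf);        // uses fs.defaultFS from core-site.xml

    Path src = new Path("/data");                  // directory on the FTP server
    Path dst = new Path("/user/hao/input");        // target directory in HDFS
    hdfs.mkdirs(dst);
    for (FileStatus f : ftp.listStatus(src)) {
      // with only 20-40 files, a simple sequential copy is often good enough
      FileUtil.copy(ftp, f.getPath(), hdfs, new Path(dst, f.getPath().getName()),
          false /* keep the source */, conf);
    }
  }
}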




On Fri, Jul 12, 2013 at 1:04 PM, Hao Ren h@claravista.fr wrote:

 On 11/07/2013 20:47, Balaji Narayanan (பாலாஜி நாராயணன்) wrote:

 multiple copy jobs to hdfs


 Thank you for your reply and the link.

 I read the link before, but I didn't find any examples about copying file
 from ftp to hdfs.

 There are about 20-40 file in my directory. I just want to move or copy
 that directory to hdfs on Amazon EC2.

 Actually, I am new to hadoop. I would like to know how to do multiple copy
 jobs to hdfs without distcp.

 Thank you again.


 --
 Hao Ren
 ClaraVista
 www.claravista.fr



Re: Taktracker in namenode failure

2013-07-12 Thread Ram
Hi,
The problem is likely with the jar file only. To check, run any other MR job or
the sample wordcount job on the namenode's tasktracker. If it runs, there is no
problem with the namenode's tasktracker; if it does not run, there may be a problem
with that tasktracker's configuration, so compare it with another node's tasktracker
configuration (tasktracker configuration here means the mapred configuration).

Hi,



From,
Ramesh.




On Fri, Jul 12, 2013 at 3:37 PM, Devaraj k devara...@huawei.com wrote:

  I think, there is mismatch of jar’s coming in the classpath for the map
 tasks when it runs in different machines. You can find out this, by giving
 some unique name for your Mapper class, Job Submit class and then submit
 the Job.


 Thanks

 Devaraj k


 *From:* Ramya S [mailto:ram...@suntecgroup.com]
 *Sent:* 12 July 2013 15:27
 *To:* user@hadoop.apache.org
 *Subject:* RE: Taktracker in namenode failure


 Both the map output value class configured and the output value written
 from the mapper is Text class. So there is no mismatch in the value class.

 But when the same MR program is run with 2 tasktrackers (without a
 tasktracker in the namenode) the exception is not occurring.

 The problem is only with the tasktracker running in the namenode.

 *Thanks & Regards*

 *Ramya.S*

 --

 *From:* Devaraj k [mailto:devara...@huawei.com]
 *Sent:* Fri 7/12/2013 3:04 PM
 *To:* user@hadoop.apache.org
 *Subject:* RE: Taktracker in namenode failure

 Could you tell us which Map Output Value class you are configuring
 while submitting the Job, and what type of value is written from the
 Mapper? If these two do not match, it will throw the below error.

 Thanks

 Devaraj k

 *From:* Ramya S [mailto:ram...@suntecgroup.com]
 *Sent:* 12 July 2013 14:46
 *To:* user@hadoop.apache.org
 *Subject:* Taktracker in namenode failure

 Hi,

 Why does only the tasktracker on the namenode fail during job execution with the error below?

 I have attached a snapshot of the error screen to this mail.

 java.io.IOException: Type mismatch in value from map: expected 
 org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.IntWritable

 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1019)

 at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)

 at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)

 at WordCount$TokenizerMapper.map(WordCount.java:30)

 at WordCount$TokenizerMapper.map(WordCount.java:19)

 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)

 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)

 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)

 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)

 at java.security.AccessController.doPrivileged(Native Method)

 at javax.security.auth.Subject.doAs(Subject.java:416)

 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)

 at org.apache.hadoop.mapred.Child.main(Child.java:249)

  

 But the same task is reassigned to another tasktracker and executes
 successfully. Why?

 *Best Regards,*

 *Ramya*



Hadoop property precedence

2013-07-12 Thread Shalish VJ
Hi,


Suppose the block size set in the configuration file on the client side is 64MB,
the block size set in the configuration file on the namenode side is 128MB, and the
block size set in the configuration file on the datanode side is something else.
Please advise: if the client is writing a file to hdfs, which setting takes effect?

Thanks,
Shalish.
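
For context: the block size recorded for a new file is chosen by the writing client,
not by the namenode's or datanode's configuration files. It comes either from the
client's dfs.blocksize (dfs.block.size in older releases) or from an explicit argument
to create(). A minimal sketch follows; the path and sizes are illustrative only:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setLong("dfs.blocksize", 64L * 1024 * 1024);   // client-side value used for files this client writes
    FileSystem fs = FileSystem.get(conf);

    // or override it per file, ignoring all the configuration files:
    FSDataOutputStream out = fs.create(new Path("/tmp/blocksize-demo.dat"),
        true,                                           // overwrite
        conf.getInt("io.file.buffer.size", 4096),       // buffer size
        fs.getDefaultReplication(),                     // replication factor
        128L * 1024 * 1024);                            // 128MB block size for this file only
    out.close();
  }
}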

UNSUBSCRIBE

2013-07-12 Thread Brent Nikolaus



Re: How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?

2013-07-12 Thread Vinod Kumar Vavilapalli
They are running metrics: while the task is running, they tell you how
much pmem/vmem it is using at that point in time. Obviously, at the end of the job,
it will be the last snapshot.

Thanks,
+Vinod

On Jul 12, 2013, at 6:47 AM, Shahab Yunus wrote:

 I think they are cumulative but per task.
 
 Physical memory bytes
 (PHYSICAL_MEMORY_BYTES)
 The physical memory being used by a task in bytes, as reported by 
 /proc/meminfo.
 Virtual memory bytes
 (VIRTUAL_MEMORY_BYTES)
 The virtual memory being used by a task in bytes, as reported by 
 /proc/meminfo.
 
 This is from the Definitive Guide book. Page 260.
 
 Regards,
 Shhab
 
 
 On Thu, Jul 11, 2013 at 12:47 PM, hadoop qi hadoop@gmail.com wrote:
 Hello, 
 
 I am wondering how memory counters  'PHYSICAL_MEMORY_BYTES'  and 
 'VIRTUAL_MEMORY_BYTES'  are calculated? They are peaks of memory usage or 
 cumulative usage? 
 
 Thanks for help, 
 



Re: UNSUBSCRIBE

2013-07-12 Thread sure bhands
Please send an email to user-unsubscr...@hadoop.apache.org to unsubscribe.

Thanks,
Surendra


On Fri, Jul 12, 2013 at 10:24 AM, Brent Nikolaus bnikol...@gmail.comwrote:





Re: Staging directory ENOTDIR error.

2013-07-12 Thread Jay Vyas
This was a very odd error - it turns out that I had created a file called
tmp in my fs root directory, which meant that
when the jobs tried to write to the tmp directory, they ran into the
not-a-dir exception.

In any case, I think the error reporting in the NativeIO class should be
revised.
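
As an aside, a small sketch (not from the thread) that would have surfaced both
possible causes discussed here: which filesystem the client defaults to, and whether
the assumed staging parent (here /tmp) is actually a directory:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckStagingParent {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // fs.default.name is the Hadoop 1.x key, fs.defaultFS the newer one
    System.out.println("default FS = "
        + conf.get("fs.defaultFS", conf.get("fs.default.name", "file:///")));
    FileSystem fs = FileSystem.get(conf);
    Path parent = new Path("/tmp");                 // assumed parent of the staging dir
    FileStatus st = fs.getFileStatus(parent);
    System.out.println(parent + " is " + (st.isDir() ? "a directory" : "NOT a directory"));
  }
}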

On Thu, Jul 11, 2013 at 10:24 PM, Devaraj k devara...@huawei.com wrote:

  Hi Jay,

 Here the client is trying to create a staging directory in the local file
 system, which it actually should create in HDFS.

 Could you check whether you have configured “fs.defaultFS” in the
 client to point to the HDFS cluster?

 Thanks

 Devaraj k

 *From:* Jay Vyas [mailto:jayunit...@gmail.com]
 *Sent:* 12 July 2013 04:12
 *To:* common-u...@hadoop.apache.org
 *Subject:* Staging directory ENOTDIR error.

 Hi, I'm getting an un-googleable exception that I have never seen before.

 This is on a hadoop 1.1 cluster... It appears to be permissions
 related...

 Any thoughts as to how this could crop up?

 I assume it's a bug in my filesystem, but I'm not sure.


 13/07/11 18:39:43 ERROR security.UserGroupInformation:
 PriviledgedActionException as:root cause:ENOTDIR: Not a directory
 ENOTDIR: Not a directory
 at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
 at org.apache.hadoop.fs.FileUtil.execSetPermission(FileUtil.java:699)
 at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:654)
 at
 org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
 at
 org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
 at
 org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
 at
 org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)

 


 --
 Jay Vyas
 http://jayunit100.blogspot.com 




-- 
Jay Vyas
http://jayunit100.blogspot.com


Re: how to get hadoop HDFS path?

2013-07-12 Thread deepak rosario tharigopla
You can get the hdfs file system as follows:
Configuration conf = new Configuration();
conf.addResource(new Path("/home/dpancras/TradeStation/CassandraPigHadoop/WebContent/WEB-INF/core-site.xml"));
conf.addResource(new Path("/home/dpancras/TradeStation/CassandraPigHadoop/WebContent/WEB-INF/hdfs-site.xml"));
FileSystem fs = FileSystem.get(conf);


On Fri, Jul 12, 2013 at 4:40 AM, ch huang justlo...@gmail.com wrote:

 I want to set an hdfs path and add it into hbase; here is my code:

  Path path = new Path("hdfs:192.168.10.22:9000/alex/test.jar");
  System.out.println(":" + path.toString() + "|" + TestMyCo.class.getCanonicalName()
      + "|" + Coprocessor.PRIORITY_USER);

  htd.setValue("COPROCESSOR$1", path.toString() + "|"
      + TestMyCo.class.getCanonicalName() + "|" + Coprocessor.PRIORITY_USER);

 and the real value which I find is

 hbase(main):012:0> describe 'mytest'
 DESCRIPTION                                                            ENABLED
  {NAME => 'mytest', COPROCESSOR$1 => 'hdfs:/192.168.10.22:9000/alex/test.jar|TestMyCo|1073741823',   true
  FAMILIES => [{NAME => 'myfl', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
  REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0',
  TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
  IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
 1 row(s) in 0.0930 seconds




-- 
Thanks & Regards
Deepak Rosario Pancras
*Achiever/Responsibility/Arranger/Maximizer/Harmony*


Re: how to get hadoop HDFS path?

2013-07-12 Thread deepak rosario tharigopla
Configuration conf = new Configuration();
conf.addResource(new Path("/home/dpancras/TradeStation/CassandraPigHadoop/WebContent/WEB-INF/core-site.xml"));
conf.addResource(new Path("/home/dpancras/TradeStation/CassandraPigHadoop/WebContent/WEB-INF/hdfs-site.xml"));
FileSystem fs = FileSystem.get(conf);
Path path = new Path("/alex/test.jar");   // Use relative path here
System.out.println(":" + path.toString() + "|" + TestMyCo.class.getCanonicalName()
    + "|" + Coprocessor.PRIORITY_USER);

htd.setValue("COPROCESSOR$1", path.toString() + "|"
    + TestMyCo.class.getCanonicalName() + "|" + Coprocessor.PRIORITY_USER);


On Fri, Jul 12, 2013 at 2:00 PM, deepak rosario tharigopla 
rozartharigo...@gmail.com wrote:

 You can get the hdfs file system as follows
 Configuration conf = new Configuration();
 conf.addResource(new
 Path(/home/dpancras/TradeStation/CassandraPigHadoop/WebContent/WEB-INF/core-site.xml));
 conf.addResource(new
 Path(/home/dpancras/TradeStation/CassandraPigHadoop/WebContent/WEB-INF/hdfs-site.xml));
 FileSystem fs = FileSystem.get(conf);


 On Fri, Jul 12, 2013 at 4:40 AM, ch huang justlo...@gmail.com wrote:

 i want set hdfs path ,AND add the path into hbase,here is my code

  Path path = new Path(hdfs:192.168.10.22:9000/alex/test.jar);
   System.out.println(:
 +path.toString()+|+TestMyCo.class.getCanonicalName()+|+Coprocessor.PRIORITY_USER);

   htd.setValue(COPROCESSOR$1, path.toString()+|
 + TestMyCo.class.getCanonicalName()+|+Coprocessor.PRIORITY_USER);

 and the real value which i find is

 hbase(main):012:0 describe 'mytest'
 DESCRIPTION
 ENABLED
  {NAME = 'mytest', COPROCESSOR$1 = 'hdfs:/192.168.10.22:9000/alex/test.jar|TestMyCo|1073741823',
  FAMILIES = [{N true
  AME = 'myfl', DATA_BLOCK_ENCODING = 'NONE', BLOOMFILTER = 'NONE',
 REPLICATION_SCOPE = '0', VERSIONS = '3', C
  OMPRESSION = 'NONE', MIN_VERSIONS = '0', TTL = '2147483647',
 KEEP_DELETED_CELLS = 'false', BLOCKSIZE = '6553
  6', IN_MEMORY = 'false', ENCODE_ON_DISK = 'true', BLOCKCACHE =
 'true'}]}
 1 row(s) in 0.0930 seconds




 --
 Thanks & Regards
 Deepak Rosario Pancras
 *Achiever/Responsibility/Arranger/Maximizer/Harmony*




-- 
Thanks & Regards
Deepak Rosario Pancras
*Achiever/Responsibility/Arranger/Maximizer/Harmony*


Re: How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?

2013-07-12 Thread hadoop qi
Thanks for the response. So do they represent the total physical memory
(virtual memory) that has been allocated to the job (e.g., from heap and stack)
during its entire lifetime? I am still confused about how to get a cumulative
number from /proc/meminfo. I think from /proc/meminfo we can only get the
memory usage of a process at a particular point in time (like a
snapshot of the status of the process). If these numbers were added up, the sum
would be much more than the memory allocated to the program.


On Fri, Jul 12, 2013 at 6:47 AM, Shahab Yunus shahab.yu...@gmail.comwrote:

 I think they are cumulative but per task.

 Physical memory bytes
 (PHYSICAL_MEMORY_BYTES)
 The physical memory being used by a task in bytes, as reported by
 /proc/meminfo.
 Virtual memory bytes
 (VIRTUAL_MEMORY_BYTES)
 The virtual memory being used by a task in bytes, as reported by
 /proc/meminfo.

 This is from the Definitive Guide book. Page 260.

 Regards,
 Shhab


 On Thu, Jul 11, 2013 at 12:47 PM, hadoop qi hadoop@gmail.com wrote:

 Hello,

 I am wondering how memory counters  'PHYSICAL_MEMORY_BYTES'  and
 'VIRTUAL_MEMORY_BYTES'  are calculated? They are peaks of memory usage or
 cumulative usage?

 Thanks for help,





Re: How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?

2013-07-12 Thread Shahab Yunus
As Vinod Kumar Vavilapalli said, they are indeed snapshots at a point in time. So
they are neither the peak usage over the whole duration of the job, nor a
cumulative aggregate that increases over time.

Regards,
Shahab


On Fri, Jul 12, 2013 at 4:47 PM, hadoop qi hadoop@gmail.com wrote:

 Thanks for the response. So they represent the total physical memory
 (virtual memory) has been allocated to the job (e.g., from heap and stack)
 during its entire life time? I am still confused how to get the cumulative
 number from /proc/meminfo. I think from /proc/meminfo we can only get the
 memory usage of a  process in a particular time point (looked like a
 snapshot of the status of the process). If these numbers are added, the sum
 would be much more than memory allocated to the program.


 On Fri, Jul 12, 2013 at 6:47 AM, Shahab Yunus shahab.yu...@gmail.comwrote:

 I think they are cumulative but per task.

 Physical memory bytes
 (PHYSICAL_MEMORY_BYTES)
 The physical memory being used by a task in bytes, as reported by
 /proc/meminfo.
 Virtual memory bytes
 (VIRTUAL_MEMORY_BYTES)
 The virtual memory being used by a task in bytes, as reported by
 /proc/meminfo.

 This is from the Definitive Guide book. Page 260.

 Regards,
 Shhab


 On Thu, Jul 11, 2013 at 12:47 PM, hadoop qi hadoop@gmail.com wrote:

 Hello,

 I am wondering how memory counters  'PHYSICAL_MEMORY_BYTES'  and
 'VIRTUAL_MEMORY_BYTES'  are calculated? They are peaks of memory usage or
 cumulative usage?

 Thanks for help,






Running hadoop for processing sources in full sky maps

2013-07-12 Thread andrea zonca
Hi,

I have a few tens of full sky maps, in binary format (FITS), of about 600MB each.

For each sky map I already have a catalog of the position of few
thousand sources, i.e. stars, galaxies, radio sources.

For each source I would like to:

open the full sky map
extract the relevant section, typically 20MB or less
run some statistics on them
aggregate the outputs to a catalog

I would like to run hadoop, possibly using python via the streaming
interface, to process them in parallel.

I think the input to the mapper should be each record of the catalogs,
then the python mapper can open the full sky map, do the processing
and print the output to stdout.

Is this a reasonable approach?
If so, I need to be able to configure hadoop so that a full sky map is
copied locally to the nodes that are processing one of its sources.
How can I achieve that?
Also, what is the best way to feed the input data to hadoop? For each
source I have a reference to the full sky map, plus its latitude and longitude.

Thanks,
I posted this question on StackOverflow:
http://stackoverflow.com/questions/17617654/running-hadoop-for-processing-sources-in-full-sky-maps

Regards,
Andrea Zonca


How to control of the output of /stacks

2013-07-12 Thread Shinichi Yamashita
Hi,

I can see the stack trace of a node when I access /stacks in the Web UI,
and the stack trace is also written to the node's log file.
Because this bloats the log file and makes it hard to read, I don't want
the stack trace to be written to the log file.
Is there a way to solve this problem?

Regards,
Shinichi Yamashita


Re: How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?

2013-07-12 Thread Vinod Kumar Vavilapalli

No. Every so often (3 seconds, IIRC) it captures pmem and vmem, which correspond
to the usage of the process and its children at *that* specific point in time.
Cumulative = cumulative across the process and its children.

Thanks,
+Vinod
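
For completeness, a small sketch of reading these counters from a finished job using
the Hadoop 2.x API; the Job object is assumed to come from a completed
waitForCompletion() run. The reported job-level values are the last sampled snapshot
per task, summed over the job's tasks, not a running total:

import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class MemoryCounters {
  public static void print(Job job) throws Exception {
    Counters counters = job.getCounters();
    long pmem = counters.findCounter(TaskCounter.PHYSICAL_MEMORY_BYTES).getValue();
    long vmem = counters.findCounter(TaskCounter.VIRTUAL_MEMORY_BYTES).getValue();
    System.out.println("physical memory bytes (last snapshots, summed over tasks): " + pmem);
    System.out.println("virtual memory bytes  (last snapshots, summed over tasks): " + vmem);
  }
}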

On Jul 12, 2013, at 1:47 PM, hadoop qi wrote:

 Thanks for the response. So they represent the total physical memory (virtual 
 memory) has been allocated to the job (e.g., from heap and stack) during its 
 entire life time? I am still confused how to get the cumulative number from 
 /proc/meminfo. I think from /proc/meminfo we can only get the memory usage of 
 a  process in a particular time point (looked like a snapshot of the status 
 of the process). If these numbers are added, the sum would be much more than 
 memory allocated to the program. 
 
 
 On Fri, Jul 12, 2013 at 6:47 AM, Shahab Yunus shahab.yu...@gmail.com wrote:
 I think they are cumulative but per task.
 
 Physical memory bytes
 (PHYSICAL_MEMORY_BYTES)
 The physical memory being used by a task in bytes, as reported by 
 /proc/meminfo.
 Virtual memory bytes
 (VIRTUAL_MEMORY_BYTES)
 The virtual memory being used by a task in bytes, as reported by 
 /proc/meminfo.
 
 This is from the Definitive Guide book. Page 260.
 
 Regards,
 Shhab
 
 
 On Thu, Jul 11, 2013 at 12:47 PM, hadoop qi hadoop@gmail.com wrote:
 Hello, 
 
 I am wondering how memory counters  'PHYSICAL_MEMORY_BYTES'  and 
 'VIRTUAL_MEMORY_BYTES'  are calculated? They are peaks of memory usage or 
 cumulative usage? 
 
 Thanks for help, 
 
 



Maven artifacts for 0.23.9

2013-07-12 Thread Eugene Dzhurinsky
Hello!

Where is it possible to get Maven artifacts for the most recent Hadoop release?

Thanks!
-- 
Eugene N Dzhurinsky




Re: How to control of the output of /stacks

2013-07-12 Thread Harsh J
The logging has sometimes been useful in debugging (i.e. if the stack
shown on the UI went uncaptured, the log helps). It is currently not
specifically toggleable. I suppose it is OK to set it to DEBUG
though. Can you file a JIRA for that, please?

The only way you can disable it right now is by (brute-forcibly)
adding the below to the daemon's log4j.properties:

log4j.logger.org.apache.hadoop.http.HttpServer=WARN

Which may not be so ideal as we may miss other INFO from HttpServer generically.

On Sat, Jul 13, 2013 at 3:24 AM, Shinichi Yamashita
yamashita...@oss.nttdata.co.jp wrote:
 Hi,

 I can see the stack trace of the node when I access /stacks of Web UI.
 And stack trace is output in the log file of the node, too.
 Because the expansion of the log file and hard to see it, I don't want
 to output it in a log file.
 Is there the method to solve this problem?

 Regards,
 Shinichi Yamashita



-- 
Harsh J


Re:

2013-07-12 Thread Suresh Srinivas
Please use the CDH mailing list; this is the Apache Hadoop mailing list.

Sent from phone

On Jul 12, 2013, at 7:51 PM, Anit Alexander anitama...@gmail.com wrote:

 Hello,
 
 I am encountering a problem in a CDH4 environment.
 I can successfully run the map reduce job in my Hadoop cluster, but when I
 migrated the same map reduce job to my CDH4 environment it throws an error
 stating that it cannot read the next block (each block is 64 MB). Why is that
 so?
 
 Hadoop environment: hadoop 1.0.3
 java version 1.6
 
 cdh4 environment: CDH4.2.0
 java version 1.6
 
 Regards,
 Anit Alexander