Failed to run wordcount on YARN
Hi, I just started trying out Hadoop 2.0, using the 2.0.5-alpha package and following http://hadoop.apache.org/docs/r2.0.5-alpha/hadoop-project-dist/hadoop-common/ClusterSetup.html to set up a cluster in non-secure mode. HDFS works fine with the client tools, but when I run the wordcount example I get errors:

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar wordcount /tmp /out

13/07/12 15:05:53 INFO mapreduce.Job: Task Id : attempt_1373609123233_0004_m_04_0, Status : FAILED
Error: java.io.FileNotFoundException: Path is not a file: /tmp/hadoop-yarn
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:42)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1317)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1276)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1252)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1225)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:403)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:239)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40728)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
  at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
  at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
  at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:986)
  at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:974)
  at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:157)
  at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:124)
  at org.apache.hadoop.hdfs.DFSInputStream.init(DFSInputStream.java:117)
  at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1131)
  at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:244)
  at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:77)
  at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:713)
  at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:89)
  at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:519)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)

I checked HDFS and found that /tmp/hadoop-yarn is there, and the directory's owner is the same as the job user. To be sure, I also created /tmp/hadoop-yarn on the local fs. Neither helps. Any idea what the problem might be? Thx!
Best Regards,
Raymond Liu
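For reference, "Path is not a file: /tmp/hadoop-yarn" indicates a map task tried to open a directory as its input: the job's input path is /tmp, which contains subdirectories such as /tmp/hadoop-yarn, and the examples' FileInputFormat does not descend into them. A minimal sketch (assuming the client's Configuration points at the cluster; the class name is only for illustration) to list what the job would actually try to read:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckInputDir {
    public static void main(String[] args) throws Exception {
        // Path to inspect; /tmp is only an example taken from the original command.
        Path input = new Path(args.length > 0 ? args[0] : "/tmp");
        FileSystem fs = FileSystem.get(new Configuration());
        for (FileStatus status : fs.listStatus(input)) {
            // Plain FileInputFormat treats every child as a file, so any
            // directory listed here (e.g. /tmp/hadoop-yarn) will trigger
            // "Path is not a file" when a map task tries to open it.
            System.out.println((status.isDirectory() ? "DIR  " : "FILE ") + status.getPath());
        }
    }
}

If directories show up, pointing wordcount at an input directory that contains only files avoids the exception.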
Re: copy files from ftp to hdfs in parallel, distcp failed
On 11/07/2013 20:47, Balaji Narayanan (பாலாஜி நாராயணன்) wrote: multiple copy jobs to hdfs
Thank you for your reply and the link. I read the link before, but I didn't find any examples of copying files from ftp to hdfs. There are about 20-40 files in my directory. I just want to move or copy that directory to hdfs on Amazon EC2. Actually, I am new to hadoop. I would like to know how to do multiple copy jobs to hdfs without distcp. Thank you again. -- Hao Ren ClaraVista www.claravista.fr
Re: how to add JournalNodes
You need to restart your NameNodes to get them to use the new QJM 5-host-set configs, and I think you can do that without downtime if you're already in HA mode by restarting one NN at a time. To add new JNs first though, you will currently have to rsync their directory from a good JN to get them into the cluster (i.e. rsync the data from a good one before you start the new JNs). They will not auto-join otherwise. On Fri, Jul 12, 2013 at 12:57 PM, lei liu liulei...@gmail.com wrote: I use QJM for HA in hadoop2.0, now there are three JournalNodes in HDFS cluster, I want to add two new JournalNodes to HDFS cluster, how to do it? Do I need to restart HDFS cluster? Thanks, LiuLei -- Harsh J
unsubscribe
RE: Tasktracker in namenode failure
Both the configured map output value class and the value type written from the mapper are Text, so there is no mismatch in the value class. But when the same MR program is run with 2 tasktrackers (without a tasktracker on the namenode), the exception does not occur. The problem is only with the tasktracker running on the namenode.
Thanks, Regards
Ramya.S

From: Devaraj k [mailto:devara...@huawei.com]
Sent: Fri 7/12/2013 3:04 PM
To: user@hadoop.apache.org
Subject: RE: Tasktracker in namenode failure

Could you tell what Map Output Value class you are configuring while submitting the Job, and what type of value is written from the Mapper? If these two mismatch, it will throw the below error.
Thanks,
Devaraj k

From: Ramya S [mailto:ram...@suntecgroup.com]
Sent: 12 July 2013 14:46
To: user@hadoop.apache.org
Subject: Tasktracker in namenode failure

Hi,
Why does only the tasktracker on the namenode fail during job execution with this error? I have attached a snapshot of the error screen with this mail.

java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.IntWritable
  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1019)
  at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
  at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
  at WordCount$TokenizerMapper.map(WordCount.java:30)
  at WordCount$TokenizerMapper.map(WordCount.java:19)
  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:416)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
  at org.apache.hadoop.mapred.Child.main(Child.java:249)

But this same task is reassigned to another tasktracker and gets executed. Why?
Best Regards,
Ramya
RE: Tasktracker in namenode failure
I think there is a mismatch of jars coming into the classpath for the map tasks when they run on different machines. You can find this out by giving some unique name to your Mapper class and Job submit class, and then submitting the Job.
Thanks,
Devaraj k

From: Ramya S [mailto:ram...@suntecgroup.com]
Sent: 12 July 2013 15:27
To: user@hadoop.apache.org
Subject: RE: Tasktracker in namenode failure

Both the configured map output value class and the value type written from the mapper are Text, so there is no mismatch in the value class. But when the same MR program is run with 2 tasktrackers (without a tasktracker on the namenode), the exception does not occur. The problem is only with the tasktracker running on the namenode.
Thanks, Regards
Ramya.S

From: Devaraj k [mailto:devara...@huawei.com]
Sent: Fri 7/12/2013 3:04 PM
To: user@hadoop.apache.org
Subject: RE: Tasktracker in namenode failure

Could you tell what Map Output Value class you are configuring while submitting the Job, and what type of value is written from the Mapper? If these two mismatch, it will throw the below error.
Thanks,
Devaraj k

From: Ramya S [mailto:ram...@suntecgroup.com]
Sent: 12 July 2013 14:46
To: user@hadoop.apache.org
Subject: Tasktracker in namenode failure

Hi,
Why does only the tasktracker on the namenode fail during job execution with this error? I have attached a snapshot of the error screen with this mail.

java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.IntWritable
  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1019)
  at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
  at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
  at WordCount$TokenizerMapper.map(WordCount.java:30)
  at WordCount$TokenizerMapper.map(WordCount.java:19)
  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:416)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
  at org.apache.hadoop.mapred.Child.main(Child.java:249)

But this same task is reassigned to another tasktracker and gets executed. Why?
Best Regards,
Ramya
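For reference, the type-mismatch error above appears when the value class declared to the job differs from what the Mapper actually writes, which is exactly what a stale jar on one node (still emitting IntWritable values) would cause. A minimal, hypothetical driver/mapper pair (not the poster's actual WordCount code) showing the declarations that must agree:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ValueClassDemo {

    // The Mapper declares Text as its output value type (the fourth type parameter).
    public static class PassThroughMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Writes Text, matching both the generic signature and the job config below.
            context.write(new Text("line"), value);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "value-class-demo");
        job.setJarByClass(ValueClassDemo.class);
        job.setMapperClass(PassThroughMapper.class);
        // These must match the Mapper's KEYOUT/VALUEOUT; if a node runs an old
        // jar whose mapper writes IntWritable instead, collect() throws
        // "Type mismatch in value from map".
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}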
Re: Staging directory ENOTDIR error.
Hi Jay, what hadoop command are you running?
Hi,
From,
Ramesh.

On Fri, Jul 12, 2013 at 7:54 AM, Devaraj k devara...@huawei.com wrote:
Hi Jay,
Here the client is trying to create a staging directory in the local file system, which it actually should create in HDFS. Could you check whether you have configured "fs.defaultFS" in the client to point to HDFS?
Thanks,
Devaraj k

From: Jay Vyas [mailto:jayunit...@gmail.com]
Sent: 12 July 2013 04:12
To: common-u...@hadoop.apache.org
Subject: Staging directory ENOTDIR error.

Hi, I'm getting an ungoogleable exception, never seen this before. This is on a hadoop 1.1 cluster... It appears to be permissions related... Any thoughts as to how this could crop up? I assume it's a bug in my filesystem, but I'm not sure.

13/07/11 18:39:43 ERROR security.UserGroupInformation: PriviledgedActionException as:root cause:ENOTDIR: Not a directory
ENOTDIR: Not a directory
  at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
  at org.apache.hadoop.fs.FileUtil.execSetPermission(FileUtil.java:699)
  at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:654)
  at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
  at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
  at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
  at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
--
Jay Vyas
http://jayunit100.blogspot.com
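For what it's worth, a quick way to see which filesystem the client actually resolves as its default (a minimal sketch, assuming the client's core-site.xml is on the classpath; the key is fs.default.name on Hadoop 1.x and fs.defaultFS on 2.x):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class WhichFs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Prints the configured default filesystem and what it resolves to.
        // If this shows file:/// instead of hdfs://..., the staging directory
        // will be created on the local filesystem, as described above.
        System.out.println("fs.default.name = " + conf.get("fs.default.name"));
        System.out.println("fs.defaultFS    = " + conf.get("fs.defaultFS"));
        System.out.println("resolved to     = " + FileSystem.get(conf).getUri());
    }
}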
Re: copy files from ftp to hdfs in parallel, distcp failed
Hi, please configure the following in core-site.xml and try.
Use hadoop fs -ls file:/// to display local file system files.
Use hadoop fs -ls ftp://<your ftp location> to display ftp files.
If it lists the files, go for distcp.

Reference from http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml :
fs.ftp.host = 0.0.0.0 (the FTP filesystem connects to this server)
fs.ftp.host.port = 21 (the FTP filesystem connects to fs.ftp.host on this port)
Try to set these properties as well.

Reference from the Hadoop Definitive Guide, Hadoop filesystems table:
Filesystem: FTP; URI scheme: ftp; Java implementation (under org.apache.hadoop): fs.ftp.FTPFileSystem; Description: A filesystem backed by an FTP server.

Hi,
From,
Ramesh.

On Fri, Jul 12, 2013 at 1:04 PM, Hao Ren h@claravista.fr wrote:
On 11/07/2013 20:47, Balaji Narayanan (பாலாஜி நாராயணன்) wrote: multiple copy jobs to hdfs
Thank you for your reply and the link. I read the link before, but I didn't find any examples of copying files from ftp to hdfs. There are about 20-40 files in my directory. I just want to move or copy that directory to hdfs on Amazon EC2. Actually, I am new to hadoop. I would like to know how to do multiple copy jobs to hdfs without distcp. Thank you again.
--
Hao Ren
ClaraVista
www.claravista.fr
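If distcp keeps failing, a plain single-process copy through the FileSystem API is another option for a 20-40 file directory. A rough sketch (assuming the Java API is acceptable, that commons-net is on the classpath for the ftp:// scheme, and that the host names and paths below are placeholders):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class FtpToHdfsCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical endpoints; replace with your FTP host and HDFS namenode.
        FileSystem ftp = FileSystem.get(URI.create("ftp://user:password@ftp.example.com/"), conf);
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode:9000/"), conf);
        // Copies the whole source directory; deleteSource=false keeps the FTP copy.
        FileUtil.copy(ftp, new Path("/incoming"), hdfs, new Path("/data/incoming"), false, conf);
    }
}

This copy is sequential, unlike distcp, which is usually fine for a few dozen files.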
Re: Tasktracker in namenode failure
Hi, the problem is with the jar file only. To check, run any other MR job or the sample wordcount job on the namenode's tasktracker. If it runs, there is no problem with the namenode tasktracker; if it does not run, there may be a problem with that tasktracker's configuration, so compare it with another node's tasktracker configuration (i.e. the mapred configuration).
Hi,
From,
Ramesh.

On Fri, Jul 12, 2013 at 3:37 PM, Devaraj k devara...@huawei.com wrote:
I think there is a mismatch of jars coming into the classpath for the map tasks when they run on different machines. You can find this out by giving some unique name to your Mapper class and Job submit class, and then submitting the Job.
Thanks,
Devaraj k

From: Ramya S [mailto:ram...@suntecgroup.com]
Sent: 12 July 2013 15:27
To: user@hadoop.apache.org
Subject: RE: Tasktracker in namenode failure

Both the configured map output value class and the value type written from the mapper are Text, so there is no mismatch in the value class. But when the same MR program is run with 2 tasktrackers (without a tasktracker on the namenode), the exception does not occur. The problem is only with the tasktracker running on the namenode.
Thanks, Regards
Ramya.S

From: Devaraj k [mailto:devara...@huawei.com]
Sent: Fri 7/12/2013 3:04 PM
To: user@hadoop.apache.org
Subject: RE: Tasktracker in namenode failure

Could you tell what Map Output Value class you are configuring while submitting the Job, and what type of value is written from the Mapper? If these two mismatch, it will throw the below error.
Thanks,
Devaraj k

From: Ramya S [mailto:ram...@suntecgroup.com]
Sent: 12 July 2013 14:46
To: user@hadoop.apache.org
Subject: Tasktracker in namenode failure

Hi,
Why does only the tasktracker on the namenode fail during job execution with this error? I have attached a snapshot of the error screen with this mail.

java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.IntWritable
  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1019)
  at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
  at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
  at WordCount$TokenizerMapper.map(WordCount.java:30)
  at WordCount$TokenizerMapper.map(WordCount.java:19)
  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:416)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
  at org.apache.hadoop.mapred.Child.main(Child.java:249)

But this same task is reassigned to another tasktracker and gets executed. Why?
Best Regards,
Ramya
Hadoop property precedence
Hi,
Suppose the block size set in the configuration file on the client side is 64MB, the block size set in the configuration file on the namenode side is 128MB, and the block size set in the configuration file on the datanode side is something else. If the client writes a file to HDFS, which setting takes effect? Please advise.
Thanks,
Shalish.
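The block size of a new file is chosen by the client that writes it (the value is sent to the namenode in the create call), so the client-side setting, or an explicit blockSize argument to FileSystem.create(), is what takes effect; the namenode and datanode config values are not consulted for this. A small sketch (assuming a Hadoop 2.x client where the key is dfs.blocksize, it is dfs.block.size on 1.x, and assuming the client config points at HDFS; the demo path is arbitrary):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ClientBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Client-side setting: this is the value the writer sends to the namenode.
        conf.setLong("dfs.blocksize", 64L * 1024 * 1024);
        FileSystem fs = FileSystem.get(conf);
        Path demo = new Path("/tmp/blocksize-demo.txt");
        FSDataOutputStream out = fs.create(demo);
        out.writeUTF("hello");
        out.close();
        // The file's actual block size reflects what the client requested (64MB here),
        // regardless of what the namenode's own config file says.
        System.out.println(fs.getFileStatus(demo).getBlockSize());
    }
}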
UNSUBSCRIBE
Re: How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?
They are the running metrics. While the task is running, they will tell you how much pmem/vmem it is using at that point of time. Obviously at the end of job, it will be the last snapshot. Thanks, +Vinod On Jul 12, 2013, at 6:47 AM, Shahab Yunus wrote: I think they are cumulative but per task. Physical memory bytes (PHYSICAL_MEMORY_BYTES) The physical memory being used by a task in bytes, as reported by /proc/meminfo. Virtual memory bytes (VIRTUAL_MEMORY_BYTES) The virtual memory being used by a task in bytes, as reported by /proc/meminfo. This is from the Definitive Guide book. Page 260. Regards, Shhab On Thu, Jul 11, 2013 at 12:47 PM, hadoop qi hadoop@gmail.com wrote: Hello, I am wondering how memory counters 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' are calculated? They are peaks of memory usage or cumulative usage? Thanks for help,
Re: UNSUBSCRIBE
Please send an email to user-unsubscr...@hadoop.apache.org to unsubscribe. Thanks, Surendra On Fri, Jul 12, 2013 at 10:24 AM, Brent Nikolaus bnikol...@gmail.com wrote:
Re: Staging directory ENOTDIR error.
This was a very odd error - it turns out that I had created a file called tmp in my fs root directory, which meant that when the jobs tried to write to the tmp directory, they ran into the not-a-dir exception. In any case, I think the error reporting in the NativeIO class should be revised.

On Thu, Jul 11, 2013 at 10:24 PM, Devaraj k devara...@huawei.com wrote:
Hi Jay,
Here the client is trying to create a staging directory in the local file system, which it actually should create in HDFS. Could you check whether you have configured "fs.defaultFS" in the client to point to HDFS?
Thanks,
Devaraj k

From: Jay Vyas [mailto:jayunit...@gmail.com]
Sent: 12 July 2013 04:12
To: common-u...@hadoop.apache.org
Subject: Staging directory ENOTDIR error.

Hi, I'm getting an ungoogleable exception, never seen this before. This is on a hadoop 1.1 cluster... It appears to be permissions related... Any thoughts as to how this could crop up? I assume it's a bug in my filesystem, but I'm not sure.

13/07/11 18:39:43 ERROR security.UserGroupInformation: PriviledgedActionException as:root cause:ENOTDIR: Not a directory
ENOTDIR: Not a directory
  at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
  at org.apache.hadoop.fs.FileUtil.execSetPermission(FileUtil.java:699)
  at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:654)
  at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
  at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
  at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
  at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
--
Jay Vyas
http://jayunit100.blogspot.com
--
Jay Vyas
http://jayunit100.blogspot.com
Re: how to get hadoop HDFS path?
You can get the hdfs file system as follows:

Configuration conf = new Configuration();
conf.addResource(new Path("/home/dpancras/TradeStation/CassandraPigHadoop/WebContent/WEB-INF/core-site.xml"));
conf.addResource(new Path("/home/dpancras/TradeStation/CassandraPigHadoop/WebContent/WEB-INF/hdfs-site.xml"));
FileSystem fs = FileSystem.get(conf);

On Fri, Jul 12, 2013 at 4:40 AM, ch huang justlo...@gmail.com wrote:
I want to set an hdfs path AND add the path into hbase. Here is my code:

Path path = new Path("hdfs:192.168.10.22:9000/alex/test.jar");
System.out.println(":" + path.toString() + "|" + TestMyCo.class.getCanonicalName() + "|" + Coprocessor.PRIORITY_USER);
htd.setValue("COPROCESSOR$1", path.toString() + "|" + TestMyCo.class.getCanonicalName() + "|" + Coprocessor.PRIORITY_USER);

and the real value which I find is:

hbase(main):012:0> describe 'mytest'
DESCRIPTION ENABLED
{NAME => 'mytest', COPROCESSOR$1 => 'hdfs://192.168.10.22:9000/alex/test.jar|TestMyCo|1073741823', FAMILIES => [{NAME => 'myfl', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]} true
1 row(s) in 0.0930 seconds

--
Thanks, Regards
Deepak Rosario Pancras
Achiever/Responsibility/Arranger/Maximizer/Harmony
Re: how to get hadoop HDFS path?
Configuration conf = new Configuration();
conf.addResource(new Path("/home/dpancras/TradeStation/CassandraPigHadoop/WebContent/WEB-INF/core-site.xml"));
conf.addResource(new Path("/home/dpancras/TradeStation/CassandraPigHadoop/WebContent/WEB-INF/hdfs-site.xml"));
FileSystem fs = FileSystem.get(conf);
Path path = new Path("/alex/test.jar"); // Use relative path here
System.out.println(":" + path.toString() + "|" + TestMyCo.class.getCanonicalName() + "|" + Coprocessor.PRIORITY_USER);
htd.setValue("COPROCESSOR$1", path.toString() + "|" + TestMyCo.class.getCanonicalName() + "|" + Coprocessor.PRIORITY_USER);

On Fri, Jul 12, 2013 at 2:00 PM, deepak rosario tharigopla rozartharigo...@gmail.com wrote:
You can get the hdfs file system as follows:

Configuration conf = new Configuration();
conf.addResource(new Path("/home/dpancras/TradeStation/CassandraPigHadoop/WebContent/WEB-INF/core-site.xml"));
conf.addResource(new Path("/home/dpancras/TradeStation/CassandraPigHadoop/WebContent/WEB-INF/hdfs-site.xml"));
FileSystem fs = FileSystem.get(conf);

On Fri, Jul 12, 2013 at 4:40 AM, ch huang justlo...@gmail.com wrote:
I want to set an hdfs path AND add the path into hbase. Here is my code:

Path path = new Path("hdfs:192.168.10.22:9000/alex/test.jar");
System.out.println(":" + path.toString() + "|" + TestMyCo.class.getCanonicalName() + "|" + Coprocessor.PRIORITY_USER);
htd.setValue("COPROCESSOR$1", path.toString() + "|" + TestMyCo.class.getCanonicalName() + "|" + Coprocessor.PRIORITY_USER);

and the real value which I find is:

hbase(main):012:0> describe 'mytest'
DESCRIPTION ENABLED
{NAME => 'mytest', COPROCESSOR$1 => 'hdfs://192.168.10.22:9000/alex/test.jar|TestMyCo|1073741823', FAMILIES => [{NAME => 'myfl', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]} true
1 row(s) in 0.0930 seconds

--
Thanks, Regards
Deepak Rosario Pancras
Achiever/Responsibility/Arranger/Maximizer/Harmony

--
Thanks, Regards
Deepak Rosario Pancras
Achiever/Responsibility/Arranger/Maximizer/Harmony
Re: How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?
Thanks for the response. So they represent the total physical memory (virtual memory) allocated to the job (e.g., from heap and stack) during its entire lifetime? I am still confused about how to get a cumulative number from /proc/meminfo. I think from /proc/meminfo we can only get the memory usage of a process at a particular point in time (it looks like a snapshot of the status of the process). If these numbers were added up, the sum would be much more than the memory allocated to the program.

On Fri, Jul 12, 2013 at 6:47 AM, Shahab Yunus shahab.yu...@gmail.com wrote:
I think they are cumulative but per task.
Physical memory bytes (PHYSICAL_MEMORY_BYTES): The physical memory being used by a task in bytes, as reported by /proc/meminfo.
Virtual memory bytes (VIRTUAL_MEMORY_BYTES): The virtual memory being used by a task in bytes, as reported by /proc/meminfo.
This is from the Definitive Guide book, page 260.
Regards,
Shahab

On Thu, Jul 11, 2013 at 12:47 PM, hadoop qi hadoop@gmail.com wrote:
Hello, I am wondering how the memory counters 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' are calculated? Are they peaks of memory usage or cumulative usage? Thanks for the help,
Re: How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?
As Vinod Kumar Vavilapalli said, they are indeed snapshots at a point in time. So they are neither the peak usage over the whole duration of the job, nor a cumulative aggregate that increases over time.
Regards,
Shahab

On Fri, Jul 12, 2013 at 4:47 PM, hadoop qi hadoop@gmail.com wrote:
Thanks for the response. So they represent the total physical memory (virtual memory) allocated to the job (e.g., from heap and stack) during its entire lifetime? I am still confused about how to get a cumulative number from /proc/meminfo. I think from /proc/meminfo we can only get the memory usage of a process at a particular point in time (it looks like a snapshot of the status of the process). If these numbers were added up, the sum would be much more than the memory allocated to the program.

On Fri, Jul 12, 2013 at 6:47 AM, Shahab Yunus shahab.yu...@gmail.com wrote:
I think they are cumulative but per task.
Physical memory bytes (PHYSICAL_MEMORY_BYTES): The physical memory being used by a task in bytes, as reported by /proc/meminfo.
Virtual memory bytes (VIRTUAL_MEMORY_BYTES): The virtual memory being used by a task in bytes, as reported by /proc/meminfo.
This is from the Definitive Guide book, page 260.
Regards,
Shahab

On Thu, Jul 11, 2013 at 12:47 PM, hadoop qi hadoop@gmail.com wrote:
Hello, I am wondering how the memory counters 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' are calculated? Are they peaks of memory usage or cumulative usage? Thanks for the help,
Running hadoop for processing sources in full sky maps
Hi,
I have a few tens of full sky maps, in binary format (FITS), of about 600MB each. For each sky map I already have a catalog of the positions of a few thousand sources, i.e. stars, galaxies, radio sources. For each source I would like to:
- open the full sky map
- extract the relevant section, typically 20MB or less
- run some statistics on it
- aggregate the outputs to a catalog
I would like to run hadoop, possibly using python via the streaming interface, to process them in parallel. I think the input to the mapper should be each record of the catalogs; the python mapper can then open the full sky map, do the processing and print the output to stdout. Is this a reasonable approach?
If so, I need to be able to configure hadoop so that a full sky map is copied locally to the nodes that are processing one of its sources. How can I achieve that? Also, what is the best way to feed the input data to hadoop? For each source I have a reference to the full sky map, latitude and longitude.
Thanks. I posted this question on StackOverflow:
http://stackoverflow.com/questions/17617654/running-hadoop-for-processing-sources-in-full-sky-maps
Regards,
Andrea Zonca
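One mechanism that matches "copied locally to the nodes" is the distributed cache: files registered with a job are fetched once per node and linked into each task's working directory. A rough sketch of the driver side (using the Hadoop 2 Java API rather than the streaming interface the poster plans to use, and with hypothetical HDFS paths; with streaming, the -files/-archives options play the same role):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SkyMapJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "sky-map-sources");
        job.setJarByClass(SkyMapJob.class);
        // Hypothetical HDFS path; the "#skymap.fits" fragment is the local link
        // name each task sees in its working directory once the file is cached.
        job.addCacheFile(new URI("hdfs:///maps/skymap_001.fits#skymap.fits"));
        // Input: the source catalog for this map, one record (map reference,
        // latitude, longitude) per line, so each mapper call handles one source.
        FileInputFormat.addInputPath(job, new Path("/catalogs/skymap_001.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/out/skymap_001"));
        // A mapper (not shown) would open the local "skymap.fits", cut out the
        // region around each source, and emit its statistics.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Whether one job per sky map (as sketched) or a single job over all catalogs is better depends on how many maps each node can hold locally.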
How to control of the output of /stacks
Hi,
I can see the stack trace of a node when I access /stacks in the Web UI. The stack trace is also written to the node's log file. Because this bloats the log file and makes it hard to read, I don't want the trace written to the log file. Is there a way to solve this problem?
Regards,
Shinichi Yamashita
Re: How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?
No - every so often (3 seconds IIRC) it captures pmem and vmem, which correspond to the usage of the process and its children at *that* specific point in time. Cumulative = cumulative across the process and its children.
Thanks,
+Vinod

On Jul 12, 2013, at 1:47 PM, hadoop qi wrote:
Thanks for the response. So they represent the total physical memory (virtual memory) allocated to the job (e.g., from heap and stack) during its entire lifetime? I am still confused about how to get a cumulative number from /proc/meminfo. I think from /proc/meminfo we can only get the memory usage of a process at a particular point in time (it looks like a snapshot of the status of the process). If these numbers were added up, the sum would be much more than the memory allocated to the program.

On Fri, Jul 12, 2013 at 6:47 AM, Shahab Yunus shahab.yu...@gmail.com wrote:
I think they are cumulative but per task.
Physical memory bytes (PHYSICAL_MEMORY_BYTES): The physical memory being used by a task in bytes, as reported by /proc/meminfo.
Virtual memory bytes (VIRTUAL_MEMORY_BYTES): The virtual memory being used by a task in bytes, as reported by /proc/meminfo.
This is from the Definitive Guide book, page 260.
Regards,
Shahab

On Thu, Jul 11, 2013 at 12:47 PM, hadoop qi hadoop@gmail.com wrote:
Hello, I am wondering how the memory counters 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' are calculated? Are they peaks of memory usage or cumulative usage? Thanks for the help,
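For completeness, these counters can be read back from a finished job; at the job level they are the sum of the per-task values, and each task value is the last sampled usage rather than a peak or a cumulative-over-time figure. A small sketch against the Hadoop 2 API (assuming the job id is passed on the command line and the client config points at the cluster):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.TaskCounter;

public class MemoryCounters {
    public static void main(String[] args) throws Exception {
        // args[0] is a job id such as job_1373609123233_0004 (hypothetical).
        Cluster cluster = new Cluster(new Configuration());
        Job job = cluster.getJob(JobID.forName(args[0]));
        Counters counters = job.getCounters();
        // Last sampled memory usage, summed over the job's tasks.
        System.out.println("pmem: " + counters.findCounter(TaskCounter.PHYSICAL_MEMORY_BYTES).getValue());
        System.out.println("vmem: " + counters.findCounter(TaskCounter.VIRTUAL_MEMORY_BYTES).getValue());
    }
}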
Maven artifacts for 0.23.9
Hello! Where can I get Maven artifacts for the recent Hadoop release? Thanks! -- Eugene N Dzhurinsky
Re: How to control of the output of /stacks
The logging has sometimes come useful in debugging (i.e. if the stack on the UI went uncaptured, the log helps). It is currently not specifically toggle-able. I suppose it is OK to set it as DEBUG though. Can you file a JIRA for that please? The only way you can disable it right now is by (brute-forcibly) adding the below to the daemon's log4j.properties: log4j.logger.org.apache.hadoop.http.HttpServer=WARN Which may not be so ideal as we may miss other INFO from HttpServer generically. On Sat, Jul 13, 2013 at 3:24 AM, Shinichi Yamashita yamashita...@oss.nttdata.co.jp wrote: Hi, I can see the stack trace of the node when I access /stacks of Web UI. And stack trace is output in the log file of the node, too. Because the expansion of the log file and hard to see it, I don't want to output it in a log file. Is there the method to solve this problem? Regards, Shinichi Yamashita -- Harsh J
Re:
Please use the CDH mailing list; this is the Apache Hadoop mailing list.
Sent from phone

On Jul 12, 2013, at 7:51 PM, Anit Alexander anitama...@gmail.com wrote:
Hello, I am encountering a problem in a cdh4 environment. I can successfully run the map reduce job in my hadoop cluster, but when I migrated the same map reduce job to my cdh4 environment it creates an error stating that it cannot read the next block (each block is 64 mb). Why is that so?
Hadoop environment: hadoop 1.0.3, java version 1.6
cdh4 environment: CDH4.2.0, java version 1.6
Regards,
Anit Alexander