split big files into small ones to later copy
I have a 500 GB plain-text file in HDFS, and I want to copy it locally, zip it, and put it on another machine's local disk. The problem is that I don't have enough space on the local disk of the HDFS host to copy the file there, zip it, and then transfer it to the other host. Can I split the file into smaller files so they fit on the local disk? Any suggestions on how to do the copy? -- Best regards,
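One way to sidestep the local-space problem entirely is to stream the file out of HDFS and compress it on the fly, so nothing uncompressed ever lands on the local disk. A minimal sketch, assuming ssh access to the destination host (all paths and hostnames here are illustrative, not from the thread):

    # Stream out of HDFS, gzip on the fly, pipe straight to the other machine:
    hadoop fs -cat /user/me/bigfile.txt | gzip -c | \
      ssh user@desthost 'cat > /data/bigfile.txt.gz'

    # Or, if smaller pieces are wanted, split on the destination side:
    hadoop fs -cat /user/me/bigfile.txt | \
      ssh user@desthost 'split -b 10G - /data/bigfile.part.'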
Re: Why my tests shows Yarn is worse than MRv1 for terasort?
Hey Sam, Thanks for sharing your results. I'm definitely curious about what's causing the difference. A couple observations: It looks like you've got yarn.nodemanager.resource.memory-mb in there twice with two different values. Your max JVM memory of 1000 MB is (dangerously?) close to the default mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks getting killed for running over resource limits? -Sandy

On Thu, Jun 6, 2013 at 10:21 PM, sam liu samliuhad...@gmail.com wrote: The terasort execution log shows that reduce spent about 5.5 mins from 33% to 35%, as below:

    13/06/10 08:02:22 INFO mapreduce.Job: map 100% reduce 31%
    13/06/10 08:02:25 INFO mapreduce.Job: map 100% reduce 32%
    13/06/10 *08:02:46* INFO mapreduce.Job: map 100% reduce 33%
    13/06/10 *08:08:16* INFO mapreduce.Job: map 100% reduce 35%
    13/06/10 08:08:19 INFO mapreduce.Job: map 100% reduce 40%
    13/06/10 08:08:22 INFO mapreduce.Job: map 100% reduce 43%

Anyway, below are my configurations for your reference. Thanks!

(A) core-site.xml: only defines 'fs.default.name' and 'hadoop.tmp.dir'

(B) hdfs-site.xml:

    <property><name>dfs.replication</name><value>1</value></property>
    <property><name>dfs.name.dir</name><value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value></property>
    <property><name>dfs.data.dir</name><value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value></property>
    <property><name>dfs.block.size</name><value>134217728</value></property><!-- 128MB -->
    <property><name>dfs.namenode.handler.count</name><value>64</value></property>
    <property><name>dfs.datanode.handler.count</name><value>10</value></property>

(C) mapred-site.xml:

    <property>
      <name>mapreduce.cluster.temp.dir</name>
      <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
      <description>No description</description>
      <final>true</final>
    </property>
    <property>
      <name>mapreduce.cluster.local.dir</name>
      <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
      <description>No description</description>
      <final>true</final>
    </property>
    <property><name>mapreduce.child.java.opts</name><value>-Xmx1000m</value></property>
    <property><name>mapreduce.framework.name</name><value>yarn</value></property>
    <property><name>mapreduce.tasktracker.map.tasks.maximum</name><value>8</value></property>
    <property><name>mapreduce.tasktracker.reduce.tasks.maximum</name><value>4</value></property>
    <property><name>mapreduce.tasktracker.outofband.heartbeat</name><value>true</value></property>

(D) yarn-site.xml:

    <property>
      <name>yarn.resourcemanager.resource-tracker.address</name>
      <value>node1:18025</value>
      <description>host is the hostname of the resource manager and port is the port on which the NodeManagers contact the Resource Manager.</description>
    </property>
    <property>
      <name>yarn.resourcemanager.webapp.address</name>
      <value>node1:18088</value>
      <description>The address of the RM web application.</description>
    </property>
    <property>
      <name>yarn.resourcemanager.scheduler.address</name>
      <value>node1:18030</value>
      <description>host is the hostname of the resourcemanager and port is the port on which the Applications in the cluster talk to the Resource Manager.</description>
    </property>
    <property>
      <name>yarn.resourcemanager.address</name>
      <value>node1:18040</value>
      <description>the host is the hostname of the ResourceManager and the port is the port on which the clients can talk to the Resource Manager.</description>
    </property>
    <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
      <description>the local directories used by the nodemanager</description>
    </property>
    <property>
      <name>yarn.nodemanager.address</name>
      <value>0.0.0.0:18050</value>
      <description>the nodemanagers bind to this port</description>
    </property>
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>10240</value>
      <description>the amount of memory on the NodeManager in GB</description>
    </property>
    <property>
      <name>yarn.nodemanager.remote-app-log-dir</name>
      <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
      <description>directory on hdfs where the application logs are moved to</description>
    </property>
    <property>
      <name>yarn.nodemanager.log-dirs</name>
      <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
      <description>the directories used by Nodemanagers as log directories</description>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce.shuffle</value>
      <description>shuffle service that needs to be set for Map Reduce to run</description>
    </property>
    <property><name>yarn.resourcemanager.client.thread-count</name><value>64</value></property>
    <property><name>yarn.nodemanager.resource.cpu-cores</name><value>24</value></property>
    <property
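Regarding Sandy's headroom point: the usual fix is to keep the task JVM heap well below the container allocation. A sketch for mapred-site.xml (the specific numbers are illustrative, not from the thread):

    <!-- Give the container more memory than the JVM heap (-Xmx) so the
         task is not killed for exceeding its allocation. -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>1536</value>
    </property>
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx1024m</value>
    </property>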
Re: Issue with -libjars option in cluster in Hadoop 1.0
On 06/06/2013 08:09 PM, Shahab Yunus wrote: It is trying to read JSON4J.jar from /local/home/hadoop. Does that jar exist at this path on the client from which you are invoking it? Is this jar in the current dir from which you are kicking off the job?

Yes and yes. In fact, the job goes through because it runs fine on the master, which is where I'm starting this. It just fails on the slave. So I'm starting the job in /local/home/hadoop on the master with this:

    hadoop jar mrtest.jar my.MRTestJob -libjars JSON4J.jar in out

It doesn't matter if I use an absolute or relative path, file:// protocol or not; the result is always the same.

On Thu, Jun 6, 2013 at 1:33 PM, Thilo Goetz twgo...@gmx.de wrote: On 06/06/2013 06:58 PM, Shahab Yunus wrote: Are you following the guidelines as mentioned here: http://grepalex.com/2013/02/25/hadoop-libjars/

Now I am, so thanks for that :-) Still doesn't work though. Following the hint in that post I looked at the job config, which has this:

    tmpjars  file:/local/home/hadoop/JSON4J.jar

I assume that's the correct value. Any other ideas? --Thilo

Regards, Shahab

On Thu, Jun 6, 2013 at 12:51 PM, Thilo Goetz twgo...@gmx.de wrote: Hi all, I'm using hadoop 1.0 (yes it's old, but there is nothing I can do about that). I have some M/R programs that work perfectly on a single-node setup. However, they consistently fail in the cluster I have available. I have tracked this down to the fact that extra jars I include on the command line with -libjars are not available on the slaves. I get FileNotFoundExceptions for those jars. For example, I run this:

    hadoop jar mrtest.jar my.MRTestJob -libjars JSON4J.jar in out

Then I get (on the slave):

    java.io.FileNotFoundException: File /local/home/hadoop/JSON4J.jar does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
        at org.apache.hadoop.filecache.TaskDistributedCacheManager.setupCache(TaskDistributedCacheManager.java:179)
        at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1193)
        at java.security.AccessController.doPrivileged(AccessController.java:284)
        at javax.security.auth.Subject.doAs(Subject.java:573)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1128)
        at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1184)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1099)
        at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2382)
        at java.lang.Thread.run(Thread.java:736)

Where /local/home/hadoop is where I ran the code on the master. As far as I can tell from my internet research, this is supposed to work in hadoop 1.0, correct? It may well be that the cluster is somehow misconfigured (I didn't set it up myself), so I would appreciate any hints as to what I should be looking at in terms of configuration. Oh, and by the way, the fat-jar approach, where I put all classes required by the M/R code in the main jar, works perfectly. However, I would like to avoid that if I possibly can. Any help appreciated! --Thilo
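For reference, the pattern the grepalex post above describes is roughly the following (a sketch; the absolute path is illustrative): the driver must go through ToolRunner/GenericOptionsParser for -libjars to be parsed, and the jar must also be on the client-side classpath:

    # Make the jar visible to the local client JVM as well:
    export HADOOP_CLASSPATH=/local/home/hadoop/JSON4J.jar
    hadoop jar mrtest.jar my.MRTestJob -libjars /local/home/hadoop/JSON4J.jar in out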
Re: Is counter a static var
Is a counter like a static var? If so, is it persisted on the name node or the data node? Any input please. Thanks, Sai
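For background (general MapReduce knowledge, not an answer from within this digest): counters are not static variables. Each task increments its own counter objects, and the framework aggregates them centrally for the job (on the JobTracker in MRv1), with the totals recorded in the job history. Typical old-API usage:

    // Inside a map() or reduce() method; the group/name strings are illustrative.
    reporter.incrCounter("MyApp", "BAD_RECORDS", 1);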
Re: Is it possible to define num of mappers to run for a job
Is it possible to define the number of mappers to run for a job? What are the conditions we need to be aware of when defining such a thing? Please help. Thanks, Sai
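For context (general MapReduce behavior, not a reply from this digest): the map count is derived from the number of input splits, so it can only be influenced indirectly. A sketch in the old API, with illustrative values:

    JobConf conf = new JobConf(MyJob.class);
    conf.setNumMapTasks(10);  // only a hint; the actual count follows input splits
    // Raising the minimum split size is a more direct lever (fewer, larger splits):
    conf.set("mapred.min.split.size", "268435456");  // 256 MB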
Re: Pool slot questions
1. Can we think of a job pool as similar to a queue? 2. Is it possible to configure a slot, and if so, how? Please help. Thanks, Sai
question about LinuxResourceCalculatorPlugin
Hi, LinuxResourceCalculatorPlugin and ProcfsBasedProcessTree get info about memory and CPU, but who uses these parameters, and why? I want to try to make an analog for FreeBSD.
protobuf.ServiceException: OutOfMemoryError
Hi all, I find that some of my DNs go dead, and the datanode log shows as [1]: I got java.lang.OutOfMemoryError: Java heap space. I wonder how this could come about.

[1]:

    1496 at file /home/hadoop/datadir/current/BP-471453121-172.16.250.16-1369298226760/current/finalized/subdir56/subdir46/blk_814336670004895
    2013-06-07 13:49:35,526 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted block BP-471453121-172.16.250.16-1369298226760 blk_7448905973553274685_691506 at file /home/hadoop/datadir/current/BP-471453121-172.16.250.16-1369298226760/current/finalized/subdir56/subdir46/blk_7448905973553274685
    2013-06-07 13:49:35,526 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted block BP-471453121-172.16.250.16-1369298226760 blk_-1754381991139616539_691508 at file /home/hadoop/datadir/current/BP-471453121-172.16.250.16-1369298226760/current/finalized/subdir56/subdir46/blk_-1754381991139616539
    2013-06-07 13:49:35,527 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted block BP-471453121-172.16.250.16-1369298226760 blk_-3379745807160766937_691492 at file /home/hadoop/datadir/current/BP-471453121-172.16.250.16-1369298226760/current/finalized/subdir56/subdir46/blk_-3379745807160766937
    2013-06-07 13:51:44,231 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
    java.io.IOException: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:203)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:399)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:551)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:674)
        at java.lang.Thread.run(Thread.java:662)
    Caused by: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:212)
        at $Proxy10.blockReport(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
        at $Proxy10.blockReport(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:201)
        ... 4 more
    Caused by: java.lang.OutOfMemoryError: Java heap space
    2013-06-07 13:51:46,983 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected exception in block pool Block pool BP-471453121-172.16.250.16-1369298226760 (storage id DS-1482330176-172.16.250.19-50010-1369298284371) service to wxossetl2/172.16.250.16:8020
    java.lang.OutOfMemoryError: Java heap space
    2013-06-07 13:51:46,983 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-471453121-172.16.250.16-1369298226760 (storage id DS-1482330176-172.16.250.19-50010-1369298284371) service to wxossetl2/172.16.250.16:8020
    2013-06-07 13:51:47,090 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-471453121-172.16.250.16-1369298226760 (storage id DS-1482330176-172.16.250.19-50010-1369298284371)
    2013-06-07 13:51:47,090 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Removed bpid=BP-471453121-172.16.250.16-1369298226760 from blockPoolScannerMap
    2013-06-07 13:51:47,090 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing block pool BP-471453121-172.16.250.16-1369298226760
    2013-06-07 13:51:49,091 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
    2013-06-07 13:51:49,098 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
    2013-06-07 13:51:49,101 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down DataNode at wxossetl5/172.16.250.19
Re: Pool slot questions
Sai, This is regarding all your recent emails and questions. I suggest that you read Hadoop: The Definitive Guide by Tom White ( http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520), as it goes through all of your queries in detail and with examples. The questions that you are asking are pretty basic, and the answers are available and well documented all over the web. In parallel, you can also download the code, which is free and easily available, and start looking into it. Regards, Shahab On Fri, Jun 7, 2013 at 8:02 AM, Sai Sai saigr...@yahoo.in wrote: 1. Can we think of a job pool similar to a queue. 2. Is it possible to configure a slot if so how. Please help. Thanks Sai
RE: Why/When partitioner is used.
There are kind of two parts to this. The semantics of MapReduce promise that all tuples sharing the same key value are sent to the same reducer, so that you can write useful MR applications that do things like “count words” or “summarize by date”. In order to accomplish that, the shuffle phase of MR performs a partitioning by key to move tuples sharing the same key to the same node where they can be processed together. You can think of key-partitioning as a strategy that assists in parallel distributed sorting. john From: Sai Sai [mailto:saigr...@yahoo.in] Sent: Friday, June 07, 2013 5:17 AM To: user@hadoop.apache.org Subject: Re: Why/When partitioner is used. I always get confused why we should partition and what is the use of it. Why would one want to send all the keys starting with A to Reducer1 and B to R2 and so on... Is it just to parallelize the reduce process. Please help. Thanks Sai
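To make the default concrete, Hadoop's stock HashPartitioner implements exactly this key-to-reducer assignment (shown here in essence):

    public int getPartition(K key, V value, int numReduceTasks) {
        // mask off the sign bit so the modulo result is non-negative
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }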
what is the Interval of BlockPoolSliceScanner.
Hi all, I find that when a datanode starts up, it runs a BlockPoolSliceScanner to verify the block pool. What is the interval of the BlockPoolSliceScanner, and when does it begin to work? And
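For reference (not from a reply in this digest): the block scanner's verification period is governed by an hdfs-site.xml property; 504 hours (three weeks) is the usual default:

    <property>
      <name>dfs.datanode.scan.period.hours</name>
      <value>504</value> <!-- verify each block roughly every three weeks -->
    </property>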
DirectoryScanner's OutOfMemoryError
Hi All, I have found that the DirectoryScanner gets an error, "Error compiling report", because of java.lang.OutOfMemoryError: Java heap space. The log details are in [1]. How does this error come about, and how do I solve this exception?

[1]

    2013-06-07 22:20:28,199 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Took 2737ms to process 1 commands from NN
    2013-06-07 22:20:28,199 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted block BP-471453121-172.16.250.16-1369298226760 blk_1870928037426403148_709040 at file /home/hadoop/datadir/current/BP-471453121-172.16.250.16-1369298226760/current/finalized/subdir24/subdir25/blk_1870928037426403148
    2013-06-07 22:20:28,199 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted block BP-471453121-172.16.250.16-1369298226760 blk_3693882010743127822_709044 at file /home/hadoop/datadir/current/BP-471453121-172.16.250.16-1369298226760/current/finalized/subdir24/subdir25/blk_3693882010743127822
    2013-06-07 22:20:28,743 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted block BP-471453121-172.16.250.16-1369298226760 blk_-5452984265504491579_709036 at file /home/hadoop/datadir/current/BP-471453121-172.16.250.16-1369298226760/current/finalized/subdir24/subdir25/blk_-5452984265504491579
    2013-06-07 22:20:28,743 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted block BP-471453121-172.16.250.16-1369298226760 blk_-1078215880381545528_709050 at file /home/hadoop/datadir/current/BP-471453121-172.16.250.16-1369298226760/current/finalized/subdir24/subdir25/blk_-1078215880381545528
    2013-06-07 22:20:28,744 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted block BP-471453121-172.16.250.16-1369298226760 blk_8107220088215975918_709064 at file /home/hadoop/datadir/current/BP-471453121-172.16.250.16-1369298226760/current/finalized/subdir24/subdir25/blk_8107220088215975918
    2013-06-07 22:20:29,278 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted block BP-471453121-172.16.250.16-1369298226760 blk_-3527717187851336238_709052 at file /home/hadoop/datadir/current/BP-471453121-172.16.250.16-1369298226760/current/finalized/subdir24/subdir25/blk_-3527717187851336238
    2013-06-07 22:20:29,812 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted block BP-471453121-172.16.250.16-1369298226760 blk_1766998327682981895_709042 at file /home/hadoop/datadir/current/BP-471453121-172.16.250.16-1369298226760/current/finalized/subdir24/subdir25/blk_1766998327682981895
    2013-06-07 22:20:29,812 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted block BP-471453121-172.16.250.16-1369298226760 blk_1650592414141359061_709028 at file /home/hadoop/datadir/current/BP-471453121-172.16.250.16-1369298226760/current/finalized/subdir24/subdir25/blk_1650592414141359061
    2013-06-07 22:20:29,812 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted block BP-471453121-172.16.250.16-1369298226760 blk_-6527697040536951940_709038 at file /home/hadoop/datadir/current/BP-471453121-172.16.250.16-1369298226760/current/finalized/subdir24/subdir25/blk_-6527697040536951940
    2013-06-07 22:20:43,766 ERROR org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Error compiling report
    java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:468)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:349)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:330)
        at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:286)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
    Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2882)
        at
History server - Yarn
Hello, I was doing some prototyping on top of YARN. I was able to launch an AM, and the AM in turn was able to spawn a few containers and do a certain job. The YARN application terminated successfully. My question is about the history server. I think the history server is an offering from YARN, nothing specific to Hadoop MapReduce. I wanted to know how to use the history server in a non-Hadoop application based on YARN. Or is this a part of Hadoop? Thanks, Rahul
Re: Why/When partitioner is used.
There are practical applications for defining your own partitioner as well:

1) Controlling database concurrency. For instance, let's say you have a distributed datastore like HBase, or even your own MySQL sharding scheme. Using the default HashPartitioner, keys will get for the most part randomly distributed across your reducers. If your reduce code does database saves or gets, this could cause periods where all reducers are hitting a single database. This may be more concurrency than your database can handle, so you could use a partitioner to send all keys you know would hit Shard A to reducers 1, 2, 3, and all that would hit Shard B to reducers 4, 5, 6. (A sketch of such a partitioner follows below.)

2) I've also used partitioners when I want to do some cross-key operations such as deduping, counting, or otherwise. You can further combine the custom partitioner with your own custom comparator and grouping comparator to do many advanced operations based on the application you are working on.

Since a single Reducer instance is used to reduce() all tuples in a partition, being able to control exactly which records make it onto a partition is a hugely valuable tool.

On Fri, Jun 7, 2013 at 10:03 AM, John Lilley john.lil...@redpoint.net wrote: There are kind of two parts to this. The semantics of MapReduce promise that all tuples sharing the same key value are sent to the same reducer, so that you can write useful MR applications that do things like “count words” or “summarize by date”. In order to accomplish that, the shuffle phase of MR performs a partitioning by key to move tuples sharing the same key to the same node where they can be processed together. You can think of key-partitioning as a strategy that assists in parallel distributed sorting. john
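A minimal sketch of the shard-routing idea from point 1, in the old (mapred) API; the shard-detection rule and all names here are illustrative, not from the thread:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    public class ShardPartitioner implements Partitioner<Text, IntWritable> {
        public void configure(JobConf job) {}

        public int getPartition(Text key, IntWritable value, int numPartitions) {
            // Hypothetical rule: keys for shard A go to the first half of the
            // reducers, all other keys to the second half.
            int half = Math.max(1, numPartitions / 2);
            int h = key.hashCode() & Integer.MAX_VALUE;  // non-negative hash
            if (key.toString().startsWith("shardA:")) {
                return h % half;
            }
            return numPartitions == 1 ? 0 : half + h % (numPartitions - half);
        }
    }

It would be wired in with conf.setPartitionerClass(ShardPartitioner.class) on the JobConf.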
Re: History server - Yarn
Hi Rahul, The job history server is currently specific to MapReduce. -Sandy On Fri, Jun 7, 2013 at 8:56 AM, Rahul Bhattacharjee rahul.rec@gmail.com wrote: Hello, I was doing some sort of prototyping on top of YARN. I was able to launch AM and then AM in turn was able to spawn a few containers and do certain job.The yarn application terminated successfully. My question is about the history server. I think the history server is an offering from yarn , nothing specific to hadoop. I wanted to know as how to use history server in non-hadoop application based on Yarn? Or is this a part of hadoop. Thanks, Rahul
Re: History server - Yarn
Thanks Sandy. On Fri, Jun 7, 2013 at 9:29 PM, Sandy Ryza sandy.r...@cloudera.com wrote: Hi Rahul, The job history server is currently specific to MapReduce. -Sandy
Re: Please explain FSNamesystemState TotalLoad
Regarding TotalLoad, what would be normal operating tolerances per node for this metric? When should one become concerned? Thanks again to everyone participating in this community. :) Nick

From: Suresh Srinivas sur...@hortonworks.com
Reply-To: user@hadoop.apache.org
Date: Thursday, June 6, 2013 4:14 PM
To: hdfs-u...@hadoop.apache.org, user@hadoop.apache.org
Subject: Re: Please explain FSNamesystemState TotalLoad

It is the total number of transceivers (readers and writers) reported by all the datanodes. Each datanode reports this count in its periodic heartbeat to the namenode.

On Thu, Jun 6, 2013 at 1:48 PM, Nick Niemeyer nnieme...@riotgames.com wrote: Can someone please explain what TotalLoad represents below? Thanks for your response in advance!

Version: hadoop-0.20-namenode-0.20.2+923.197-1

Example pulled from the output of the name node:

    # curl -i http://localhost:50070/jmx
    {
      "name" : "hadoop:service=NameNode,name=FSNamesystemState",
      "modelerType" : "org.apache.hadoop.hdfs.server.namenode.FSNamesystem",
      "CapacityTotal" : #,
      "CapacityUsed" : #,
      "CapacityRemaining" : #,
      "TotalLoad" : #,
      "BlocksTotal" : #,
      "FilesTotal" : #,
      "PendingReplicationBlocks" : 0,
      "UnderReplicatedBlocks" : 0,
      "ScheduledReplicationBlocks" : 0,
      "FSState" : "Operational"
    }

Thanks, Nick
--
http://hortonworks.com/download/
Re: DirectoryScanner's OutOfMemoryError
Please see https://issues.apache.org/jira/browse/HDFS-4461. You may have to raise your heap for the DN if you've accumulated a lot of blocks per DN.

On Fri, Jun 7, 2013 at 8:33 PM, YouPeng Yang yypvsxf19870...@gmail.com wrote: Hi All, I have found that the DirectoryScanner gets an error, "Error compiling report", because of java.lang.OutOfMemoryError: Java heap space.
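Per Harsh's suggestion, the DataNode heap is typically raised in hadoop-env.sh; a minimal sketch (the 4 GB figure is illustrative -- size it to your block count):

    # hadoop-env.sh: give the DataNode JVM a larger heap
    export HADOOP_DATANODE_OPTS="-Xmx4096m $HADOOP_DATANODE_OPTS"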
Re: Why/When partitioner is used.
Why not also ask yourself, what if you do not send all keys to the same reducer? Would you get the results you desire that way? :) On Fri, Jun 7, 2013 at 4:47 PM, Sai Sai saigr...@yahoo.in wrote: I always get confused why we should partition and what is the use of it. Why would one want to send all the keys starting with A to Reducer1 and B to R2 and so on... Is it just to parallelize the reduce process. Please help. Thanks Sai -- Harsh J
Re: Mapreduce using JSONObjects
A side point for Hadoop experts: a comparator is used for sorting in the shuffle. If a comparator always returns -1 for unequal objects, then sorting will take longer than it should, because there will be a certain number of items that are compared more than once. Is this true?

On 06/05/2013 04:10 PM, Max Lebedev wrote: I've taken your advice and made a wrapper class which implements WritableComparable. Thank you very much for your help. I believe everything is working fine on that front. I used Google's Gson for the comparison.

    public int compareTo(Object o) {
        JsonElement o1 = PARSER.parse(this.json.toString());
        JsonElement o2 = PARSER.parse(o.toString());
        if (o2.equals(o1))
            return 0;
        else
            return -1;
    }

The problem I have now is that only consecutive duplicates are detected. Given 6 lines:

    {"ts":1368758947.291035,"isSecure":true,"version":2,"source":"sdk","debug":false}
    {"ts":1368758947.291035,"version":2,"source":"sdk","isSecure":true,"debug":false}
    {"ts":1368758947.291035,"version":2,"source":"sdk","isSecure":true,"debug":true}
    {"ts":1368758947.291035,"isSecure":false,"version":2,"source":"sdk","debug":false}
    {"ts":1368758947.291035, "source":"sdk","isSecure":false,"version":2,"debug":false}
    {"ts":1368758947.291035,"isSecure":true,"version":2,"source":"sdk","debug":false}

I get back 1, 3, 4, and 6. I should be getting 1, 3 and 4, as 6 is exactly equal to 1. If I switch 5 and 6, the original line 5 is no longer filtered (I get 1, 3, 4, 5, 6). I've noticed that the compareTo method is called a total of 13 times. I assume that in order for all 6 of the keys to be compared, 15 comparisons need to be made. Am I missing something here? I've tested the compareTo manually, and lines 1 and 6 are interpreted as equal. My map reduce code currently looks like this:

    class DupFilter {
        private static final Gson GSON = new Gson();
        private static final JsonParser PARSER = new JsonParser();

        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, JSONWrapper, IntWritable> {
            public void map(LongWritable key, Text value,
                            OutputCollector<JSONWrapper, IntWritable> output,
                            Reporter reporter) throws IOException {
                JsonElement je = PARSER.parse(value.toString());
                JSONWrapper jow = null;
                jow = new JSONWrapper(value.toString());
                IntWritable one = new IntWritable(1);
                output.collect(jow, one);
            }
        }

        public static class Reduce extends MapReduceBase
                implements Reducer<JSONWrapper, IntWritable, JSONWrapper, IntWritable> {
            public void reduce(JSONWrapper jow, Iterator<IntWritable> values,
                               OutputCollector<JSONWrapper, IntWritable> output,
                               Reporter reporter) throws IOException {
                int sum = 0;
                while (values.hasNext())
                    sum += values.next().get();
                output.collect(jow, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(DupFilter.class);
            conf.setJobName("dupfilter");
            conf.setOutputKeyClass(JSONWrapper.class);
            conf.setOutputValueClass(IntWritable.class);
            conf.setMapperClass(Map.class);
            conf.setReducerClass(Reduce.class);
            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);
        }
    }

Thanks, Max Lebedev

On Tue, Jun 4, 2013 at 10:58 PM, Rahul Bhattacharjee rahul.rec@gmail.com wrote: I agree with Shahab: you have to ensure that the keys are WritableComparable and the values are Writable in order to be used in MR. You can have a WritableComparable implementation wrapping the actual JSON object.

Thanks, Rahul

On Wed, Jun 5, 2013 at 5:09 AM, Mischa Tuffield mis...@mmt.me.uk wrote: Hello, On 4 Jun 2013, at 23:49, Max Lebedev ma...@actionx.com wrote: Hi. I've been trying to use JSONObjects to identify duplicates in JSON strings. The duplicate strings contain the same data, but not necessarily in the same order. For example, the following two lines should be identified as duplicates (and filtered):

    {"ts":1368758947.291035,"isSecure":true,"version":2,"source":"sdk","debug":false}
    {"ts":1368758947.291035,"version":2,"source":"sdk","isSecure":true,"debug":false}

Can you not use the timestamp as a URI and emit them as URIs? Then you would have your mapper emit the following kv: output.collect(ts, value); And you would have a straightforward reducer that can dedup based on the timestamps. If the above doesn't work for you, I would look at the Jackson library for mangling JSON in Java. Its method of using Java beans for JSON is clean from a code point of view and comes with lots of
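A note on Lance's question above (an observation about the code, not a later reply in the thread): returning -1 for every unequal pair means sgn(compareTo(a, b)) != -sgn(compareTo(b, a)), so the ordering is not a valid total order; the shuffle's merge sort can then leave equal keys non-adjacent, which matches the "only consecutive duplicates are detected" symptom. A minimal sketch of a consistent ordering, where canonicalize() is a hypothetical helper that serializes the JSON with its fields sorted by name:

    public int compareTo(Object o) {
        // Compare canonical forms so the result is a proper total order:
        // equal objects compare as 0, unequal ones order consistently.
        String mine = canonicalize(this.json.toString());
        String theirs = canonicalize(o.toString());
        return mine.compareTo(theirs);
    }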
Re: Please explain FSNamesystemState TotalLoad
On Fri, Jun 7, 2013 at 9:10 AM, Nick Niemeyer nnieme...@riotgames.com wrote: Regarding TotalLoad, what would be normal operating tolerances per node for this metric? When should one become concerned? Thanks again to everyone participating in this community. :)

Why do you want to be concerned :) I have not seen many issues related to high TotalLoad. It is mainly useful for understanding how many concurrent jobs/file accesses are happening and how busy the datanodes are. When you are debugging issues where the cluster slows down due to overload, or correlating that with a run of big jobs, this is useful. Knowing what it represents, you will find many other uses as well.

--
http://hortonworks.com/download/
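If you want to keep an eye on the metric, it can be pulled straight off the JMX endpoint shown earlier in the thread (hostname and port as in your cluster):

    # Grab just the TotalLoad field from the NameNode's JMX JSON:
    curl -s http://localhost:50070/jmx | grep TotalLoad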
Re: Pool slot questions
Totally agree with Shahab; just a quick answer, but the details are your homework.

> Can we think of a job pool similar to a queue?

I do think so: it partitions the slot resources into chunks of different sizes. In the Fair Scheduler, scheduling inside a pool can be chosen between FIFO and fair; a plain queue is FIFO. A cool thing about queues in YARN is sub-pools -- check it out... (a small allocation-file sketch follows below).

> Is it possible to configure a slot, and if so how?

http://lmgtfy.com/?q=fair+scheduler+hadoop+tutorial

Good luck

On Jun 7, 2013, at 6:10 AM, Shahab Yunus shahab.yu...@gmail.com wrote: Sai, This is regarding all your recent emails and questions. I suggest that you read Hadoop: The Definitive Guide by Tom White (http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520), as it goes through all of your queries in detail and with examples. Regards, Shahab
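For concreteness, a minimal MRv1 Fair Scheduler allocation file defining one pool with guaranteed slots (the pool name and numbers are illustrative):

    <?xml version="1.0"?>
    <!-- fair-scheduler.xml: a pool with minimum map/reduce slots and a weight -->
    <allocations>
      <pool name="research">
        <minMaps>8</minMaps>
        <minReduces>4</minReduces>
        <weight>2.0</weight>
      </pool>
    </allocations>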
Re: Mapreduce using JSONObjects
Hi again. I am attempting to compare the strings as JSON objects using hash codes, with the ultimate goal of removing duplicates. I have implemented the following solution: 1. I parse the input line into a JsonElement using the Google JSON parser (Gson). 2. I take the hash code of the resulting JsonElement and use it as the key for (key, val) output pairs. It seems to work fine. As I am new to Hadoop, I just want to run this by the community. Is there some reason this wouldn't work? Thank you very much for your help. For reference, here is my code:

    class DupFilter {
        private static final JsonParser PARSER = new JsonParser();

        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, IntWritable, Text> {
            public void map(LongWritable key, Text value,
                            OutputCollector<IntWritable, Text> output,
                            Reporter reporter) throws IOException {
                if (value.equals(null) || value.getLength() == 0)
                    return;
                JsonElement je = PARSER.parse(value.toString());
                int hash = je.hashCode();
                output.collect(new IntWritable(hash), value);
            }
        }

        public static class Reduce extends MapReduceBase
                implements Reducer<IntWritable, Text, IntWritable, Text> {
            public void reduce(IntWritable key, Iterator<Text> values,
                               OutputCollector<IntWritable, Text> output,
                               Reporter reporter) throws IOException {
                output.collect(key, values.next());
            }
        }

        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(DupFilter.class);
            conf.setOutputKeyClass(IntWritable.class);
            conf.setOutputValueClass(Text.class);
            conf.setMapperClass(Map.class);
            conf.setReducerClass(Reduce.class);
            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);
        }
    }

On Fri, Jun 7, 2013 at 1:16 PM, Lance Norskog goks...@gmail.com wrote: A side point for Hadoop experts: a comparator is used for sorting in the shuffle. If a comparator always returns -1 for unequal objects, then sorting will take longer than it should, because there will be a certain number of items that are compared more than once. Is this true?
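One caveat worth flagging on the hash-key approach above (an observation, not a reply from the thread): two different JSON lines that happen to collide on hashCode() would land under the same key, and the reducer keeps only the first value, so a non-duplicate could be dropped. A defensive reducer could verify real equality within each hash bucket; a sketch, using java.util.HashSet and java.util.Set in addition to the imports already used above:

    // Within one hash bucket, emit each distinct JSON value once.
    public void reduce(IntWritable key, Iterator<Text> values,
                       OutputCollector<IntWritable, Text> output,
                       Reporter reporter) throws IOException {
        Set<JsonElement> seen = new HashSet<JsonElement>();
        while (values.hasNext()) {
            Text v = values.next();
            // add() returns true only the first time an equal element is seen
            if (seen.add(PARSER.parse(v.toString()))) {
                output.collect(key, v);
            }
        }
    }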
Job History files location of 2.0.4
Dear All, I recently moved from Hadoop 0.20.2 to 2.0.4, and I am trying to find the old-style job history files (they used to be in HDFS, under output/_logs/history); they record detailed time information for every task attempt. But now they are not on HDFS anymore. I copied the entire / from HDFS to my local dir, but am not able to find this location. Could anyone give any advice on where the files are? Thanks, Boyu
Re: Job History files location of 2.0.4
See this: http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201302.mbox/%3c1360184802.61630.yahoomail...@web141205.mail.bf1.yahoo.com%3E Regards, Shahab On Fri, Jun 7, 2013 at 4:33 PM, Boyu Zhang boyuzhan...@gmail.com wrote: Dear All, I recently moved from Hadoop 0.20.2 to 2.0.4, and I am trying to find the old job history files (used to be in HDFS, under output/_logs/history). Could anyone give any advice on where the files are? Thanks, Boyu
Re: Job History files location of 2.0.4
Thanks Shahab, I saw the link, but it is not the case for me. I copied everything from HDFS ($HADOOP_HOME/bin/hdfs dfs -copyToLocal / $local_dir) but did not see the logs. Did it work for you? Thanks, Boyu

On Fri, Jun 7, 2013 at 1:52 PM, Shahab Yunus shahab.yu...@gmail.com wrote: See this: http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201302.mbox/%3c1360184802.61630.yahoomail...@web141205.mail.bf1.yahoo.com%3E Regards, Shahab
How to add and remove datanode dynamically?
How can we add and remove datanodes dynamically? That is, with a namenode and some datanodes already running, how can we add more datanodes to that cluster? -- With regards, Mohammad Mustaqeem, M.Tech (CSE), MNNIT Allahabad, 9026604270
Re: Job History files location of 2.0.4
Hi Shahab,

> How old were they?

They are new; I did the copy automatically right after the job completed, in a script.

> I am assuming they were from the jobs run on the older version, right?

I ran the job using Hadoop version 2.0.4, if this is what you mean.

> Or are you looking for the logs of new jobs that you are running after the upgrade?

I am looking for the new job's logs. I did a fresh install of Hadoop 2.0.4, ran the job, then copied back the entire HDFS directory.

> What about the local file systems? Are the logs still there?

Where should I find the logs (job logs, not daemon logs) on the local file systems?

Thanks, Boyu

On Fri, Jun 7, 2013 at 4:56 PM, Boyu Zhang boyuzhan...@gmail.com wrote: Thanks Shahab, I saw the link, but it is not the case for me.
Re: Job History files location of 2.0.4
What value do you have for the hadoop.log.dir property?

On Fri, Jun 7, 2013 at 5:20 PM, Boyu Zhang boyuzhan...@gmail.com wrote: Hi Shahab, I am looking for the new job's logs. I did a fresh install of Hadoop 2.0.4, ran the job, then copied back the entire HDFS directory. Where should I find the logs (job logs, not daemon logs) on the local file systems? Thanks, Boyu
Re: Job History files location of 2.0.4
I used a directory that is local to every slave node: export HADOOP_LOG_DIR=/scratch/$USER/$PBS_JOBID/hadoop-$USER/log. I did not change hadoop.job.history.user.location; I thought that if I don't change this property, the job history would be stored in HDFS under the output/_logs dir. Then, after the job completes, I copy the logs back to the server. Thanks a lot, Boyu

On Fri, Jun 7, 2013 at 2:32 PM, Shahab Yunus shahab.yu...@gmail.com wrote: What value do you have for the hadoop.log.dir property?
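For anyone hitting the same thing: in Hadoop 2.x the per-job history files no longer land under output/_logs/history; they are collected in HDFS under the MapReduce job history server's directories. A quick way to look (the paths shown are the usual defaults, so treat them as an assumption for your install):

    # Intermediate and finished history files, respectively:
    hdfs dfs -ls /tmp/hadoop-yarn/staging/history/done_intermediate
    hdfs dfs -ls /tmp/hadoop-yarn/staging/history/done
    # The locations are controlled by mapreduce.jobhistory.intermediate-done-dir
    # and mapreduce.jobhistory.done-dir in mapred-site.xml.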
Re: How to add and remove datanode dynamically?
Reference: Hadoop: The Definitive Guide (3rd ed., May 2012), p. 359.

2013/6/8 Mohammad Mustaqeem 3m.mustaq...@gmail.com: How can we add and remove datanodes dynamically? That is, with a namenode and some datanodes already running, how can we add more datanodes to that cluster? -- With regards, Mohammad Mustaqeem, M.Tech (CSE), MNNIT Allahabad, 9026604270
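To sketch the standard procedure the book describes (the commands exist as shown, but the file path and hostname here are illustrative):

    # Add a datanode: configure the new host with the same fs.default.name and
    # dfs settings as the cluster, then just start the daemon; it registers
    # with the namenode automatically.
    hadoop-daemon.sh start datanode

    # Remove a datanode gracefully: add its hostname to the file referenced by
    # dfs.hosts.exclude in hdfs-site.xml, then refresh the node list and wait
    # for the node to show as "Decommissioned".
    echo oldnode.example.com >> /etc/hadoop/conf/excludes
    hadoop dfsadmin -refreshNodes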
hdfsConnect/hdfsWrite API writes contents of file to local system instead of HDFS system
Hi, I have created a sample program to write contents into the HDFS file system. The file gets created successfully, but unfortunately it is created in the local file system instead of HDFS. Here is the source code of the sample program:

    int main(int argc, char **argv) {
        const char* writePath = "/user/testuser/test1.txt";
        const char *tuser = "root";
        hdfsFS fs = NULL;
        int exists = 0;

        fs = hdfsConnectAsUser("default", 0, tuser);
        if (fs == NULL) {
            fprintf(stderr, "Oops! Failed to connect to hdfs!\n");
            exit(-1);
        }
        hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
        if (!writeFile) {
            fprintf(stderr, "Failed to open %s for writing!\n", writePath);
            exit(-1);
        }
        fprintf(stderr, "Opened %s for writing successfully...\n", writePath);
        char* buffer = "Hello, World!";
        tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
        fprintf(stderr, "Wrote %d bytes\n", num_written_bytes);
        fprintf(stderr, "Flushed %s successfully!\n", writePath);
        hdfsCloseFile(fs, writeFile);
    }

CLASSPATH:

    /usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/avro-1.7.3.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:
    /usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:
    /usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:
    /usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:
    /usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:
    /usr/lib/hadoop/lib/hue-plugins-2.2.0-cdh4.2.0.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:
    /usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:
    /usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:
    /usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:
    /usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:
    /usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:
    /usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:
    /usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:
    /usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/zookeeper-3.4.5-cdh4.2.0.jar:/usr/lib/hadoop/hadoop-annotations.jar:/usr/lib/hadoop/hadoop-auth-2.0.0-cdh4.2.0.jar:
    /usr/lib/hadoop/hadoop-auth.jar:/usr/lib/hadoop/hadoop-common-2.0.0-cdh4.2.0.jar:/usr/lib/hadoop/hadoop-common-2.0.0-cdh4.2.0-tests.jar:/usr/lib/hadoop/hadoop-common.jar:
    /usr/lib/hadoop/etc/hadoop/yarn-site.xml:/usr/lib/hadoop/etc/hadoop/core-site.xml:/usr/lib/hadoop/etc/hadoop/hadoop-metrics.properties:/usr/lib/hadoop/etc/hadoop/hdfs-site.xml:
    /usr/lib/hadoop/etc/hadoop/mapred-site.xml

Please find attached the hdfs_site.xml (http://hadoop.6.n7.nabble.com/file/n69039/hdfs_site.xml) and core-site.xml (http://hadoop.6.n7.nabble.com/file/n69039/core-site.xml). Regards, Dayakar