Using libhdfspp
Hi, I have downloaded libhdfspp from the following link and compiled it: https://github.com/apache/hadoop. I found that some functions like hdfsWrite and hdfsHSync are not defined in this library. Also, when I tried to replace the old libhdfs.so with this new library, I saw some exceptions and hangs. Is anyone using libhdfspp? Please let me know. Thanks, Kishore
Using compression support available in native libraries
Hi, I would like to use the compression support available in the native libraries. I have tried to google it but couldn't work out how to use the LZ4 compression algorithm to compress the data I am writing to HDFS files from my C++ code. Is it possible to get the data compressed automatically, without having to compress it in our own code? For example, could we select some compression option while opening or creating an HDFS file from our code, then just write to it using hdfsWrite(), and have it take care of compressing the data underneath before writing it to the disk/file system? Thanks, Kishore
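As far as I know, neither libhdfs nor libhdfspp compresses data transparently: hdfsWrite() sends exactly the bytes you give it, so compression has to happen client-side before the write (or come from a compression-aware file format written by other tools). A minimal sketch of that pattern, in Python for brevity and using stdlib zlib as a stand-in for an LZ4 codec (with the real libhdfs C API you would compress the buffer the same way, then pass the compressed bytes to hdfsWrite):

```python
import zlib

def compress_then_write(hdfs_write, data: bytes) -> int:
    """Compress a buffer client-side, then hand the result to an HDFS
    write call. zlib stands in for LZ4 here; with the python lz4
    package you would call lz4.frame.compress(data) instead."""
    compressed = zlib.compress(data, 6)
    return hdfs_write(compressed)

# Stand-in for hdfsWrite(fs, file, buf, len): just record the bytes.
written = []
n = compress_then_write(lambda buf: written.append(buf) or len(buf), b"hello " * 1000)
assert zlib.decompress(written[0]) == b"hello " * 1000  # round-trips
```

The same reader-side step applies: whoever reads the file back must decompress it, since HDFS itself never sees anything but opaque bytes.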
Socket Timeout Exception while multiple concurrent applications are reading HDFS data through WebHDFS interface
Hi, We are seeing this SocketTimeoutException while a number of concurrent applications (around 50 of them) are trying to read HDFS data through the WebHDFS interface. Are there any parameters we can tune so it doesn't happen?

An exception occurred:
java.net.SocketTimeoutException: Read timed out
 at java.net.SocketInputStream.read(SocketInputStream.java:163)
 at java.net.SocketInputStream.read(SocketInputStream.java:133)
 at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
 at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
 at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
 at org.apache.http.impl.conn.LoggingSessionInputBuffer.readLine(LoggingSessionInputBuffer.java:115)
 at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
 at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
 at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
 at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
 at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
 at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
 at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
 at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
 at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:715)
 at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:520)
 at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
 at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
 at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
 at com.ibm.iis.cc.filesystem.impl.webhdfs.WebHDFS.appendFromBuffer(WebHDFS.java:306)
 at com.ibm.iis.cc.filesystem.impl.webhdfs.WebHDFS.writeFromStream(WebHDFS.java:198)
 at com.ibm.iis.cc.filesystem.AbstractFileSystem.writeFromStream(AbstractFileSystem.java:45)
 at com.ibm.iis.cc.filesystem.FileSystem$Uploader.call(FileSystem.java:3393)
 at com.ibm.iis.cc.filesystem.FileSystem$Uploader.call(FileSystem.java:3358)
 at java.util.concurrent.FutureTask.run(FutureTask.java:273)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1176)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
 at java.lang.Thread.run(Thread.java:853)

We have tried increasing the values of these parameters, but there is no change:
1) dfs.datanode.handler.count
2) dfs.client.socket-timeout (the new parameter to define the socket timeout)
3) dfs.socket.timeout (the deprecated parameter)
4) dfs.datanode.socket.write.timeout

Thanks, Kishore
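One observation on the trace above: the timeout fires inside the client's Apache HttpClient while waiting for the HTTP response, so the client-side HttpClient socket timeout (the `http.socket.timeout` parameter in HttpClient 4.x terms, set by whatever code builds the connector's HTTP client) may be the knob actually in play rather than the dfs.* server settings. If tuning the HDFS side anyway, those properties belong in hdfs-site.xml; the values below are illustrative placeholders only, not recommendations:

```xml
<!-- hdfs-site.xml: example values only -->
<property>
  <name>dfs.datanode.handler.count</name>
  <value>64</value>
</property>
<property>
  <name>dfs.client.socket-timeout</name>
  <value>120000</value> <!-- milliseconds -->
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>480000</value> <!-- milliseconds -->
</property>
```

Note that WebHDFS reads go over HTTP to the DataNode web port, so the classic DataNode data-transfer timeouts may not apply to this path at all.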
Re: ETL/DW to Hadoop migrations
Abhishek, Are you looking to load your data into Hadoop? If yes, IBM DataStage has a stage called BDFS that loads/writes your data into Hadoop. Thanks, Kishore On Tue, Sep 8, 2015 at 1:29 AM, <23singhabhis...@gmail.com> wrote: > Hi guys, > > I am looking for pointers on migrating an existing data warehouse to Hadoop. > Currently, we are using IBM DataStage, an ETL tool, and loading into > Teradata staging/maintain tables. Please suggest an architecture which > reduces cost without much degradation in performance. Has any of you been > part of such a migration before? If yes, then please provide some inputs, > especially on what aspects we should be taking care of. As for the > source data, it is mainly in the form of flat files and databases. > > Thanks in advance. > > Regards, > > Abhishek Singh >
Re: Question about Block size configuration
The default HDFS block size of 64 MB is the maximum size of a block of data written to HDFS. So if you write 4 MB files, each will still occupy only one block of 4 MB, not more than that. If your file is larger than 64 MB, it gets split into multiple blocks. If you set the HDFS block size to 2 MB, then your 4 MB file will get split into two blocks.

On Tue, May 12, 2015 at 8:38 AM, Himawan Mahardianto mahardia...@ugm.ac.id wrote:
Hi guys, I have a couple of questions about the HDFS block size:
1) What will happen if I set my HDFS block size from the default 64 MB down to 2 MB per block? I am decreasing the block size because I want to store image files (jpeg, png, etc.) of about 4 MB each; what is your opinion or suggestion?
2) What will happen if I don't change the default block size and store a 4 MB image file: will Hadoop use a full 64 MB block, or will it create a 4 MB block instead?
3) How much RAM is used to store each block's metadata if my block size is 64 MB, or if it is 4 MB?
Does anyone have experience with this? Any suggestions are welcome. Thank you
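The arithmetic above can be sketched quickly (a hypothetical helper, not a Hadoop API): a block only occupies as much disk as the data actually written into it, so only the last block of a file can be smaller than the configured block size.

```python
def hdfs_block_sizes(file_size: int, block_size: int) -> list[int]:
    """On-disk sizes of the blocks a file occupies in HDFS: full blocks
    plus one trailing partial block. (NameNode memory cost, by contrast,
    is roughly per block object, which is why many tiny blocks are
    expensive regardless of their size on disk.)"""
    full, rem = divmod(file_size, block_size)
    return [block_size] * full + ([rem] if rem else [])

MB = 1024 * 1024
assert hdfs_block_sizes(4 * MB, 64 * MB) == [4 * MB]         # one 4 MB block, not 64 MB
assert hdfs_block_sizes(4 * MB, 2 * MB) == [2 * MB, 2 * MB]  # two blocks at a 2 MB block size
```

The second point in the thread follows directly: lowering the block size doesn't save disk for small files, it only multiplies the number of block objects the NameNode must track.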
Apache Slider stop function not working
Hi, I am not aware of any Slider-specific group, so I am posting it here. We are using Apache Slider 0.60 and have implemented the management operations start, status, stop, etc., in a Python script. Everything else is working, but the stop function is not getting invoked when the container is stopped. Is this a known issue already, or is there any trick to make it work? Thanks, Kishore
Re: 100% CPU consumption by Resource Manager process
Thanks Wangda, I think I had reduced this when I was trying to reduce the container allocation time. -Kishore

On Tue, Aug 19, 2014 at 7:39 AM, Wangda Tan wheele...@gmail.com wrote:
Hi Krishna,
> 4) What's the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms in your configuration? 50
I think this config is problematic; too small a heartbeat interval will cause the NMs to contact the RM too often. I would suggest you set this value larger, like 1000.
Thanks, Wangda

On Wed, Aug 13, 2014 at 4:42 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote:
Hi Wangda, Thanks for the reply, here are the details, please see if you can suggest anything.
1) Number of nodes and running apps in the cluster: 2 nodes, and I am running my own application that keeps asking for containers, then a) runs something on the containers, b) releases the containers, c) asks for more containers with an incremented priority value, and repeats the same process.
2) What's the version of your Hadoop? Apache hadoop-2.4.0
3) Have you set yarn.scheduler.capacity.schedule-asynchronously.enable=true? No
4) What's the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms in your configuration? 50

On Tue, Aug 12, 2014 at 12:44 PM, Wangda Tan wheele...@gmail.com wrote:
Hi Krishna, To get more understanding about the problem, could you please share the following information:
1) Number of nodes and running apps in the cluster
2) What's the version of your Hadoop?
3) Have you set yarn.scheduler.capacity.schedule-asynchronously.enable=true?
4) What's the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms in your configuration?
Thanks, Wangda Tan

On Sun, Aug 10, 2014 at 11:29 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote:
Hi, My YARN ResourceManager is consuming 100% CPU when I run an application for about 10 hours, requesting as many as 27000 containers. The CPU consumption was very low at the start of my application, and it gradually rose to over 100%.
Is this a known issue, or are we doing something wrong? Every dump of the Event Processor thread is running LeafQueue::assignContainers(), specifically the for loop below from LeafQueue.java, and seems to be looping through some priority list.

// Try to assign containers to applications in order
for (FiCaSchedulerApp application : activeApplications) {
  ...
  // Schedule in priority order
  for (Priority priority : application.getPriorities()) {

3XMTHREADINFO ResourceManager Event Processor J9VMThread:0x01D08600, j9thread_t:0x7F032D2FAA00, java/lang/Thread:0x8341D9A0, state:CW, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5, native policy:UNKNOWN)
3XMTHREADINFO2 (native stack address range from:0x7F0313DF8000, to:0x7F0313E39000, size:0x41000)
3XMCPUTIME *CPU usage total: 42334.614623696 secs*
3XMHEAPALLOC Heap bytes allocated since last GC cycle=20456 (0x4FE8)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:850(Compiled Code))
5XESTACKTRACE (entered lock: org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x8360DFE0, entry count: 1)
5XESTACKTRACE (entered lock: org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x833B9280, entry count: 1)
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled Code))
5XESTACKTRACE (entered lock: org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x83360A80, entry count: 2)
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled Code))
5XESTACKTRACE (entered lock: org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x83360A80, entry count: 1)
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled Code))
5XESTACKTRACE (entered lock: org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x834037C8, entry count: 1)
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled Code))
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle
Re: 100% CPU consumption by Resource Manager process
Hi Wangda, Thanks for the reply, here are the details, please see if you can suggest anything.
1) Number of nodes and running apps in the cluster: 2 nodes, and I am running my own application that keeps asking for containers, then a) runs something on the containers, b) releases the containers, c) asks for more containers with an incremented priority value, and repeats the same process.
2) What's the version of your Hadoop? Apache hadoop-2.4.0
3) Have you set yarn.scheduler.capacity.schedule-asynchronously.enable=true? No
4) What's the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms in your configuration? 50

On Tue, Aug 12, 2014 at 12:44 PM, Wangda Tan wheele...@gmail.com wrote:
Hi Krishna, To get more understanding about the problem, could you please share the following information:
1) Number of nodes and running apps in the cluster
2) What's the version of your Hadoop?
3) Have you set yarn.scheduler.capacity.schedule-asynchronously.enable=true?
4) What's the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms in your configuration?
Thanks, Wangda Tan

On Sun, Aug 10, 2014 at 11:29 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote:
Hi, My YARN ResourceManager is consuming 100% CPU when I run an application for about 10 hours, requesting as many as 27000 containers. The CPU consumption was very low at the start of my application, and it gradually rose to over 100%. Is this a known issue, or are we doing something wrong? Every dump of the Event Processor thread is running LeafQueue::assignContainers(), specifically the for loop below from LeafQueue.java, and seems to be looping through some priority list.

// Try to assign containers to applications in order
for (FiCaSchedulerApp application : activeApplications) {
  ...
  // Schedule in priority order
  for (Priority priority : application.getPriorities()) {

3XMTHREADINFO ResourceManager Event Processor J9VMThread:0x01D08600, j9thread_t:0x7F032D2FAA00, java/lang/Thread:0x8341D9A0, state:CW, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5, native policy:UNKNOWN)
3XMTHREADINFO2 (native stack address range from:0x7F0313DF8000, to:0x7F0313E39000, size:0x41000)
3XMCPUTIME *CPU usage total: 42334.614623696 secs*
3XMHEAPALLOC Heap bytes allocated since last GC cycle=20456 (0x4FE8)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:850(Compiled Code))
5XESTACKTRACE (entered lock: org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x8360DFE0, entry count: 1)
5XESTACKTRACE (entered lock: org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x833B9280, entry count: 1)
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled Code))
5XESTACKTRACE (entered lock: org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x83360A80, entry count: 2)
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled Code))
5XESTACKTRACE (entered lock: org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x83360A80, entry count: 1)
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled Code))
5XESTACKTRACE (entered lock: org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x834037C8, entry count: 1)
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled Code))
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled Code))
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)

3XMTHREADINFO ResourceManager Event Processor J9VMThread:0x01D08600, j9thread_t:0x7F032D2FAA00, java/lang/Thread:0x8341D9A0, state:CW, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
3XMTHREADINFO1 (native thread ID:0x4B64, native priority
Re: Negative value given by getVirtualCores() or getAvailableResources()
Hi Wangda, I was actually wondering why it should give me a negative value for vcores when I call getAvailableResources(). Thanks, Kishore

On Tue, Aug 12, 2014 at 12:50 PM, Wangda Tan wheele...@gmail.com wrote:
By default, vcore = 1 for each resource request. If you don't like this behavior, you can set yarn.scheduler.minimum-allocation-vcores=0. Hope this helps, Wangda Tan

On Thu, Aug 7, 2014 at 7:13 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote:
Hi, I am calling getAvailableResources() on AMRMClientAsync and getting a negative value for the number of virtual cores, as below. Is there something wrong? memory:16110, vCores:-2. I have set the vcores in my yarn-site.xml like this, and just ran an application that requires two containers other than the Application Master's container. In the ContainerRequest set up from my ApplicationMaster, I haven't set anything for virtual cores, meaning I didn't call setVirtualCores() at all. So I think it shouldn't be showing a negative value for the vcores when I call getAvailableResources(); am I wrong?

<property>
  <description>Number of CPU cores that can be allocated for containers.</description>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
</property>

Thanks, Kishore
100% CPU consumption by Resource Manager process
Hi, My YARN ResourceManager is consuming 100% CPU when I run an application for about 10 hours, requesting as many as 27000 containers. The CPU consumption was very low at the start of my application, and it gradually rose to over 100%. Is this a known issue, or are we doing something wrong? Every dump of the Event Processor thread is running LeafQueue::assignContainers(), specifically the for loop below from LeafQueue.java, and seems to be looping through some priority list.

// Try to assign containers to applications in order
for (FiCaSchedulerApp application : activeApplications) {
  ...
  // Schedule in priority order
  for (Priority priority : application.getPriorities()) {

3XMTHREADINFO ResourceManager Event Processor J9VMThread:0x01D08600, j9thread_t:0x7F032D2FAA00, java/lang/Thread:0x8341D9A0, state:CW, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5, native policy:UNKNOWN)
3XMTHREADINFO2 (native stack address range from:0x7F0313DF8000, to:0x7F0313E39000, size:0x41000)
3XMCPUTIME *CPU usage total: 42334.614623696 secs*
3XMHEAPALLOC Heap bytes allocated since last GC cycle=20456 (0x4FE8)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:850(Compiled Code))
5XESTACKTRACE (entered lock: org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x8360DFE0, entry count: 1)
5XESTACKTRACE (entered lock: org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x833B9280, entry count: 1)
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled Code))
5XESTACKTRACE (entered lock: org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x83360A80, entry count: 2)
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled Code))
5XESTACKTRACE (entered lock: org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x83360A80, entry count: 1)
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled Code))
5XESTACKTRACE (entered lock: org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x834037C8, entry count: 1)
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled Code))
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled Code))
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
4XESTACKTRACE at java/lang/Thread.run(Thread.java:853)

3XMTHREADINFO ResourceManager Event Processor J9VMThread:0x01D08600, j9thread_t:0x7F032D2FAA00, java/lang/Thread:0x8341D9A0, state:CW, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0x1E, isDaemon:false)
3XMTHREADINFO1 (native thread ID:0x4B64, native priority:0x5, native policy:UNKNOWN)
3XMTHREADINFO2 (native stack address range from:0x7F0313DF8000, to:0x7F0313E39000, size:0x41000)
3XMCPUTIME CPU usage total: 42379.604203548 secs
3XMHEAPALLOC Heap bytes allocated since last GC cycle=57280 (0xDFC0)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:841(Compiled Code))
5XESTACKTRACE (entered lock: org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x8360DFE0, entry count: 1)
5XESTACKTRACE (entered lock: org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x833B9280, entry count: 1)
4XESTACKTRACE at org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled Code))
5XESTACKTRACE (entered lock: org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x83360A80, entry count: 2)
4XESTACKTRACE at
Negative value given by getVirtualCores() or getAvailableResources()
Hi, I am calling getAvailableResources() on AMRMClientAsync and getting a negative value for the number of virtual cores, as below. Is there something wrong? memory:16110, vCores:-2. I have set the vcores in my yarn-site.xml like this, and just ran an application that requires two containers other than the Application Master's container. In the ContainerRequest set up from my ApplicationMaster, I haven't set anything for virtual cores, meaning I didn't call setVirtualCores() at all. So I think it shouldn't be showing a negative value for the vcores when I call getAvailableResources(); am I wrong?

<property>
  <description>Number of CPU cores that can be allocated for containers.</description>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
</property>

Thanks, Kishore
How to check what is the log directory for container logs
Hi, Is there a way to check which directory is used for container logs in my currently running instance of YARN from the command line, i.e., using the yarn command or hadoop command or so? Thanks, Kishore
Re: priority in the container request
Thanks Vinod for the quick answer, it seems to be working when I am requesting all containers with the same specification, but not when I have multiple container requests with different host names specified. Is this expected behavior? Kishore On Mon, Jun 9, 2014 at 10:51 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Yes, priorities are assigned to ResourceRequests and you can ask multiple containers at the same priority level. You may not get all the containers together as today's scheduler lacks gang functionality. +Vinod On Jun 9, 2014, at 12:08 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, Can we give the same value for priority when requesting multiple containers from the Application Master? Basically, I need all of those containers at the same time, and I am requesting them at the same time. So, I am thinking if we can do that? Thanks, Kishore -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
priority in the container request
Hi, Can we give the same value for priority when requesting multiple containers from the Application Master? Basically, I need all of those containers at the same time, and I am requesting them at the same time. So, I am thinking if we can do that? Thanks, Kishore
Getting the name of the host on which Application Master is launched
Hi, Is there a way to get the name of the host where the AM is launched? I have seen that there is a method getHost() in the ApplicationReport that we get in YarnClient, but it is giving null. Is there a way to make it work? or is there any other way to get the host name? 2014-05-09 04:36:05 YC00016 INFO PXYarnClient: Got application report from ASM for , appId: 3 , appDiagnostics: null , appMasterHost: null,clientToAMToken: null ,appQueue: default, appMasterRpcPort: 0 appStartTime: 1,399,628,163,358 , yarnAppState: ACCEPTED ,DistributedFinalState: UNDEFINED , appTrackingUrl: isredeng:8088/proxy/application_1399626434313_0003/, appUser: kbonagir (monitorApplication) Thanks, Kishore
Specifying a node/host name on which the Application Master should run
Hi, Is there a way to specify a host name on which we want to run our application master. Can we do this when it is being launched from the YarnClient? Thanks, Kishore
Re: Cleanup activity on YARN containers
Hi Rohith, Thanks for the reply. Mine is a YARN application. I have some files that are local to where the containers run, and I want to clean them up at the end of the container execution. So, I want to do this cleanup on the same node my container ran on. With what you are suggesting, I can't delete the files local to the container. Is there any other way? Thanks, Kishore

On Tue, Apr 8, 2014 at 8:55 AM, Rohith Sharma K S rohithsharm...@huawei.com wrote:
Hi Kishore, Are the jobs submitted through MapReduce, or is it a YARN application?
1. For the MapReduce framework, the framework itself provides a facility to clean up at the per-task level. "Is there any callback kind of facility, in which I can write some code to be executed on my container at the end of my application or *at the end of that particular container execution?*" You can override setup() and cleanup() for doing initialization and cleanup of your task. This facility is provided by the MapReduce framework. The call flow of task execution is: the framework first calls setup(org.apache.hadoop.mapreduce.Mapper.Context), followed by map(Object, Object, Context) / reduce(Object, Iterable, Context) for each key/value pair. Finally, cleanup(Context) is called. Note: in cleanup, do not hold the container for more than mapreduce.task.timeout, because once map/reduce is completed, progress will not be sent to the ApplicationMaster (a ping is not considered a status update). If your application takes more than the value configured for mapreduce.task.timeout, the ApplicationMaster considers the task timed out. In such a case, you need to increase the value of mapreduce.task.timeout based on your cleanup time.
2. For a YARN application, the list of completed containers is sent to the ApplicationMaster in the heartbeat. Here you can do cleanup activities for containers.
Hope this will help you.
Thanks & Regards, Rohith Sharma K S

*From:* Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com]
*Sent:* 07 April 2014 16:41
*To:* user@hadoop.apache.org
*Subject:* Cleanup activity on YARN containers

Hi, Is there any callback kind of facility in which I can write some code to be executed on my container at the end of my application, or at the end of that particular container execution? I want to do some cleanup activities at the end of my application, and the cleanup is not related to the localized resources that are downloaded from HDFS. Thanks, Kishore
Reuse of YARN container
Hi, Does this JIRA issue mean that we can't currently reuse a container for running/launching two different processes one after another? https://issues.apache.org/jira/browse/YARN-373 If that is true, are there any plans for making that possible? Thanks, Kishore
Re: Cleanup activity on YARN containers
Hi Rohith, Is there something like a shutdown hook for containers? Can you please also tell me how to use that? Thanks, Kishore

On Wed, Apr 9, 2014 at 8:34 AM, Rohith Sharma K S rohithsharm...@huawei.com wrote:
For local container cleanup, it can be cleaned up in a ShutdownHook.
Thanks & Regards, Rohith Sharma K S

*From:* Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com]
*Sent:* 08 April 2014 20:01
*To:* user@hadoop.apache.org
*Subject:* Re: Cleanup activity on YARN containers

Hi Rohith, Thanks for the reply. Mine is a YARN application. I have some files that are local to where the containers run, and I want to clean them up at the end of the container execution. So, I want to do this cleanup on the same node my container ran on. With what you are suggesting, I can't delete the files local to the container. Is there any other way? Thanks, Kishore

On Tue, Apr 8, 2014 at 8:55 AM, Rohith Sharma K S rohithsharm...@huawei.com wrote:
Hi Kishore, Are the jobs submitted through MapReduce, or is it a YARN application?
1. For the MapReduce framework, the framework itself provides a facility to clean up at the per-task level. "Is there any callback kind of facility, in which I can write some code to be executed on my container at the end of my application or *at the end of that particular container execution?*" You can override setup() and cleanup() for doing initialization and cleanup of your task. This facility is provided by the MapReduce framework. The call flow of task execution is: the framework first calls setup(org.apache.hadoop.mapreduce.Mapper.Context), followed by map(Object, Object, Context) / reduce(Object, Iterable, Context) for each key/value pair. Finally, cleanup(Context) is called. Note: in cleanup, do not hold the container for more than mapreduce.task.timeout, because once map/reduce is completed, progress will not be sent to the ApplicationMaster (a ping is not considered a status update). If your application takes more than the value configured for mapreduce.task.timeout, the ApplicationMaster considers the task timed out. In such a case, you need to increase the value of mapreduce.task.timeout based on your cleanup time.
2. For a YARN application, the list of completed containers is sent to the ApplicationMaster in the heartbeat. Here you can do cleanup activities for containers.
Hope this will help you.
Thanks & Regards, Rohith Sharma K S

*From:* Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com]
*Sent:* 07 April 2014 16:41
*To:* user@hadoop.apache.org
*Subject:* Cleanup activity on YARN containers

Hi, Is there any callback kind of facility in which I can write some code to be executed on my container at the end of my application, or at the end of that particular container execution? I want to do some cleanup activities at the end of my application, and the cleanup is not related to the localized resources that are downloaded from HDFS. Thanks, Kishore
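Since there is no per-container cleanup callback for a plain YARN application, one workaround along the lines of the ShutdownHook suggestion is an OS-level hook inside the container process itself: the NodeManager sends SIGTERM to a container it is stopping (and only later SIGKILL), so a handler in the launched process gets a chance to remove its local scratch files. A sketch, assuming the container runs a Python process and that the scratch directory is one the process created itself (both assumptions, not part of any YARN API):

```python
import atexit
import os
import signal
import tempfile

# Hypothetical container-local scratch space created by this process.
SCRATCH_DIR = tempfile.mkdtemp(prefix="container-scratch-")

def cleanup_scratch():
    """Remove container-local scratch files; idempotent, so it is safe
    to run from both the signal handler and the atexit hook."""
    if not os.path.isdir(SCRATCH_DIR):
        return
    for name in os.listdir(SCRATCH_DIR):
        os.remove(os.path.join(SCRATCH_DIR, name))
    os.rmdir(SCRATCH_DIR)

# Run on normal interpreter exit...
atexit.register(cleanup_scratch)
# ...and on SIGTERM, which the NodeManager sends before escalating to SIGKILL.
signal.signal(signal.SIGTERM, lambda signum, frame: (cleanup_scratch(), os._exit(0)))
```

The SIGTERM path has to be quick: the NodeManager only waits a bounded delay (yarn.nodemanager.sleep-delay-before-sigkill.ms) before sending SIGKILL, after which no cleanup code runs at all.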
Cleanup activity on YARN containers
Hi, Is there any callback kind of facility, in which I can write some code to be executed on my container at the end of my application or at the end of that particular container execution? I want to do some cleanup activities at the end of my application, and the clean up is not related to the localized resources that are downloaded from HDFS. Thanks, Kishore
Value for yarn.nodemanager.address in configuration file
Hi, This is regarding a single-node cluster setup. If I have a value of 0.0.0.0:8050 for yarn.nodemanager.address in the configuration file (yarn-site.xml/yarn-default.xml), is it a mandatory requirement that "ssh 0.0.0.0" work on my machine to be able to start YARN? Or will I be able to start the daemons without that "ssh 0.0.0.0" working as well? Thanks, Kishore
Re: Node manager or Resource Manager crash
Vinod, One more observation I can share is that every time the NM or RM gets killed, I see the following kind of messages in the NM's log:

2014-03-05 05:33:23,824 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's health-status : true,
2014-03-05 05:33:23,824 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2132631259) connection to isredeng/9.70.137.184:8031 from kbonagir sending #5391
2014-03-05 05:33:23,826 DEBUG org.apache.hadoop.ipc.Client: IPC Client (2132631259) connection to isredeng/9.70.137.184:8031 from kbonagir got value #5391
2014-03-05 05:33:23,826 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: nodeHeartbeat took 2ms

Does that give any clue? Is something going wrong while it is getting a node's health? Thanks, Kishore

On Tue, Mar 4, 2014 at 10:51 PM, Vinod Kumar Vavilapalli vino...@apache.org wrote:
I remember you asking this question before. Check if your OS' OOM killer is killing it. +Vinod

On Mar 4, 2014, at 6:53 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote:
Hi, I am running an application on a 2-node cluster, which tries to acquire all the containers that are available on one of those nodes, and the remaining containers from the other node in the cluster. When I run this application continuously in a loop, the NM or RM gets killed at a random point. There is no corresponding message in the log files.
One of the times the NM got killed today, the tail of its log looks like this:

2014-03-04 02:42:44,386 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: isredeng:52867 sending out status for 16 containers
2014-03-04 02:42:44,386 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's health-status : true,

And at the time of the NM's crash, the RM's log has the following entries:

2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing isredeng:52867 of type STATUS_UPDATE
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType: NODE_UPDATE
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.ipc.Server: IPC Server Responder: responding to org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from 9.70.137.184:33696 Call#14060 Retry#0 Wrote 40 bytes.
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: nodeUpdate: isredeng:52867 clusterResources: memory:16384, vCores:16
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Node being looked for scheduling isredeng:52867 availableResource: memory:0, vCores:-8
2014-03-04 02:42:40,393 DEBUG org.apache.hadoop.ipc.Server: got #151

Note: the name of the node on which the NM got killed is isredeng. Does anything in the above messages indicate why it got killed? Thanks, Kishore

CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law.
If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Node manager or Resource Manager crash
Hi, I am running an application on a 2-node cluster, which tries to acquire all the containers that are available on one of those nodes and the remaining containers from the other node in the cluster. When I run this application continuously in a loop, either the NM or the RM gets killed at a random point. There is no corresponding message in the log files. One of the times the NM got killed today, the tail of its log looks like this:

2014-03-04 02:42:44,386 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: isredeng:52867 sending out status for 16 containers
2014-03-04 02:42:44,386 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's health-status : true,

And at the time of the NM's crash, the RM's log has the following entries:

2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing isredeng:52867 of type STATUS_UPDATE
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType: NODE_UPDATE
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.ipc.Server: IPC Server Responder: responding to org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from 9.70.137.184:33696 Call#14060 Retry#0 Wrote 40 bytes.
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: nodeUpdate: isredeng:52867 clusterResources: memory:16384, vCores:16
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Node being looked for scheduling isredeng:52867 availableResource: memory:0, vCores:-8
2014-03-04 02:42:40,393 DEBUG org.apache.hadoop.ipc.Server: got #151

Note: the name of the node on which the NM got killed is isredeng. Does anything in the above messages indicate why it got killed? Thanks, Kishore
Re: Node manager or Resource Manager crash
Yes, Vinod, I was asking this question some time back, and I have now come back to resolving the issue. I tried to see whether the OOM killer is killing it, but it is not. I have checked the free swap space on my box while my test is going on, and that doesn't seem to be the issue. Also, I have verified whether the OOM score goes high for any of these processes, because that is when the OOM killer kills them, but the scores are not going high either. Thanks, Kishore On Tue, Mar 4, 2014 at 10:51 PM, Vinod Kumar Vavilapalli vino...@apache.org wrote: I remember you asking this question before. Check if your OS' OOM killer is killing it. +Vinod On Mar 4, 2014, at 6:53 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am running an application on a 2-node cluster, which tries to acquire all the containers that are available on one of those nodes and remaining containers from the other node in the cluster. When I run this application continuously in a loop, one of the NM or RM is getting killed at a random point. There is no corresponding message in the log files.
One of the times the NM got killed today, the tail of its log looks like this:

2014-03-04 02:42:44,386 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: isredeng:52867 sending out status for 16 containers
2014-03-04 02:42:44,386 DEBUG org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's health-status : true,

And at the time of the NM's crash, the RM's log has the following entries:

2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing isredeng:52867 of type STATUS_UPDATE
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType: NODE_UPDATE
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.ipc.Server: IPC Server Responder: responding to org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from 9.70.137.184:33696 Call#14060 Retry#0 Wrote 40 bytes.
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: nodeUpdate: isredeng:52867 clusterResources: memory:16384, vCores:16
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Node being looked for scheduling isredeng:52867 availableResource: memory:0, vCores:-8
2014-03-04 02:42:40,393 DEBUG org.apache.hadoop.ipc.Server: got #151

Note: the name of the node on which the NM got killed is isredeng. Does anything in the above messages indicate why it got killed? Thanks, Kishore
YARN -- Debug messages in logs
Hi, How can I get the debug log messages from the RM and the other daemons? Currently I can see messages from LOG.info() only, i.e. something like this:

LOG.info(event.getContainerId() + " Container Transitioned from " + oldState + " to " + getState());

How can I also get those from LOG.debug()? I mean the following kind of messages:

LOG.debug("Processing " + event.getContainerId() + " of type " + event.getType());

Thanks, Kishore
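For reference, one common way to do this (assuming a stock Hadoop 2.x log4j setup) is to raise the log level for the YARN packages in the daemons' log4j.properties and restart them:

```properties
# etc/hadoop/log4j.properties -- emit LOG.debug() output from the YARN daemons
log4j.logger.org.apache.hadoop.yarn=DEBUG
# or, more broadly, for all Hadoop classes:
# log4j.logger.org.apache.hadoop=DEBUG
```

A running daemon's level can also be changed without a restart via `hadoop daemonlog -setlevel <host:port> <classname> DEBUG`.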
Re: Yarn - specify hosts in ContainerRequest
Hi Anand, Which version of Hadoop are you using? This works from 2.2.0 onwards. Try it like this, and it should work; I am using this feature on 2.2.0:

String[] hosts = new String[1];
hosts[0] = node_name;
ContainerRequest request = new ContainerRequest(capability, hosts, null, p, false);

(The last argument, relaxLocality = false, is what makes the request stick to the given hosts.) Thanks, Kishore On Fri, Feb 14, 2014 at 11:43 PM, Anand Mundada anandmund...@ymail.com wrote: Hi All, How can I launch a container on a particular host? I tried specifying the host name in new ContainerRequest(). Thanks, Anand
Re: Can we avoid restarting of AM when it fails?
Thanks Harsh, I got it. On Sat, Feb 8, 2014 at 7:33 PM, Harsh J ha...@cloudera.com wrote: Correction: Set it to 1 (For 1 max attempt), not 0. On Sat, Feb 8, 2014 at 7:31 PM, Harsh J ha...@cloudera.com wrote: You can set http://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.html#setMaxAppAttempts(int) to 0, at a per-app level, to prevent any reattempts/recovery of your AM. For a cluster-wide effect instead, you can limit by overriding the default value of the RM property yarn.resourcemanager.am.max-retries in the RM's YarnConfiguration or yarn-site.xml. On Fri, Feb 7, 2014 at 5:24 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am having some failure test cases where my Application Master is supposed to fail. But when it fails it is again started with appID_02 . Is there a way for me to avoid the second instance of the Application Master getting started? Is it re-started automatically by the RM after the first one failed? Thanks, Kishore -- Harsh J -- Harsh J
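Harsh's per-app suggestion looks roughly like this in the submitting client (a hedged sketch against the 2.x YarnClient API; the surrounding init/start calls and the rest of the context setup are elided):

```java
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;

class AmRetryExample {
    // Sketch: cap the AM at one attempt so a failed AM is not restarted
    // as a second appattempt.
    void submitWithoutAmRetries(YarnClient yarnClient) throws Exception {
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
        ctx.setMaxAppAttempts(1);   // 1 attempt max, i.e. no reattempt (not 0)
        // ... set AM container spec, resource, queue, etc., then:
        // yarnClient.submitApplication(ctx);
    }
}
```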
Can we avoid restarting of AM when it fails?
Hi, I am having some failure test cases where my Application Master is supposed to fail. But when it fails, it is started again as appID_02. Is there a way for me to avoid the second instance of the Application Master getting started? Is it re-started automatically by the RM after the first one fails? Thanks, Kishore
Re: Command line tools for YARN
Hi Jian, Thanks for the suggestions, I could get the required information there with yarn node command. Kishore On Fri, Dec 27, 2013 at 12:58 AM, Jian He j...@hortonworks.com wrote: 1) checking how many nodes are in my cluster 3) what are the names of nodes in my cluster you can use yarn node. 2) what are the cluster resources total or available at the time of running the command Not quite sure, you can search possible options in the yarn command menu. And you can always see the resources usage via web UI though. On Mon, Dec 23, 2013 at 10:08 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, Are there any command line tools for things like, 1) checking how many nodes are in my cluster 2) what are the cluster resources total or available at the time of running the command 3) what are the names of nodes in my cluster etc.. Thanks, Kishore
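For reference, the commands Jian is pointing at look like this (exact options vary a little by release; check `yarn node -help` on your version):

```shell
# 1) and 3): node count and node names (plus running-container counts)
yarn node -list

# In later 2.x releases, -all also shows nodes in non-RUNNING states:
yarn node -list -all

# Per-node detail, including total and used memory/vcores:
yarn node -status <node-id>
```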
Command line tools for YARN
Hi, Are there any command line tools for things like, 1) checking how many nodes are in my cluster 2) what are the cluster resources total or available at the time of running the command 3) what are the names of nodes in my cluster etc.. Thanks, Kishore
Container exit status for released container
Hi, I am seeing the exit status of a released container (released through a call to releaseAssignedContainer()) as -100. Can my code assume that -100 will always be given as the exit status for a released container by YARN? Thanks, Kishore
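For what it's worth, hadoop-yarn-api exposes this value as a named constant, so rather than hard-coding -100 (whose stability across releases is exactly what's in question), a completion handler can compare against the constant. A sketch:

```java
import java.util.List;

import org.apache.hadoop.yarn.api.records.ContainerExitStatus;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

class CompletionExample {
    // Sketch: in an AM's completion callback, detect containers that ended
    // because they were released/aborted rather than because they failed.
    void onContainersCompleted(List<ContainerStatus> statuses) {
        for (ContainerStatus status : statuses) {
            if (status.getExitStatus() == ContainerExitStatus.ABORTED) {
                // Released via releaseAssignedContainer(), or aborted by YARN.
            }
        }
    }
}
```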
Re: Yarn -- one of the daemons getting killed
Hi Vinod, Thanks for the link. I went through it, and it looks like the OOM killer picks the process that has the highest oom_score. I have tried to capture the oom_score for all the YARN daemon processes after each run of my application. The first time I captured these details, I saw that the name node was killed whereas the Node Manager had the highest score. So I don't know if it is really the OOM killer that killed it! Please see the output of my run attached, which also has the output of the free command after each run. The output of free doesn't show any exhaustion of system memory either. Also, one more thing I have done today: I added audit rules for each of the daemons to capture all the system calls. In the audit log, I see the futex() system call occurring in the killed daemon processes. I don't know if it causes the daemon to die, or why that call happens... Thanks, Kishore On Wed, Dec 18, 2013 at 12:31 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: That's good info. It is more than likely that it is the OOM killer. See http://stackoverflow.com/questions/726690/who-killed-my-process-and-why for example. Thanks, +Vinod On Dec 17, 2013, at 1:26 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Jeff, I have run the resource manager in the foreground without nohup and here are the messages when it was killed; it says it is Killed but doesn't say why!
13/12/17 03:14:54 INFO capacity.CapacityScheduler: Application appattempt_1387266015651_0258_01 released container container_1387266015651_0258_01_03 on node: host: isredeng:36576 #containers=2 available=7936 used=256 with event: FINISHED
13/12/17 03:14:54 INFO rmcontainer.RMContainerImpl: container_1387266015651_0258_01_05 Container Transitioned from ACQUIRED to RUNNING
Killed

Thanks, Kishore

On Mon, Dec 16, 2013 at 11:10 PM, Jeff Stuckman stuck...@umd.edu wrote: What if you open the daemons in a screen session rather than running them in the background -- for example, run yarn resourcemanager. Then you can see exactly when they terminate, and hopefully why.

From: Krishna Kishore Bonagiri
Sent: Monday, December 16, 2013 6:20 AM
To: user@hadoop.apache.org
Reply To: user@hadoop.apache.org
Subject: Re: Yarn -- one of the daemons getting killed

Hi Vinod, Yes, I am running on Linux. I was actually searching for a corresponding message in /var/log/messages to confirm that the OOM killer killed my daemons, but could not find any corresponding messages there! According to the following link, it looks like if it is a memory issue, I should see a message even if OOM is disabled, but I don't see it. http://www.redhat.com/archives/taroon-list/2007-August/msg6.html And, is memory consumption higher in the case of a two-node cluster than a single-node one? Also, I see this problem only when I give * as the node name. One other thing I suspected was the allowed number of user processes; I increased that to 31000 from 1024 but that also didn't help. Thanks, Kishore

On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Yes, that is what I suspect. That is why I asked if everything is on a single node. If you are running Linux, the Linux OOM killer may be shooting things down. When it happens, you will see something like 'killed process' in the system's syslog.
Thanks, +Vinod On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Vinod, One more thing I observed is that, my Client which submits Application Master one after another continuously also gets killed sometimes. So, it is always any of the Java Processes that is getting killed. Does it indicate some excessive memory usage by them or something like that, that is causing them to die? If so, how can we resolve this kind of issue? Thanks, Kishore On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: No, I am running on 2 node cluster. On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Is all of this on a single node? Thanks, +Vinod On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am running a small application on YARN (2.2.0) in a loop of 500 times, and while doing so one of the daemons, node manager, resource manager, or data node is getting killed (I mean disappearing) at a random point. I see no information in the corresponding log files. How can I know why is it happening so? And, one more observation is that, this is happening only when I am using * for node name in the container requests, otherwise when I used a specific node name, everything is fine. Thanks, Kishore
Re: Yarn -- one of the daemons getting killed
Hi Jeff, I have run the resource manager in the foreground without nohup, and here are the messages when it was killed; it says it is Killed but doesn't say why!

13/12/17 03:14:54 INFO capacity.CapacityScheduler: Application appattempt_1387266015651_0258_01 released container container_1387266015651_0258_01_03 on node: host: isredeng:36576 #containers=2 available=7936 used=256 with event: FINISHED
13/12/17 03:14:54 INFO rmcontainer.RMContainerImpl: container_1387266015651_0258_01_05 Container Transitioned from ACQUIRED to RUNNING
Killed

Thanks, Kishore

On Mon, Dec 16, 2013 at 11:10 PM, Jeff Stuckman stuck...@umd.edu wrote: What if you open the daemons in a screen session rather than running them in the background -- for example, run yarn resourcemanager. Then you can see exactly when they terminate, and hopefully why.

From: Krishna Kishore Bonagiri
Sent: Monday, December 16, 2013 6:20 AM
To: user@hadoop.apache.org
Reply To: user@hadoop.apache.org
Subject: Re: Yarn -- one of the daemons getting killed

Hi Vinod, Yes, I am running on Linux. I was actually searching for a corresponding message in /var/log/messages to confirm that the OOM killer killed my daemons, but could not find any corresponding messages there! According to the following link, it looks like if it is a memory issue, I should see a message even if OOM is disabled, but I don't see it. http://www.redhat.com/archives/taroon-list/2007-August/msg6.html And, is memory consumption higher in the case of a two-node cluster than a single-node one? Also, I see this problem only when I give * as the node name. One other thing I suspected was the allowed number of user processes; I increased that to 31000 from 1024 but that also didn't help. Thanks, Kishore

On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Yes, that is what I suspect. That is why I asked if everything is on a single node. If you are running Linux, the Linux OOM killer may be shooting things down.
When it happens, you will see something like 'killed process' in the system's syslog. Thanks, +Vinod On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Vinod, One more thing I observed is that, my Client which submits Application Master one after another continuously also gets killed sometimes. So, it is always any of the Java Processes that is getting killed. Does it indicate some excessive memory usage by them or something like that, that is causing them to die? If so, how can we resolve this kind of issue? Thanks, Kishore On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: No, I am running on 2 node cluster. On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Is all of this on a single node? Thanks, +Vinod On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am running a small application on YARN (2.2.0) in a loop of 500 times, and while doing so one of the daemons, node manager, resource manager, or data node is getting killed (I mean disappearing) at a random point. I see no information in the corresponding log files. How can I know why is it happening so? And, one more observation is that, this is happening only when I am using * for node name in the container requests, otherwise when I used a specific node name, everything is fine. Thanks, Kishore
Re: Yarn -- one of the daemons getting killed
Hi Vinod, Yes, I am running on Linux. I was actually searching for a corresponding message in /var/log/messages to confirm that the OOM killer killed my daemons, but could not find any corresponding messages there! According to the following link, it looks like if it is a memory issue, I should see a message even if OOM is disabled, but I don't see it. http://www.redhat.com/archives/taroon-list/2007-August/msg6.html And, is memory consumption higher in the case of a two-node cluster than a single-node one? Also, I see this problem only when I give * as the node name. One other thing I suspected was the allowed number of user processes; I increased that to 31000 from 1024 but that also didn't help. Thanks, Kishore On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Yes, that is what I suspect. That is why I asked if everything is on a single node. If you are running Linux, the Linux OOM killer may be shooting things down. When it happens, you will see something like 'killed process' in the system's syslog. Thanks, +Vinod On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Vinod, One more thing I observed is that, my Client which submits Application Master one after another continuously also gets killed sometimes. So, it is always any of the Java Processes that is getting killed. Does it indicate some excessive memory usage by them or something like that, that is causing them to die? If so, how can we resolve this kind of issue? Thanks, Kishore On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: No, I am running on 2 node cluster. On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Is all of this on a single node?
Thanks, +Vinod On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am running a small application on YARN (2.2.0) in a loop of 500 times, and while doing so one of the daemons, node manager, resource manager, or data node is getting killed (I mean disappearing) at a random point. I see no information in the corresponding log files. How can I know why is it happening so? And, one more observation is that, this is happening only when I am using * for node name in the container requests, otherwise when I used a specific node name, everything is fine. Thanks, Kishore
Re: Yarn -- one of the daemons getting killed
Hi Vinay, In the out files I could see nothing other than the output of ulimit -all. Do I need to enable any other kind of logging to get more information? Thanks, Kishore

On Mon, Dec 16, 2013 at 5:41 PM, Vinayakumar B vinayakuma...@huawei.com wrote: Hi Krishna, Please check the out files as well for daemons. You may find something. Cheers, Vinayakumar B

From: Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com]
Sent: 16 December 2013 16:50
To: user@hadoop.apache.org
Subject: Re: Yarn -- one of the daemons getting killed

Hi Vinod, Yes, I am running on Linux. I was actually searching for a corresponding message in /var/log/messages to confirm that the OOM killer killed my daemons, but could not find any corresponding messages there! According to the following link, it looks like if it is a memory issue, I should see a message even if OOM is disabled, but I don't see it. http://www.redhat.com/archives/taroon-list/2007-August/msg6.html And, is memory consumption higher in the case of a two-node cluster than a single-node one? Also, I see this problem only when I give * as the node name. One other thing I suspected was the allowed number of user processes; I increased that to 31000 from 1024 but that also didn't help. Thanks, Kishore

On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Yes, that is what I suspect. That is why I asked if everything is on a single node. If you are running Linux, the Linux OOM killer may be shooting things down. When it happens, you will see something like 'killed process' in the system's syslog. Thanks, +Vinod On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Vinod, One more thing I observed is that, my Client which submits Application Master one after another continuously also gets killed sometimes. So, it is always any of the Java Processes that is getting killed. Does it indicate some excessive memory usage by them or something like that, that is causing them to die?
If so, how can we resolve this kind of issue? Thanks, Kishore On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: No, I am running on 2 node cluster. On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Is all of this on a single node? Thanks, +Vinod On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am running a small application on YARN (2.2.0) in a loop of 500 times, and while doing so one of the daemons, node manager, resource manager, or data node is getting killed (I mean disappearing) at a random point. I see no information in the corresponding log files. How can I know why is it happening so? And, one more observation is that, this is happening only when I am using * for node name in the container requests, otherwise when I used a specific node name, everything is fine. Thanks, Kishore
Re: Yarn -- one of the daemons getting killed
Vinod, One more thing I observed is that my Client, which submits Application Masters one after another continuously, also gets killed sometimes. So it is always one of the Java processes that gets killed. Does this indicate excessive memory usage by them, or something like that, that is causing them to die? If so, how can we resolve this kind of issue? Thanks, Kishore On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: No, I am running on 2 node cluster. On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Is all of this on a single node? Thanks, +Vinod On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am running a small application on YARN (2.2.0) in a loop of 500 times, and while doing so one of the daemons, node manager, resource manager, or data node is getting killed (I mean disappearing) at a random point. I see no information in the corresponding log files. How can I know why is it happening so? And, one more observation is that, this is happening only when I am using * for node name in the container requests, otherwise when I used a specific node name, everything is fine. Thanks, Kishore
Yarn -- one of the daemons getting killed
Hi, I am running a small application on YARN (2.2.0) in a loop of 500 times, and while doing so one of the daemons (node manager, resource manager, or data node) gets killed (I mean it disappears) at a random point. I see no information in the corresponding log files. How can I find out why this is happening? One more observation: this happens only when I use * for the node name in the container requests; when I used a specific node name, everything was fine. Thanks, Kishore
Re: Yarn -- one of the daemons getting killed
No, I am running on 2 node cluster. On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Is all of this on a single node? Thanks, +Vinod On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am running a small application on YARN (2.2.0) in a loop of 500 times, and while doing so one of the daemons, node manager, resource manager, or data node is getting killed (I mean disappearing) at a random point. I see no information in the corresponding log files. How can I know why is it happening so? And, one more observation is that, this is happening only when I am using * for node name in the container requests, otherwise when I used a specific node name, everything is fine. Thanks, Kishore
How to find which version of Hadoop or YARN
Hi, Is there any command for finding which version of Hadoop or YARN we are running? If so, what is it? Thanks, Kishore
Re: How to find which version of Hadoop or YARN
Yes, thank you.

On Tue, Dec 10, 2013 at 11:15 AM, Brahma Reddy Battula brahmareddy.batt...@huawei.com wrote: Hi kishore, Hope the following will help you.

/home/hadoop-2.0.5-alpha/bin # ./hadoop version
Hadoop 2.0.5-alpha
Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1488459
Compiled by jenkins on 2013-06-01T04:05Z
From source with checksum c8f4bd45ac25c31b815f311b32ef17
This command was run using /home/hadtest/Opensource/hadoop-2.0.5-alpha/share/hadoop/common/hadoop-common-2.0.5-alpha.jar

---Brahma

From: Krishna Kishore Bonagiri [write2kish...@gmail.com]
Sent: Tuesday, December 10, 2013 1:30 PM
To: user@hadoop.apache.org
Subject: How to find which version of Hadoop or YARN

Hi, Is there any command for finding which version of Hadoop or YARN we are running? If so, what is it? Thanks, Kishore
Re: YARN: LocalResources and file distribution
Hi Arun, I have copied a shell script to HDFS and am trying to execute it on containers. How do I specify my shell script PATH in the setCommands() call on ContainerLaunchContext? I am doing it this way:

String shellScriptPath = "hdfs://isredeng:8020/user/kbonagir/KKDummy/list.ksh";
commands.add(shellScriptPath);

But my container execution is failing, saying that there is no such file or directory:

org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash: hdfs://isredeng:8020/user/kbonagir/KKDummy/list.ksh: No such file or directory
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)

I could see this file with the hadoop fs command, and I also saw messages in the Node Manager's log saying that the resource was downloaded and localized. So, how do I run the downloaded shell script on a container? Thanks, Kishore

On Tue, Dec 3, 2013 at 4:57 AM, Arun C Murthy a...@hortonworks.com wrote: Robert, YARN, by default, will only download *resources* from a shared namespace (e.g. HDFS). If /home/hadoop/robert/large_jar.jar is available on each node then you can specify the path as file:///home/hadoop/robert/large_jar.jar and it should work. Else, you'll need to copy /home/hadoop/robert/large_jar.jar to HDFS and then specify hdfs://host:port/path/to/large_jar.jar. hth, Arun

On Dec 1, 2013, at 12:03 PM, Robert Metzger metrob...@gmail.com wrote: Hello, I'm currently writing code to run my application using Yarn (Hadoop 2.2.0). I used this code as a skeleton: https://github.com/hortonworks/simple-yarn-app Everything works fine on my local machine or on a cluster with shared directories, but when I want to access resources outside of commonly accessible locations, my application fails. I have my application in a large jar file, containing everything (Submission Client, Application Master, and Workers).
The submission client registers the large jar file as a local resource for the Application master's context. In my understanding, Yarn takes care of transferring the client-local resources to the application master's container. This is also stated here: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html You can use the LocalResource to add resources to your application request. This will cause YARN to distribute the resource to the ApplicationMaster node. If I'm starting my jar from the dir /home/hadoop/robert/large_jar.jar, I'll get the following error from the nodemanager (another node in the cluster): 2013-12-01 20:13:00,810 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc { { file:/home/hadoop/robert/large_jar.jar, .. So it seems as this node tries to access the file from its local file system. Do I have to use another protocol for the file, something like file://host:port/home/blabla ? Is it true that Yarn is able to distribute files (not using hdfs obviously?) ? The distributedshell-example suggests that I have to use HDFS: https://github.com/apache/hadoop-common/blob/50f0de14e377091c308c3a74ed089a7e4a7f0bfe/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java Sincerely, Robert -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/
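[Editor's note] On the question of running a localized script: the launch command should reference the link name under the container's working directory, not the hdfs:// URI — the NodeManager downloads the registered LocalResource and exposes it under the link name it was registered with. The full registration goes through the Hadoop LocalResource/ContainerLaunchContext API (only outlined in comments below, since it needs a live cluster); the runnable part is a sketch of deriving the link name:

```java
import java.net.URI;

public class LocalizedScriptSketch {
    // The NodeManager places a registered LocalResource into the container's
    // working directory under the link name it was registered with; by
    // convention that is the file's base name.
    static String linkName(String hdfsUri) {
        String path = URI.create(hdfsUri).getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String script = "hdfs://isredeng:8020/user/kbonagir/KKDummy/list.ksh";
        // Registration outline (Hadoop YARN API, not compiled here):
        //   LocalResource rsrc = Records.newRecord(LocalResource.class);
        //   rsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI(script)));
        //   rsrc.setType(LocalResourceType.FILE);  // plus size/timestamp/visibility
        //   ctx.setLocalResources(Collections.singletonMap(linkName(script), rsrc));
        // The launch command then uses the local link name, not the HDFS URI:
        String command = "sh ./" + linkName(script);
        System.out.println(command);  // sh ./list.ksh
    }
}
```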
Re: Time taken for starting AMRMClientAsync
Hi Alejandro, I don't start all the AMs from the same JVM. How can I do that? Also, if I did that, it would save me the time taken to get the AM started, which would also be a welcome improvement. Please let me know how I can do that. And, would this also save me the time taken for connecting from the AM to the Resource Manager? Thanks, Kishore On Tue, Nov 26, 2013 at 3:45 AM, Alejandro Abdelnur t...@cloudera.com wrote: Hi Krishna, Are you starting all AMs from the same JVM? Mind sharing the code you are using for your time testing? Thx On Thu, Nov 21, 2013 at 6:11 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Alejandro, I have modified the code in hadoop-2.2.0-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java to submit multiple application masters one after another, and I am still seeing 800 to 900 ms being taken for the start() call on AMRMClientAsync in all of those applications. Please suggest if you think I am missing something else. Thanks, Kishore On Tue, Nov 19, 2013 at 6:07 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Alejandro, I don't know what managed and unmanaged AMs are; can you please explain the difference and how each of them is launched? I tried to google these terms and came across hadoop-yarn-applications-unmanaged-am-launcher-2.2.0.jar; is it related to that? Thanks, Kishore On Tue, Nov 19, 2013 at 12:15 AM, Alejandro Abdelnur t...@cloudera.com wrote: Kishore, Also, please specify if you are using managed or unmanaged AMs (the numbers I've mentioned before are using unmanaged AMs). thx On Sun, Nov 17, 2013 at 11:16 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: It is just creating a connection to RM and shouldn't take that long. Can you please file a ticket so that we can look at it?
JVM class loading overhead is one possibility but 1 sec is a bit too much. Thanks, +Vinod On Oct 21, 2013, at 7:16 AM, Krishna Kishore Bonagiri wrote: Hi, I am seeing the following call to start() on AMRMClientAsync taking from 0.9 to 1 second. Why does it take that long? Is there a way to reduce it, I mean does it depend on any of the interval parameters or so in configuration files? I have tried reducing the value of the first argument below from 1000 to 100 also, but that doesn't help. AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler(); amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener); amRMClient.init(conf); amRMClient.start(); Thanks, Kishore -- Alejandro -- Alejandro
Re: Time taken for starting AMRMClientAsync
Vinod, Do you expect managed AMs also to not take as long as a second, or, as Alejandro was saying, only unmanaged AMs? I think I am using managed AMs. If managed AMs are also not expected to take that much time, I shall raise a ticket. Thanks, Kishore On Mon, Nov 18, 2013 at 12:46 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: It is just creating a connection to RM and shouldn't take that long. Can you please file a ticket so that we can look at it? JVM class loading overhead is one possibility but 1 sec is a bit too much. Thanks, +Vinod On Oct 21, 2013, at 7:16 AM, Krishna Kishore Bonagiri wrote: Hi, I am seeing the following call to start() on AMRMClientAsync taking from 0.9 to 1 second. Why does it take that long? Is there a way to reduce it; I mean, does it depend on any of the interval parameters in the configuration files? I have tried reducing the value of the first argument below from 1000 to 100 milliseconds also, but that doesn't help. AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler(); amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener); amRMClient.init(conf); amRMClient.start(); Thanks, Kishore
Re: Time taken for starting AMRMClientAsync
Hi Alejandro, I have modified the code in hadoop-2.2.0-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java to submit multiple application masters one after another and still seeing 800 to 900 ms being taken for the start() call on AMRMClientAsync in all of those applications. Please suggest if you think I am missing something else Thanks, Kishore On Tue, Nov 19, 2013 at 6:07 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Alejandro, I don't know what are managed and unmanaged AMs, can you please explain me what are the difference and how are each of them launched? I tried to google for these terms and came across hadoop-yarn-applications-unmanaged-am-launcher-2.2.0.jar, is it related to that? Thanks, Kishore On Tue, Nov 19, 2013 at 12:15 AM, Alejandro Abdelnur t...@cloudera.comwrote: Kishore, Also, please specify if you are using managed or unmanaged AMs (the numbers I've mentioned before are using unmanaged AMs). thx On Sun, Nov 17, 2013 at 11:16 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: It is just creating a connection to RM and shouldn't take that long. Can you please file a ticket so that we can look at it? JVM class loading overhead is one possibility but 1 sec is a bit too much. Thanks, +Vinod On Oct 21, 2013, at 7:16 AM, Krishna Kishore Bonagiri wrote: Hi, I am seeing the following call to start() on AMRMClientAsync taking from 0.9 to 1 second. Why does it take that long? Is there a way to reduce it, I mean does it depend on any of the interval parameters or so in configuration files? I have tried reducing the value of the first argument below from 1000 to 100 seconds also, but that doesn't help. 
AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler(); amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener); amRMClient.init(conf); amRMClient.start(); Thanks, Kishore -- Alejandro
Unmanaged AMs
Hi, I have seen in comments in the code of UnmanagedAMLauncher.java that the AM can be in any language. What does that mean? Can the AM be written in C++? If so, how would I be able to connect to the RM, and how would I be able to request containers? I mean, what is the interface for doing these things? Is there sample code or an example somewhere to get an idea of how to do it? Thanks, Kishore
Running multiple application from the same Application Master
Hi, I was reading on this link http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/ that we can implement an Application Master to manage multiple applications. The text reads like this: "It's useful to remember that, in reality, every application has its own instance of an ApplicationMaster. However, it's completely feasible to implement an ApplicationMaster to manage a set of applications (e.g. ApplicationMaster for Pig or Hive to manage a set of MapReduce jobs)." How is this possible? Do they have different application IDs, and are they treated as different applications on the cluster? What are the positive/negative implications of it? Is it a recommended way? Thanks, Kishore
Re: Time taken for starting AMRMClientAsync
Hi Alejandro, I don't know what managed and unmanaged AMs are; can you please explain the difference and how each of them is launched? I tried to google these terms and came across hadoop-yarn-applications-unmanaged-am-launcher-2.2.0.jar; is it related to that? Thanks, Kishore On Tue, Nov 19, 2013 at 12:15 AM, Alejandro Abdelnur t...@cloudera.com wrote: Kishore, Also, please specify if you are using managed or unmanaged AMs (the numbers I've mentioned before are using unmanaged AMs). thx On Sun, Nov 17, 2013 at 11:16 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: It is just creating a connection to RM and shouldn't take that long. Can you please file a ticket so that we can look at it? JVM class loading overhead is one possibility but 1 sec is a bit too much. Thanks, +Vinod On Oct 21, 2013, at 7:16 AM, Krishna Kishore Bonagiri wrote: Hi, I am seeing the following call to start() on AMRMClientAsync taking from 0.9 to 1 second. Why does it take that long? Is there a way to reduce it, I mean does it depend on any of the interval parameters or so in configuration files? I have tried reducing the value of the first argument below from 1000 to 100 also, but that doesn't help. AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler(); amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener); amRMClient.init(conf); amRMClient.start(); Thanks, Kishore -- Alejandro
Re: Time taken for starting AMRMClientAsync
Hi Alejandro, Can you please see if you can answer my question above? I would like to reduce the time taken by the above calls made by my Application Master, the way you do. Thanks, Kishore On Tue, Oct 22, 2013 at 3:09 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Alejandro, I submit all my applications from a single Client, but all of my application masters are taking almost the same amount of time for finishing the above calls. Do you reuse ApplicationMaster instances or do some other thing for saving this time? Otherwise I felt the fresh application connecting to the resource manager would take the same amount of time although I don't know why should it take that much? Thanks, Kishore On Mon, Oct 21, 2013 at 9:23 PM, Alejandro Abdelnur t...@cloudera.comwrote: Hi Krishna, Those 900ms seems consistent with the numbers we found while doing some benchmarks in the context of Llama: http://cloudera.github.io/llama/ We found that the first application master created from a client process takes around 900 ms to be ready to submit resource requests. Subsequent application masters created from the same client process take a mean of 20 ms. The application master submission throughput (discarding the first submission) tops at approximately 100 application masters per second. I believe there is room for improvement there. Cheers On Mon, Oct 21, 2013 at 7:16 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am seeing the following call to start() on AMRMClientAsync taking from 0.9 to 1 second. Why does it take that long? Is there a way to reduce it, I mean does it depend on any of the interval parameters or so in configuration files? I have tried reducing the value of the first argument below from 1000 to 100 seconds also, but that doesn't help. 
AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler(); amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener); amRMClient.init(conf); amRMClient.start(); Thanks, Kishore -- Alejandro
Re: Time taken for starting AMRMClientAsync
Hi Alejandro, I submit all my applications from a single Client, but all of my application masters are taking almost the same amount of time for finishing the above calls. Do you reuse ApplicationMaster instances or do some other thing for saving this time? Otherwise I felt the fresh application connecting to the resource manager would take the same amount of time although I don't know why should it take that much? Thanks, Kishore On Mon, Oct 21, 2013 at 9:23 PM, Alejandro Abdelnur t...@cloudera.comwrote: Hi Krishna, Those 900ms seems consistent with the numbers we found while doing some benchmarks in the context of Llama: http://cloudera.github.io/llama/ We found that the first application master created from a client process takes around 900 ms to be ready to submit resource requests. Subsequent application masters created from the same client process take a mean of 20 ms. The application master submission throughput (discarding the first submission) tops at approximately 100 application masters per second. I believe there is room for improvement there. Cheers On Mon, Oct 21, 2013 at 7:16 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am seeing the following call to start() on AMRMClientAsync taking from 0.9 to 1 second. Why does it take that long? Is there a way to reduce it, I mean does it depend on any of the interval parameters or so in configuration files? I have tried reducing the value of the first argument below from 1000 to 100 seconds also, but that doesn't help. AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler(); amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener); amRMClient.init(conf); amRMClient.start(); Thanks, Kishore -- Alejandro
Time taken for starting AMRMClientAsync
Hi, I am seeing the following call to start() on AMRMClientAsync taking 0.9 to 1 second. Why does it take that long? Is there a way to reduce it; I mean, does it depend on any of the interval parameters in the configuration files? I have tried reducing the value of the first argument below (the heartbeat interval, in milliseconds) from 1000 to 100 as well, but that doesn't help.

AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler();
amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener);
amRMClient.init(conf);
amRMClient.start();

Thanks, Kishore
Re: Hadoop-2.0.1 log files deletion
Hi Reyane, Did you try yarn.nodemanager.log.retain-seconds? Increasing that might help. The default value is 10800 seconds, i.e. 3 hours. Thanks, Kishore On Thu, Oct 10, 2013 at 8:27 PM, Reyane Oukpedjo oukped...@gmail.com wrote: Hi there, I was running some mapreduce jobs on hadoop-2.1.0-beta. These are multiple unit tests that can take more than a day to finish running. However, I realized the logs for the jobs are being deleted more quickly than the default 24-hour setting of the mapreduce.job.userlog.retain.hours property in mapred-site.xml; some of the job logs were deleted after 4 hours. Can this be a bug, or if not, is there any other property that overrides this? Thank you. Reyane OUKPEDJO
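[Editor's note] For reference, the NodeManager retention setting mentioned above lives in yarn-site.xml; a sketch with an illustrative value (10800 s, i.e. 3 hours, is the default):

```xml
<!-- yarn-site.xml: how long NodeManagers keep container logs on local disk
     (applies when log aggregation is disabled); default is 10800 seconds. -->
<property>
  <name>yarn.nodemanager.log.retain-seconds</name>
  <value>86400</value> <!-- keep logs for 24 hours -->
</property>
```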
Single Yarn Client -- multiple applications?
Hi, Can we submit multiple applications from the same Client class? It seems to be allowed now; I just tried it with the Distributed Shell example. Is it OK to do so, or does it have any adverse implications? Thanks, Kishore
Can container requests be made in parallel from multiple threads
Hi, Can we submit container requests from multiple threads in parallel to the Resource Manager? Thanks, Kishore
Re: Can container requests be made in parallel from multiple threads
Hi Omkar, Thanks for the quick reply. I have a requirement for sets of containers depending on some of my business logic. I found that each of the request allocations is taking around 2 seconds, so I am thinking of making the requests at the same time from multiple threads. Kishore On Fri, Sep 27, 2013 at 11:27 PM, Omkar Joshi ojo...@hortonworks.com wrote: Hi, I suggest you should not do that. After YARN-744 goes in, this will be prevented on the RM side. May I know why you want to do this? Any advantage/use case? Thanks, Omkar Joshi Hortonworks Inc. http://www.hortonworks.com On Fri, Sep 27, 2013 at 8:31 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, Can we submit container requests from multiple threads in parallel to the Resource Manager? Thanks, Kishore
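[Editor's note] If the goal is only to decouple the business-logic threads from the slow allocation path rather than to touch the client concurrently, one option is to funnel all addContainerRequest calls through a single dedicated thread. A sketch with a stand-in for the AMRMClient call (the stub method below is hypothetical, not a Hadoop API):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SerializedRequests {
    // Stand-in for amRMClient.addContainerRequest(...): just count submissions.
    static final AtomicInteger submitted = new AtomicInteger();
    static void addContainerRequest(int requestId) { submitted.incrementAndGet(); }

    public static void main(String[] args) throws InterruptedException {
        // One dedicated thread owns the AM-RM client; other threads hand
        // requests to it instead of calling the client concurrently.
        ExecutorService amThread = Executors.newSingleThreadExecutor();
        for (int i = 0; i < 10; i++) {
            final int id = i;
            amThread.submit(() -> addContainerRequest(id));
        }
        amThread.shutdown();
        amThread.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("submitted " + submitted.get() + " requests");
    }
}
```

This keeps the Resource Manager seeing a single serialized caller, which is what the YARN-744 discussion above assumes, while the rest of the application stays multi-threaded.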
Re: Container allocation fails randomly
Hi Omkar, It is my own custom AM that I am using, not the MR-AM. But I still cannot see how a negative value could come out of the getProgress() call, which is always calculated as a division of positive numbers; it might be a floating-point computation problem, as you are saying. Thanks, Kishore On Thu, Sep 19, 2013 at 5:52 AM, Omkar Joshi ojo...@hortonworks.com wrote: This is clearly an AM bug. Are you using MR-AM or a custom AM? You should check the AM code which is computing progress. I suspect there must be some float computation problems. If it is an MR-AM problem then please file a MapReduce bug. Thanks, Omkar Joshi Hortonworks Inc. http://www.hortonworks.com On Tue, Sep 17, 2013 at 2:47 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Omkar, Thanks for the quick reply, and sorry for not being able to get the required logs that you have asked for. But in the meanwhile I just wanted to check if you can get a clue with the information I have now. I am seeing the following kind of error message in AppMaster.stderr whenever this failure happens. I don't know why it happens; the getProgress() call that I have implemented in RMCallbackHandler could never return a negative value! Doesn't this error mean that getProgress() is giving a negative value? Exception in thread AMRM Heartbeater thread java.lang.IllegalArgumentException: Progress indicator should not be negative at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:199) at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224) Thanks, Kishore On Fri, Sep 13, 2013 at 2:59 AM, Omkar Joshi ojo...@hortonworks.com wrote: Can you give more information? Complete logs will help a lot around this time frame. Are the containers getting assigned via the scheduler? Is it failing when the node manager tries to start the container?
I clearly see the diagnostic message is empty but do you see anything in NM logs? Also if there were running containers on the machine before launching new ones.. then are they killed? or they are still hanging around? can you also try applying patch https://issues.apache.org/jira/browse/YARN-1053; ? and check if you can see any message? Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com On Thu, Sep 12, 2013 at 6:15 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am using 2.1.0-beta and have seen container allocation failing randomly even when running the same application in a loop. I know that the cluster has enough resources to give, because it gave the resources for the same application all the other times in the loop and ran it successfully. I have observed a lot of the following kind of messages in the node manager's log whenever such failure happens, any clues as to why it happens? 2013-09-12 08:54:36,204 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000 2013-09-12 08:54:37,220 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000 2013-09-12 08:54:38,231 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000 2013-09-12 08:54:39,239 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 
cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000 2013-09-12 08:54:40,267 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000 2013-09-12 08:54:41,275 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000 2013-09-12 08:54:42,283 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id
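[Editor's note] One defensive fix for the "Progress indicator should not be negative" failure, whatever the underlying arithmetic bug turns out to be, is to clamp the value the callback returns into [0, 1] and guard the zero-total case, since AMRMClientImpl rejects negative progress outright. A minimal sketch (pure arithmetic; the class and method names are illustrative, mirroring the getProgress() callback discussed above):

```java
public class ProgressSketch {
    // Clamp completed/total into [0, 1]. Division of two positive numbers
    // cannot go negative, but an int overflow or an uninitialized/decremented
    // counter can, and AMRMClient throws on progress < 0.
    static float progress(long completed, long total) {
        if (total <= 0) {
            return 0f;  // nothing requested yet: report no progress
        }
        float p = (float) completed / (float) total;
        return Math.max(0f, Math.min(1f, p));
    }

    public static void main(String[] args) {
        System.out.println(progress(5, 10));   // 0.5
        System.out.println(progress(-1, 10));  // 0.0
        System.out.println(progress(20, 10));  // 1.0
    }
}
```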
Re: Container allocation fails randomly
Hi Omkar, Thanks for the quick reply, and sorry for not being able to get the required logs that you have asked for. But in the meanwhile I just wanted to check if you can get a clue with the information I have now. I am seeing the following kind of error message in AppMaster.stderr whenever this failure is happening. I don't know why does it happen, the getProgress() call that I have implemented in RMCallbackHandler could never return a negative value! Doesn't this error mean that this getProgress() is giving a -ve value? Exception in thread AMRM Heartbeater thread java.lang.IllegalArgumentException: Progress indicator should not be negative at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:199) at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224) Thanks, Kishore On Fri, Sep 13, 2013 at 2:59 AM, Omkar Joshi ojo...@hortonworks.com wrote: Can you give more information? logs (complete) will help a lot around this time frame. Are the containers getting assigned via scheduler? is it failing when node manager tries to start container? I clearly see the diagnostic message is empty but do you see anything in NM logs? Also if there were running containers on the machine before launching new ones.. then are they killed? or they are still hanging around? can you also try applying patch https://issues.apache.org/jira/browse/YARN-1053; ? and check if you can see any message? Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com On Thu, Sep 12, 2013 at 6:15 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am using 2.1.0-beta and have seen container allocation failing randomly even when running the same application in a loop. 
I know that the cluster has enough resources to give, because it gave the resources for the same application all the other times in the loop and ran it successfully. I have observed a lot of the following kind of messages in the node manager's log whenever such failure happens, any clues as to why it happens? 2013-09-12 08:54:36,204 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000 2013-09-12 08:54:37,220 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000 2013-09-12 08:54:38,231 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000 2013-09-12 08:54:39,239 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000 2013-09-12 08:54:40,267 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000 2013-09-12 08:54:41,275 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 
} state: C_RUNNING diagnostics: exit_status: -1000 2013-09-12 08:54:42,283 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000 2013-09-12 08:54:43,289 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000 Thanks, Kishore
Container allocation fails randomly
Hi, I am using 2.1.0-beta and have seen container allocation failing randomly, even when running the same application in a loop. I know that the cluster has enough resources to give, because it gave the resources for the same application all the other times in the loop and ran it successfully. I have observed a lot of the following kind of messages in the node manager's log whenever such a failure happens. Any clues as to why it happens?

2013-09-12 08:54:36,204 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000
2013-09-12 08:54:37,220 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000
2013-09-12 08:54:38,231 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000
2013-09-12 08:54:39,239 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000
2013-09-12 08:54:40,267 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000
2013-09-12 08:54:41,275 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000
2013-09-12 08:54:42,283 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000
2013-09-12 08:54:43,289 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: exit_status: -1000

Thanks, Kishore
Requesting set of containers on a single node
Hi, My application has a group of processes that need to communicate with each other either through shared memory or TCP/IP, depending on where the containers are allocated: on the same machine or on different machines. Obviously I would like them to be allocated on the same node whenever possible, which requires all of the containers to be on the same node. But I don't want to specify a node name in my request, because I don't care where in the cluster they run, as long as all of them are on the same node. Is there a way to make such a request for containers currently? If not, I think this would be a good feature to have, because many applications could have this kind of requirement. Thanks, Kishore
Re: whitelist feature of YARN
Hi Sandy, Thanks for the reply and it is good to know YARN-521 is done! Please answer the following questions: 1) When is 2.1.0-beta going to be released? Is it soon, or do you suggest I take it from trunk, or is there a recent release candidate available? 2) I have recently changed my application to use the new asynchronous interfaces. I am hoping it works with those too, correct me if I am wrong. 3) Change in interface: the old ContainerRequest constructor used to be: public ContainerRequest(Resource capability, String[] nodes, String[] racks, Priority priority, int containerCount); whereas now it is changed to a) public ContainerRequest(Resource capability, String[] nodes, String[] racks, Priority priority) b) public ContainerRequest(Resource capability, String[] nodes, String[] racks, Priority priority, boolean relaxLocality) That means the old argument containerCount is gone! How would I be able to specify how many containers I need? -Kishore On Wed, Aug 7, 2013 at 11:37 AM, Sandy Ryza sandy.r...@cloudera.com wrote: YARN-521, which brings whitelisting to the AMRMClient APIs, is now included in 2.1.0-beta. Check out the doc for the relaxLocality parameter in ContainerRequest in AMRMClient: https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java and I can help clarify here if anything's confusing. -Sandy On Tue, Jul 9, 2013 at 2:54 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Sandy, Yes, I have been using AMRMClient APIs. I am planning to shift to whichever way this whitelist feature is supported. But I am not sure what is meant by submitting ResourceRequests directly to the RM. Can you please elaborate on this or give me a pointer to some example code on how to do it...
Thanks for the reply, -Kishore On Mon, Jul 8, 2013 at 10:53 PM, Sandy Ryza sandy.r...@cloudera.com wrote: Hi Krishna, From your previous email, it looks like you are using the AMRMClient APIs. Whitelisting is not yet supported through them. I am working on this in YARN-521, which should be included in the next release after 2.1.0-beta. If you are submitting ResourceRequests directly to the RM, you can whitelist a node by
* setting the relaxLocality flag on the node-level ResourceRequest to true
* setting the relaxLocality flag on the corresponding rack-level ResourceRequest to false
* setting the relaxLocality flag on the corresponding any-level ResourceRequest to false
-Sandy On Mon, Jul 8, 2013 at 6:48 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, Can someone please point me to some example code of how to use the whitelist feature of YARN? I have recently got RC1 for hadoop-2.1.0-beta and want to use this feature. It would be great if you could point me to some description of what this whitelisting feature is; I have gone through some JIRA logs related to this but a more concrete explanation would be helpful. Thanks, Kishore
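[Editor's note] Sandy's three-flag recipe, expressed as code: a hedged sketch against the 2.1.0-beta ResourceRequest records API. It assumes the YARN client libraries are on the classpath and an AM that sends these requests via its allocate() call to the RM; the node name "node1.example.com" and rack "/rack1" are made-up placeholders.

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class WhitelistSketch {
    public static void main(String[] args) {
        Priority pri = Priority.newInstance(0);
        Resource cap = Resource.newInstance(1024, 1); // 1 GB, 1 vcore

        // Node-level request: locality relaxation allowed at this level.
        ResourceRequest nodeReq =
            ResourceRequest.newInstance(pri, "node1.example.com", cap, 1, true);
        // Rack-level request: do NOT fall back to other nodes on the rack.
        ResourceRequest rackReq =
            ResourceRequest.newInstance(pri, "/rack1", cap, 1, false);
        // ANY-level request: do NOT fall back to arbitrary nodes.
        ResourceRequest anyReq =
            ResourceRequest.newInstance(pri, ResourceRequest.ANY, cap, 1, false);

        // Together these whitelist node1.example.com: the container can only
        // be allocated there. All three would go into the AM's allocate() call.
    }
}
```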
Re: whitelist feature of YARN
Sandy, Thanks again. I found RC1 for 2.1.0-beta available at http://people.apache.org/~acmurthy/hadoop-2.1.0-beta-rc1/ Would this have the fix for YARN-521, and can I use that? -Kishore On Wed, Aug 7, 2013 at 12:35 PM, Sandy Ryza sandy.r...@cloudera.com wrote: Responses inline: On Tue, Aug 6, 2013 at 11:55 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Sandy, Thanks for the reply and it is good to know YARN-521 is done! Please answer the following questions: 1) When is 2.1.0-beta going to be released? Is it soon, or do you suggest I take it from trunk, or is there a recent release candidate available? We're very close and my guess would be no later than the end of the month (don't hold me to this). 2) I have recently changed my application to use the new asynchronous interfaces. I am hoping it works with those too, correct me if I am wrong. ContainerRequest is shared by the async interfaces as well, so it should work here. 3) Change in interface: the old ContainerRequest constructor used to be: public ContainerRequest(Resource capability, String[] nodes, String[] racks, Priority priority, int containerCount); whereas now it is changed to a) public ContainerRequest(Resource capability, String[] nodes, String[] racks, Priority priority) b) public ContainerRequest(Resource capability, String[] nodes, String[] racks, Priority priority, boolean relaxLocality) That means the old argument containerCount is gone! How would I be able to specify how many containers I need? We now expect that you submit a ContainerRequest for each container you want. -Kishore On Wed, Aug 7, 2013 at 11:37 AM, Sandy Ryza sandy.r...@cloudera.com wrote: YARN-521, which brings whitelisting to the AMRMClient APIs, is now included in 2.1.0-beta.
Check out the doc for the relaxLocality parameter in ContainerRequest in AMRMClient: https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java and I can help clarify here if anything's confusing. -Sandy On Tue, Jul 9, 2013 at 2:54 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Sandy, Yes, I have been using AMRMClient APIs. I am planning to shift to whichever way this whitelist feature is supported. But I am not sure what is meant by submitting ResourceRequests directly to the RM. Can you please elaborate on this or give me a pointer to some example code on how to do it... Thanks for the reply, -Kishore On Mon, Jul 8, 2013 at 10:53 PM, Sandy Ryza sandy.r...@cloudera.com wrote: Hi Krishna, From your previous email, it looks like you are using the AMRMClient APIs. Whitelisting is not yet supported through them. I am working on this in YARN-521, which should be included in the next release after 2.1.0-beta. If you are submitting ResourceRequests directly to the RM, you can whitelist a node by
* setting the relaxLocality flag on the node-level ResourceRequest to true
* setting the relaxLocality flag on the corresponding rack-level ResourceRequest to false
* setting the relaxLocality flag on the corresponding any-level ResourceRequest to false
-Sandy On Mon, Jul 8, 2013 at 6:48 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, Can someone please point me to some example code of how to use the whitelist feature of YARN? I have recently got RC1 for hadoop-2.1.0-beta and want to use this feature. It would be great if you could point me to some description of what this whitelisting feature is; I have gone through some JIRA logs related to this but a more concrete explanation would be helpful. Thanks, Kishore
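[Editor's note] Sandy's answer — "submit a ContainerRequest for each container you want" — in code form: a hedged sketch using the 2.1.0-beta constructor quoted in the thread. It assumes the YARN client library is on the classpath; the memory/vcore numbers are placeholders, and the AMRMClient(Async) setup itself is elided.

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class PerContainerRequests {
    // The old 2.0.x constructor took a containerCount argument; with
    // 2.1.0-beta you instead build one ContainerRequest per container.
    static ContainerRequest[] buildRequests(int n) {
        Priority pri = Priority.newInstance(0);
        Resource cap = Resource.newInstance(512, 1); // placeholders
        ContainerRequest[] reqs = new ContainerRequest[n];
        for (int i = 0; i < n; i++) {
            // nodes/racks null: no locality constraint
            reqs[i] = new ContainerRequest(cap, null, null, pri);
        }
        return reqs;
    }
    // Each request would then be passed individually to
    // amRmClient.addContainerRequest(req); this works the same way with
    // the asynchronous AMRMClientAsync interface.
}
```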
Re: Extra start-up overhead with hadoop-2.1.0-beta
Hi Omkar, Can you please see if you can answer my question with this info, or if you need anything else from me? Also, does resource localization improve or impact performance in any way? Thanks, Kishore On Thu, Aug 1, 2013 at 11:20 PM, Omkar Joshi ojo...@hortonworks.com wrote: How are you making these measurements? Can you elaborate more? Is it on a best-case basis, or on an average, or worst case? How many resources are you sending for localization? Were the sizes and number of these resources consistent across tests? Were these resources public/private/application specific? Apart from this, is the other load on the node manager the same? Is the load on HDFS the same? Did you see any network bottleneck? More information will help a lot. Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com On Thu, Aug 1, 2013 at 2:19 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, Please share if anyone has an answer or clues to my question regarding the start-up performance. Also, one more thing I have observed today is that the time taken to run a command on a container went up by more than a second in this latest version. When using 2.0.4-alpha, it used to take 0.3 to 0.5 seconds from the point I call startContainer() to the point the command is started on the container, whereas when using 2.1.0-beta, it is taking around 1.5 seconds from the point it came to the callback onContainerStarted() to the point the command is seen running on the container. Thanks, Kishore On Thu, Jul 25, 2013 at 8:38 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I have been using the hadoop-2.1.0-beta release candidate and observed that it is slower in running my simple application that runs on 2 containers. I have tried to find out which parts of it really have this extra overhead (compared to hadoop-2.0.4-alpha), and here is what I found: 1) From the point my Client has submitted the Application Master to the RM, it is taking 2 seconds extra 2) From the point my container requests are set up by the Application Master till the containers are allocated, it is taking 2 seconds extra Is this overhead expected with the changes that went into the new version? Or is there a way to improve it by changing something in the configuration? Thanks, Kishore
Re: Extra start-up overhead with hadoop-2.1.0-beta
No Ravi, I am not running any MR job. Also, my configuration files are not big. On Wed, Aug 7, 2013 at 11:12 PM, Ravi Prakash ravi...@ymail.com wrote: I believe https://issues.apache.org/jira/browse/MAPREDUCE-5399 causes performance degradation in cases where there are a lot of reducers. I can imagine it causing degradation if the configuration files are super big / some other weird cases. -- *From:* Krishna Kishore Bonagiri write2kish...@gmail.com *To:* user@hadoop.apache.org *Sent:* Wednesday, August 7, 2013 10:03 AM *Subject:* Re: Extra start-up overhead with hadoop-2.1.0-beta Hi Omkar, Can you please see if you can answer my question with this info, or if you need anything else from me? Also, does resource localization improve or impact performance in any way? Thanks, Kishore On Thu, Aug 1, 2013 at 11:20 PM, Omkar Joshi ojo...@hortonworks.com wrote: How are you making these measurements? Can you elaborate more? Is it on a best-case basis, or on an average, or worst case? How many resources are you sending for localization? Were the sizes and number of these resources consistent across tests? Were these resources public/private/application specific? Apart from this, is the other load on the node manager the same? Is the load on HDFS the same? Did you see any network bottleneck? More information will help a lot. Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com/ On Thu, Aug 1, 2013 at 2:19 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, Please share if anyone has an answer or clues to my question regarding the start-up performance. Also, one more thing I have observed today is that the time taken to run a command on a container went up by more than a second in this latest version. When using 2.0.4-alpha, it used to take 0.3 to 0.5 seconds from the point I call startContainer() to the point the command is started on the container, whereas when using 2.1.0-beta, it is taking around 1.5 seconds from the point it came to the callback onContainerStarted() to the point the command is seen running on the container. Thanks, Kishore On Thu, Jul 25, 2013 at 8:38 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I have been using the hadoop-2.1.0-beta release candidate and observed that it is slower in running my simple application that runs on 2 containers. I have tried to find out which parts of it really have this extra overhead (compared to hadoop-2.0.4-alpha), and here is what I found: 1) From the point my Client has submitted the Application Master to the RM, it is taking 2 seconds extra 2) From the point my container requests are set up by the Application Master till the containers are allocated, it is taking 2 seconds extra Is this overhead expected with the changes that went into the new version? Or is there a way to improve it by changing something in the configuration? Thanks, Kishore
Re: setLocalResources() on ContainerLaunchContext
Hi Omkar, I will try that. I might have got 2 of '/' wrongly while trying it in different ways to make it work. The file kishore/kk.ksh is accessible to the same user that is running the AM container. And my other question is to understand the exact benefits of using this resource localization. Can you please explain briefly or point me to some online documentation talking about it? Thanks, Kishore On Wed, Aug 7, 2013 at 11:49 PM, Omkar Joshi ojo...@hortonworks.com wrote: Good that your timestamp worked... Now for hdfs try this: hdfs://hdfs-host-name:hdfs-host-port/absolute-path and verify that your absolute path is correct. I hope it will work. bin/hadoop fs -ls absolute-path hdfs://isredeng:8020//kishore/kk.ksh... why // ?? You have the hdfs file at absolute location /kishore/kk.ksh? Is /kishore and /kishore/kk.ksh accessible to the user who is making the startContainer call, or the one running the AM container? Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com On Tue, Aug 6, 2013 at 10:43 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Harsh, Hitesh, Omkar, Thanks for the replies. I tried getting the last modified timestamp like this and it works. Is this the right thing to do? File file = new File("/home_/dsadm/kishore/kk.ksh"); shellRsrc.setTimestamp(file.lastModified()); And, when I tried using an HDFS file, qualifying it with both node name and port, it didn't work; I get a similar error as earlier. String shellScriptPath = "hdfs://isredeng:8020//kishore/kk.ksh"; 13/08/07 01:36:28 INFO ApplicationMaster: Got container status for containerID= container_1375853431091_0005_01_02, state=COMPLETE, exitStatus=-1000, diagnostics=File does not exist: hdfs://isredeng:8020/kishore/kk.ksh 13/08/07 01:36:28 INFO ApplicationMaster: Got failure status for a container : -1000 On Wed, Aug 7, 2013 at 7:45 AM, Harsh J ha...@cloudera.com wrote: Thanks Hitesh! P.s.
Port isn't a requirement (and with HA URIs, you shouldn't add a port), but isredeng has to be the authority component. On Wed, Aug 7, 2013 at 7:37 AM, Hitesh Shah hit...@apache.org wrote: @Krishna, your logs showed the file error for hdfs://isredeng/kishore/kk.ksh I am assuming you have tried dfs -ls /kishore/kk.ksh and confirmed that the file exists? Also, the qualified path seems to be missing the namenode port. I need to go back and check if a path without the port works by assuming the default namenode port. @Harsh, adding a helper function seems like a good idea. Let me file a jira to have the above added to one of the helper/client libraries. thanks -- Hitesh On Aug 6, 2013, at 6:47 PM, Harsh J wrote: It is kind of unnecessary to be asking developers to load in timestamps and lengths themselves. Why not provide a java.io.File, or perhaps a Path-accepting API, that gets them automatically on their behalf using the FileSystem API internally? P.s. An HDFS file gave him a FNF, while a local file gave him a proper TS/Len error. I'm guessing there's a bug here w.r.t. handling HDFS paths. On Wed, Aug 7, 2013 at 12:35 AM, Hitesh Shah hit...@apache.org wrote: Hi Krishna, YARN downloads a specified local resource on the container's node from the url specified. In all situations, the remote url needs to be a fully qualified path. To verify that the file at the remote url is still valid, YARN expects you to provide the length and last modified timestamp of that file. If you use an hdfs path such as hdfs://namenode:port/absolute-path-to-file, you will need to get the length and timestamp from HDFS. If you use file:///, the file should exist on all nodes, and all nodes should have the file with the same length and timestamp for localization to work. (For a single-node setup this works, but it is tougher to get right on a multi-node setup; deploying the file via an rpm should likely work.) -- Hitesh On Aug 6, 2013, at 11:11 AM, Omkar Joshi wrote: Hi, You need to match the timestamp.
Probably get the timestamp locally before adding it. This is explicitly done to ensure that the file is not updated after the user makes the call, to avoid possible errors. Thanks, Omkar Joshi Hortonworks Inc. On Tue, Aug 6, 2013 at 5:25 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: I tried the following and it works! String shellScriptPath = "file:///home_/dsadm/kishore/kk.ksh"; But now I am getting a timestamp error like below, because I passed 0 to setTimestamp(): 13/08/06 08:23:48 INFO ApplicationMaster: Got container status for containerID= container_1375784329048_0017_01_02, state=COMPLETE, exitStatus=-1000, diagnostics=Resource file:/home_/dsadm/kishore/kk.ksh changed on src filesystem (expected 0, was 136758058) On Tue, Aug 6, 2013 at 5:24 PM, Harsh J ha...@cloudera.com wrote: Can you try passing a fully qualified local path? That is, including the file:/ scheme
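[Editor's note] Kishore's File.lastModified() approach is the standard way to pick up the metadata YARN checks for file:/// resources: the resource's length and last-modified timestamp must match what the NodeManager sees at localization time, and passing 0 produces exactly the "changed on src filesystem" error seen in this thread. A minimal, self-contained illustration using only the JDK (the temp file stands in for kk.ksh; no YARN dependency):

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class LocalResourceMeta {

    // The two values YARN compares at localization time for a local resource:
    // its size in bytes and its last-modified timestamp in milliseconds.
    static long[] metadata(File f) {
        return new long[] { f.length(), f.lastModified() };
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("kk", ".ksh");
        f.deleteOnExit();
        try (FileWriter w = new FileWriter(f)) {
            w.write("echo hello\n");
        }
        long[] m = metadata(f);
        // In the AM, these would feed shellRsrc.setSize(m[0]) and
        // shellRsrc.setTimestamp(m[1]).
        System.out.println("length=" + m[0] + " timestamp=" + m[1]);
    }
}
```

For an hdfs:// resource, the equivalent metadata would come from HDFS itself via FileSystem.getFileStatus(path), using the status's length and modification time, rather than from a local java.io.File.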
Re: setLocalResources() on ContainerLaunchContext
Hi Harsh, The setResource() call on LocalResource is expecting an argument of type org.apache.hadoop.yarn.api.records.URL, which is converted from a string in the form of a URI. This happens in the following call of the Distributed Shell example: shellRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI(shellScriptPath))); So, if I give a local file I get a parsing error like below, at which point I changed it to an HDFS file, thinking that it should be given like that only. Could you please give an example of how else it could be used, using a local file as you are saying? 2013-08-06 06:23:12,942 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Failed to parse resource-request java.net.URISyntaxException: Expected scheme name at index 0: :///home_/dsadm/kishore/kk.ksh at java.net.URI$Parser.fail(URI.java:2820) at java.net.URI$Parser.failExpecting(URI.java:2826) at java.net.URI$Parser.parse(URI.java:3015) at java.net.URI.<init>(URI.java:747) at org.apache.hadoop.yarn.util.ConverterUtils.getPathFromYarnURL(ConverterUtils.java:77) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourceRequest.<init>(LocalResourceRequest.java:46) On Tue, Aug 6, 2013 at 3:36 PM, Harsh J ha...@cloudera.com wrote: To be honest, I've never tried loading a HDFS file onto the LocalResource this way. I usually just pass a local file and that works just fine. There may be something in the URI transformation possibly breaking a HDFS source, but try passing a local file - does that fail too? The Shell example uses a local file.
On Tue, Aug 6, 2013 at 10:54 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Harsh, Please see if this is useful, I got a stack trace after the error has occurred 2013-08-06 00:55:30,559 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: CWD set to /tmp/nm-local-dir/usercache/dsadm/appcache/application_1375716148174_0004 = file:/tmp/nm-local-dir/usercache/dsadm/appcache/application_1375716148174_0004 2013-08-06 00:55:31,017 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:dsadm (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: hdfs://isredeng/kishore/kk.ksh 2013-08-06 00:55:31,029 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: DEBUG: FAILED { hdfs://isredeng/kishore/kk.ksh, 0, FILE, null }, File does not exist: hdfs://isredeng/kishore/kk.ksh 2013-08-06 00:55:31,031 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://isredeng/kishore/kk.ksh transitioned from DOWNLOADING to FAILED 2013-08-06 00:55:31,034 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1375716148174_0004_01_02 transitioned from LOCALIZING to LOCALIZATION_FAILED 2013-08-06 00:55:31,035 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Container container_1375716148174_0004_01_02 sent RELEASE event on a resource request { hdfs://isredeng/kishore/kk.ksh, 0, FILE, null } not present in cache. 
2013-08-06 00:55:31,036 WARN org.apache.hadoop.ipc.Client: interrupted waiting to send rpc request to server java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1290) at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:229) at java.util.concurrent.FutureTask.get(FutureTask.java:94) at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:930) at org.apache.hadoop.ipc.Client.call(Client.java:1285) at org.apache.hadoop.ipc.Client.call(Client.java:1264) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at $Proxy22.heartbeat(Unknown Source) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:249) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:163) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:106) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:979) And here is my code snippet: ContainerLaunchContext ctx = Records.newRecord
Re: setLocalResources() on ContainerLaunchContext
I tried the following and it works! String shellScriptPath = "file:///home_/dsadm/kishore/kk.ksh"; But now I am getting a timestamp error like below, because I passed 0 to setTimestamp(): 13/08/06 08:23:48 INFO ApplicationMaster: Got container status for containerID= container_1375784329048_0017_01_02, state=COMPLETE, exitStatus=-1000, diagnostics=Resource file:/home_/dsadm/kishore/kk.ksh changed on src filesystem (expected 0, was 136758058) On Tue, Aug 6, 2013 at 5:24 PM, Harsh J ha...@cloudera.com wrote: Can you try passing a fully qualified local path? That is, including the file:/ scheme On Aug 6, 2013 4:05 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Harsh, The setResource() call on LocalResource is expecting an argument of type org.apache.hadoop.yarn.api.records.URL, which is converted from a string in the form of a URI. This happens in the following call of the Distributed Shell example: shellRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI(shellScriptPath))); So, if I give a local file I get a parsing error like below, at which point I changed it to an HDFS file, thinking that it should be given like that only. Could you please give an example of how else it could be used, using a local file as you are saying?
2013-08-06 06:23:12,942 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Failed to parse resource-request java.net.URISyntaxException: Expected scheme name at index 0: :///home_/dsadm/kishore/kk.ksh at java.net.URI$Parser.fail(URI.java:2820) at java.net.URI$Parser.failExpecting(URI.java:2826) at java.net.URI$Parser.parse(URI.java:3015) at java.net.URI.<init>(URI.java:747) at org.apache.hadoop.yarn.util.ConverterUtils.getPathFromYarnURL(ConverterUtils.java:77) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourceRequest.<init>(LocalResourceRequest.java:46) On Tue, Aug 6, 2013 at 3:36 PM, Harsh J ha...@cloudera.com wrote: To be honest, I've never tried loading a HDFS file onto the LocalResource this way. I usually just pass a local file and that works just fine. There may be something in the URI transformation possibly breaking a HDFS source, but try passing a local file - does that fail too? The Shell example uses a local file.
On Tue, Aug 6, 2013 at 10:54 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Harsh, Please see if this is useful, I got a stack trace after the error has occurred 2013-08-06 00:55:30,559 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: CWD set to /tmp/nm-local-dir/usercache/dsadm/appcache/application_1375716148174_0004 = file:/tmp/nm-local-dir/usercache/dsadm/appcache/application_1375716148174_0004 2013-08-06 00:55:31,017 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:dsadm (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: hdfs://isredeng/kishore/kk.ksh 2013-08-06 00:55:31,029 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: DEBUG: FAILED { hdfs://isredeng/kishore/kk.ksh, 0, FILE, null }, File does not exist: hdfs://isredeng/kishore/kk.ksh 2013-08-06 00:55:31,031 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://isredeng/kishore/kk.ksh transitioned from DOWNLOADING to FAILED 2013-08-06 00:55:31,034 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1375716148174_0004_01_02 transitioned from LOCALIZING to LOCALIZATION_FAILED 2013-08-06 00:55:31,035 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Container container_1375716148174_0004_01_02 sent RELEASE event on a resource request { hdfs://isredeng/kishore/kk.ksh, 0, FILE, null } not present in cache. 
2013-08-06 00:55:31,036 WARN org.apache.hadoop.ipc.Client: interrupted waiting to send rpc request to server java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1290) at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:229) at java.util.concurrent.FutureTask.get(FutureTask.java:94) at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:930) at org.apache.hadoop.ipc.Client.call(Client.java:1285) at org.apache.hadoop.ipc.Client.call(Client.java:1264) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at $Proxy22.heartbeat(Unknown Source) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62
Re: setLocalResources() on ContainerLaunchContext
Hi Harsh, Hitesh, Omkar, Thanks for the replies. I tried getting the last modified timestamp like this and it works. Is this the right thing to do? File file = new File("/home_/dsadm/kishore/kk.ksh"); shellRsrc.setTimestamp(file.lastModified()); And, when I tried using an HDFS file, qualifying it with both node name and port, it didn't work; I get a similar error as earlier. String shellScriptPath = "hdfs://isredeng:8020//kishore/kk.ksh"; 13/08/07 01:36:28 INFO ApplicationMaster: Got container status for containerID= container_1375853431091_0005_01_02, state=COMPLETE, exitStatus=-1000, diagnostics=File does not exist: hdfs://isredeng:8020/kishore/kk.ksh 13/08/07 01:36:28 INFO ApplicationMaster: Got failure status for a container : -1000 On Wed, Aug 7, 2013 at 7:45 AM, Harsh J ha...@cloudera.com wrote: Thanks Hitesh! P.s. Port isn't a requirement (and with HA URIs, you shouldn't add a port), but isredeng has to be the authority component. On Wed, Aug 7, 2013 at 7:37 AM, Hitesh Shah hit...@apache.org wrote: @Krishna, your logs showed the file error for hdfs://isredeng/kishore/kk.ksh I am assuming you have tried dfs -ls /kishore/kk.ksh and confirmed that the file exists? Also, the qualified path seems to be missing the namenode port. I need to go back and check if a path without the port works by assuming the default namenode port. @Harsh, adding a helper function seems like a good idea. Let me file a jira to have the above added to one of the helper/client libraries. thanks -- Hitesh On Aug 6, 2013, at 6:47 PM, Harsh J wrote: It is kind of unnecessary to be asking developers to load in timestamps and lengths themselves. Why not provide a java.io.File, or perhaps a Path-accepting API, that gets them automatically on their behalf using the FileSystem API internally? P.s. An HDFS file gave him a FNF, while a local file gave him a proper TS/Len error. I'm guessing there's a bug here w.r.t. handling HDFS paths.
On Wed, Aug 7, 2013 at 12:35 AM, Hitesh Shah hit...@apache.org wrote: Hi Krishna, YARN downloads a specified local resource on the container's node from the url specified. In all situations, the remote url needs to be a fully qualified path. To verify that the file at the remote url is still valid, YARN expects you to provide the length and last modified timestamp of that file. If you use an hdfs path such as hdfs://namenode:port/absolute-path-to-file, you will need to get the length and timestamp from HDFS. If you use file:///, the file should exist on all nodes, and all nodes should have the file with the same length and timestamp for localization to work. (For a single-node setup this works, but it is tougher to get right on a multi-node setup; deploying the file via an rpm should likely work.) -- Hitesh On Aug 6, 2013, at 11:11 AM, Omkar Joshi wrote: Hi, You need to match the timestamp. Probably get the timestamp locally before adding it. This is explicitly done to ensure that the file is not updated after the user makes the call, to avoid possible errors. Thanks, Omkar Joshi Hortonworks Inc. On Tue, Aug 6, 2013 at 5:25 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: I tried the following and it works! String shellScriptPath = "file:///home_/dsadm/kishore/kk.ksh"; But now I am getting a timestamp error like below, because I passed 0 to setTimestamp(): 13/08/06 08:23:48 INFO ApplicationMaster: Got container status for containerID= container_1375784329048_0017_01_02, state=COMPLETE, exitStatus=-1000, diagnostics=Resource file:/home_/dsadm/kishore/kk.ksh changed on src filesystem (expected 0, was 136758058) On Tue, Aug 6, 2013 at 5:24 PM, Harsh J ha...@cloudera.com wrote: Can you try passing a fully qualified local path?
That is, including the file:/ scheme On Aug 6, 2013 4:05 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Harsh, The setResource() call on LocalResource is expecting an argument of type org.apache.hadoop.yarn.api.records.URL, which is converted from a string in the form of a URI. This happens in the following call of the Distributed Shell example: shellRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI(shellScriptPath))); So, if I give a local file I get a parsing error like below, which is why I changed it to an HDFS file, thinking that it should be given like that. Could you please give an example of how else it could be used, using a local file as you are saying? 2013-08-06 06:23:12,942 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Failed to parse resource-request java.net.URISyntaxException: Expected scheme name at index 0: :///home_/dsadm/kishore/kk.ksh at java.net.URI$Parser.fail(URI.java:2820) at java.net.URI$Parser.failExpecting(URI.java:2826) at java.net.URI$Parser.parse(URI.java:3015
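For the file:/// case discussed above, the length and timestamp that YARN verifies can be read with plain java.io.File, as Kishore does. A self-contained sketch (the temp file here is a hypothetical stand-in for the real script path):

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class LocalResourceMeta {
    // Returns {length, lastModifiedMillis} for a local file --
    // the two values YARN compares during localization of file:/// resources.
    static long[] metaOf(String path) {
        File f = new File(path);
        return new long[] { f.length(), f.lastModified() };
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for a real script such as kk.ksh
        File script = File.createTempFile("kk", ".ksh");
        FileWriter w = new FileWriter(script);
        w.write("echo hello\n");
        w.close();

        long[] meta = metaOf(script.getAbsolutePath());
        // These are the values LocalResource.setSize()/setTimestamp() expect.
        System.out.println("len=" + meta[0] + " ts=" + meta[1]);
        script.delete();
    }
}
```

For hdfs:// paths the equivalent values come from the FileSystem API (getFileStatus) rather than java.io.File, as Hitesh notes above.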
setLocalResources() on ContainerLaunchContext
Hi, Can someone please tell me what is the use of calling setLocalResources() on ContainerLaunchContext? And, also an example of how to use this will help... I couldn't guess what is the String in the map that is passed to setLocalResources() like below: // Set the local resources Map<String, LocalResource> localResources = new HashMap<String, LocalResource>(); Thanks, Kishore
Re: setLocalResources() on ContainerLaunchContext
Hi Harsh, Thanks for the quick and detailed reply, it really helps. I am trying to use it and getting this error in the node manager's log: 2013-08-05 08:57:28,867 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:dsadm (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: hdfs://isredeng/kishore/kk.ksh This file exists on the machine named isredeng; I could do an ls for it as below: -bash-4.1$ hadoop fs -ls kishore/kk.ksh 13/08/05 09:01:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Found 1 items -rw-r--r-- 3 dsadm supergroup 1046 2013-08-05 08:48 kishore/kk.ksh Note: I am using a single node cluster. Thanks, Kishore On Mon, Aug 5, 2013 at 3:00 PM, Harsh J ha...@cloudera.com wrote: The string for each LocalResource in the map can be anything that serves as a common identifier name for your application. At execution time, the passed resource filename will be aliased to the name you've mapped it to, so that the application code need not track special names. The behavior is very similar to how you can, in MR, define a symlink name for a DistributedCache entry (e.g. foo.jar#bar.jar). For an example, check out the DistributedShell app sources. In [1], you can see we take a user-provided file path to a shell script. This can be named anything as it is user-supplied. In [2], we define this as a local resource [2.1] and embed it with a different name (the string you ask about) [2.2], as defined at [3] as an application-referenceable constant. Note that in [4], we add to the Container arguments the aliased name we mapped it to (i.e. [3]) and not the original filename we received from the user. The resource is placed on the container with this name instead, so that's what we choose to execute.
[1] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java#L390 [2] - [2.1] https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java#L764 and [2.2] https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java#L780 [3] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java#L205 [4] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java#L791 On Mon, Aug 5, 2013 at 2:44 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, Can someone please tell me what is the use of calling setLocalResources() on ContainerLaunchContext? And, also an example of how to use this will help... I couldn't guess what is the String in the map that is passed to setLocalResources() like below: // Set the local resources Map<String, LocalResource> localResources = new HashMap<String, LocalResource>(); Thanks, Kishore -- Harsh J
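Harsh's aliasing scheme and the length/timestamp requirement from the other thread can be combined into one helper. This is an unofficial sketch against the Hadoop 2.x client API (the class name, `conf`, and the alias string are illustrative; it needs the hadoop-common and hadoop-yarn jars on the classpath):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.apache.hadoop.yarn.util.Records;

public class ScriptResources {
    /**
     * Registers an HDFS file as a container LocalResource under an alias.
     * The container sees the file as "alias", whatever its HDFS name is.
     */
    static Map<String, LocalResource> forScript(Configuration conf,
            String hdfsPath, String alias) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        // Qualify the path so the URL carries scheme + authority (hdfs://host).
        Path script = fs.makeQualified(new Path(hdfsPath));
        // Fails fast with FileNotFoundException if the path is wrong.
        FileStatus status = fs.getFileStatus(script);

        LocalResource rsrc = Records.newRecord(LocalResource.class);
        rsrc.setType(LocalResourceType.FILE);
        rsrc.setVisibility(LocalResourceVisibility.APPLICATION);
        rsrc.setResource(ConverterUtils.getYarnUrlFromPath(script));
        rsrc.setTimestamp(status.getModificationTime()); // never hardcode 0
        rsrc.setSize(status.getLen());

        Map<String, LocalResource> resources = new HashMap<String, LocalResource>();
        resources.put(alias, rsrc);
        return resources;
    }
}
```

Passing the result to ctx.setLocalResources() would both qualify the path and avoid the "changed on src filesystem (expected 0, ...)" error seen later in this thread.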
Re: setLocalResources() on ContainerLaunchContext
Hi Harsh, Please see if this is useful, I got a stack trace after the error has occurred 2013-08-06 00:55:30,559 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: CWD set to /tmp/nm-local-dir/usercache/dsadm/appcache/application_1375716148174_0004 = file:/tmp/nm-local-dir/usercache/dsadm/appcache/application_1375716148174_0004 2013-08-06 00:55:31,017 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:dsadm (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: hdfs://isredeng/kishore/kk.ksh 2013-08-06 00:55:31,029 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: DEBUG: FAILED { hdfs://isredeng/kishore/kk.ksh, 0, FILE, null }, File does not exist: hdfs://isredeng/kishore/kk.ksh 2013-08-06 00:55:31,031 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://isredeng/kishore/kk.ksh transitioned from DOWNLOADING to FAILED 2013-08-06 00:55:31,034 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1375716148174_0004_01_02 transitioned from LOCALIZING to LOCALIZATION_FAILED 2013-08-06 00:55:31,035 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Container container_1375716148174_0004_01_02 sent RELEASE event on a resource request { hdfs://isredeng/kishore/kk.ksh, 0, FILE, null } not present in cache. 
2013-08-06 00:55:31,036 WARN org.apache.hadoop.ipc.Client: interrupted waiting to send rpc request to server java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1290) at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:229) at java.util.concurrent.FutureTask.get(FutureTask.java:94) at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:930) at org.apache.hadoop.ipc.Client.call(Client.java:1285) at org.apache.hadoop.ipc.Client.call(Client.java:1264) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at $Proxy22.heartbeat(Unknown Source) at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:249) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:163) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:106) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:979) And here is my code snippet: ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class); ctx.setEnvironment(oshEnv); // Set the local resources Map<String, LocalResource> localResources = new HashMap<String, LocalResource>(); LocalResource shellRsrc = Records.newRecord(LocalResource.class); shellRsrc.setType(LocalResourceType.FILE); shellRsrc.setVisibility(LocalResourceVisibility.APPLICATION); String shellScriptPath = "hdfs://isredeng//kishore/kk.ksh"; try { shellRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI(shellScriptPath))); } catch (URISyntaxException e) {
LOG.error("Error when trying to use shell script path specified" + " in env, path=" + shellScriptPath); e.printStackTrace(); } shellRsrc.setTimestamp(0/*shellScriptPathTimestamp*/); shellRsrc.setSize(0/*shellScriptPathLen*/); String ExecShellStringPath = "ExecShellScript.sh"; localResources.put(ExecShellStringPath, shellRsrc); ctx.setLocalResources(localResources); Please let me know if you need anything else. Thanks, Kishore On Tue, Aug 6, 2013 at 12:05 AM, Harsh J ha...@cloudera.com wrote: The detail is insufficient to answer why. You should also have gotten a trace after it, can you post that? If possible, also the relevant snippets of code. On Mon, Aug 5, 2013 at 6:36 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Harsh, Thanks for the quick and detailed reply, it really helps. I am trying to use it and getting this error in the node manager's log: 2013-08-05 08:57:28,867 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:dsadm (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: hdfs://isredeng/kishore/kk.ksh This file exists on the machine named isredeng; I could do an ls for it as below: -bash-4.1$ hadoop fs -ls
Re: Extra start-up overhead with hadoop-2.1.0-beta
Hi Omkar, I have got these numbers by running a simple C program on the containers that fetches the timestamp in microseconds and exits. The times mentioned are the low and high; they are not varying drastically within a version, but there are huge differences (like a second) between the two versions, 2.0.4-alpha and 2.1.0-beta, as I mentioned. I am using a single node cluster, and there is absolutely no other load on the machine/node. My single node cluster is just used for my own development work and testing. I am not aware of what resource localization is; I am not doing anything special for that. Please let me know if you need any other info. Thanks, Kishore On Thu, Aug 1, 2013 at 11:20 PM, Omkar Joshi ojo...@hortonworks.com wrote: How are you making these measurements - can you elaborate more? Is it on a best case basis or on an average or worst case? How many resources are you sending for localization? Were the sizes and number of these resources consistent across tests? Were these resources public/private/application specific? Apart from this, is the other load on the node manager the same? Is the load on HDFS the same? Did you see any network bottleneck? More information will help a lot. Thanks, Omkar Joshi *Hortonworks Inc.* http://www.hortonworks.com On Thu, Aug 1, 2013 at 2:19 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, Please share with me if anyone has an answer or clues to my question regarding the start-up performance. Also, one more thing I have observed today is that the time taken to run a command on a container went up by more than a second in this latest version. When using 2.0.4-alpha, it used to take 0.3 to 0.5 seconds from the point I call startContainer() to the point the command is started on the container, whereas when using 2.1.0-beta, it is taking around 1.5 seconds from the point it came to the callback onContainerStarted() to the point the command is seen running on the container.
Thanks, Kishore On Thu, Jul 25, 2013 at 8:38 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I have been using the hadoop-2.1.0-beta release candidate and observed that it is slower in running my simple application that runs on 2 containers. I have tried to find out which parts of it really have this extra overhead (compared to hadoop-2.0.4-alpha), and here is what I found: 1) From the point my Client has submitted the Application Master to the RM, it is taking 2 seconds extra. 2) From the point my container requests are set up by the Application Master till the containers are allocated, it is taking 2 seconds extra. Is this overhead expected with the changes that went into the new version? Or is there a way to improve it by changing something in the configuration? Thanks, Kishore
Re: Extra start-up overhead with hadoop-2.1.0-beta
Hi, Please share with me if anyone has an answer or clues to my question regarding the start-up performance. Also, one more thing I have observed today is that the time taken to run a command on a container went up by more than a second in this latest version. When using 2.0.4-alpha, it used to take 0.3 to 0.5 seconds from the point I call startContainer() to the point the command is started on the container, whereas when using 2.1.0-beta, it is taking around 1.5 seconds from the point it came to the callback onContainerStarted() to the point the command is seen running on the container. Thanks, Kishore On Thu, Jul 25, 2013 at 8:38 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I have been using the hadoop-2.1.0-beta release candidate and observed that it is slower in running my simple application that runs on 2 containers. I have tried to find out which parts of it really have this extra overhead (compared to hadoop-2.0.4-alpha), and here is what I found: 1) From the point my Client has submitted the Application Master to the RM, it is taking 2 seconds extra. 2) From the point my container requests are set up by the Application Master till the containers are allocated, it is taking 2 seconds extra. Is this overhead expected with the changes that went into the new version? Or is there a way to improve it by changing something in the configuration? Thanks, Kishore
Re: Node manager crashing when running an app requiring 100 containers on hadoop-2.1.0-beta RC0
Hi Arun, I was running on a single node cluster, so all my 100+ containers are on a single node. And the problem went away when I increased YARN_HEAP_SIZE to 2GB. Thanks, Kishore On Thu, Aug 1, 2013 at 5:01 AM, Arun C Murthy a...@hortonworks.com wrote: How many containers are you running per node? On Jul 25, 2013, at 5:21 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Devaraj, I used to run this application with the same number of containers successfully on the previous version, i.e. hadoop-2.0.4-alpha. Is it failing with the new version because YARN itself is also adding more threads than the previous versions? Thanks, Kishore On Thu, Jul 25, 2013 at 4:24 PM, Devaraj k devara...@huawei.com wrote: Hi Kishore, It seems that the system doesn't have enough resources to launch a new thread. Could you check whether the system can afford to launch the configured containers, and try increasing the native memory available in the system by reducing the number of running processes. Thanks, Devaraj k From: Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com] Sent: 25 July 2013 16:09 To: user@hadoop.apache.org Subject: Node manager crashing when running an app requiring 100 containers on hadoop-2.1.0-beta RC0 Hi, I am running an application against hadoop-2.1.0-beta RC, and my app requires 117 containers. I have got all the containers allocated, but while starting them, at around the 99th container the node manager went down with the following kind of error in its log. Also, I could reproduce this error running a "sleep 200; date" command using the Distributed Shell example, in which case I got this error at around the 66th container. 2013-07-25 06:07:17,743 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process reaper,5,main] threw an Error. Shutting down now...
java.lang.OutOfMemoryError: Failed to create a thread: retVal -1073741830, errno 11 at java.lang.Thread.startImpl(Native Method) at java.lang.Thread.start(Thread.java:887) at java.lang.ProcessInputStream.<init>(UNIXProcess.java:472) at java.lang.UNIXProcess$1$1$1.run(UNIXProcess.java:157) at java.security.AccessController.doPrivileged(AccessController.java:202) at java.lang.UNIXProcess$1$1.run(UNIXProcess.java:137) 2013-07-25 06:07:17,745 INFO org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException Thanks, Kishore -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/
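The fix Kishore reports (raising the daemon heap to ~2GB) maps to the YARN daemon environment script. A hedged sketch, assuming a Hadoop 2.x layout; the variable in stock yarn-env.sh is YARN_HEAPSIZE (in MB), and the exact value needed is environment-dependent:

```shell
# Sketch for etc/hadoop/yarn-env.sh (Hadoop 2.x).
# YARN_HEAPSIZE sets the max heap (MB) for YARN daemons such as the
# NodeManager; 2000 mirrors the ~2GB that resolved the crash in this thread.
export YARN_HEAPSIZE=2000
```

Note the OutOfMemoryError above ("Failed to create a thread", errno 11) points at native thread exhaustion, so per-process limits (ulimit -u) may also need checking alongside the heap.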
Node manager crashing when running an app requiring 100 containers on hadoop-2.1.0-beta RC0
Hi, I am running an application against hadoop-2.1.0-beta RC, and my app requires 117 containers. I have got all the containers allocated, but while starting them, at around the 99th container the node manager went down with the following kind of error in its log. Also, I could reproduce this error running a "sleep 200; date" command using the Distributed Shell example, in which case I got this error at around the 66th container. 2013-07-25 06:07:17,743 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process reaper,5,main] threw an Error. Shutting down now... java.lang.OutOfMemoryError: Failed to create a thread: retVal -1073741830, errno 11 at java.lang.Thread.startImpl(Native Method) at java.lang.Thread.start(Thread.java:887) at java.lang.ProcessInputStream.<init>(UNIXProcess.java:472) at java.lang.UNIXProcess$1$1$1.run(UNIXProcess.java:157) at java.security.AccessController.doPrivileged(AccessController.java:202) at java.lang.UNIXProcess$1$1.run(UNIXProcess.java:137) 2013-07-25 06:07:17,745 INFO org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException Thanks, Kishore
Re: Node manager crashing when running an app requiring 100 containers on hadoop-2.1.0-beta RC0
Hi Devaraj, I used to run this application with the same number of containers successfully on the previous version, i.e. hadoop-2.0.4-alpha. Is it failing with the new version because YARN itself is also adding more threads than the previous versions? Thanks, Kishore On Thu, Jul 25, 2013 at 4:24 PM, Devaraj k devara...@huawei.com wrote: Hi Kishore, It seems that the system doesn't have enough resources to launch a new thread. Could you check whether the system can afford to launch the configured containers, and try increasing the native memory available in the system by reducing the number of running processes. Thanks, Devaraj k From: Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com] Sent: 25 July 2013 16:09 To: user@hadoop.apache.org Subject: Node manager crashing when running an app requiring 100 containers on hadoop-2.1.0-beta RC0 Hi, I am running an application against hadoop-2.1.0-beta RC, and my app requires 117 containers. I have got all the containers allocated, but while starting them, at around the 99th container the node manager went down with the following kind of error in its log. Also, I could reproduce this error running a "sleep 200; date" command using the Distributed Shell example, in which case I got this error at around the 66th container. 2013-07-25 06:07:17,743 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process reaper,5,main] threw an Error. Shutting down now...
java.lang.OutOfMemoryError: Failed to create a thread: retVal -1073741830, errno 11 at java.lang.Thread.startImpl(Native Method) at java.lang.Thread.start(Thread.java:887) at java.lang.ProcessInputStream.<init>(UNIXProcess.java:472) at java.lang.UNIXProcess$1$1$1.run(UNIXProcess.java:157) at java.security.AccessController.doPrivileged(AccessController.java:202) at java.lang.UNIXProcess$1$1.run(UNIXProcess.java:137) 2013-07-25 06:07:17,745 INFO org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException Thanks, Kishore
Extra start-up overhead with hadoop-2.1.0-beta
Hi, I have been using the hadoop-2.1.0-beta release candidate and observed that it is slower in running my simple application that runs on 2 containers. I have tried to find out which parts of it really have this extra overhead (compared to hadoop-2.0.4-alpha), and here is what I found: 1) From the point my Client has submitted the Application Master to the RM, it is taking 2 seconds extra. 2) From the point my container requests are set up by the Application Master till the containers are allocated, it is taking 2 seconds extra. Is this overhead expected with the changes that went into the new version? Or is there a way to improve it by changing something in the configuration? Thanks, Kishore
Re: Namenode automatically going to safemode with 2.1.0-beta
Hi Harsh, I have made my dfs.namenode.name.dir point to a subdirectory of my home, and I don't see this issue again. So, is this a bug that we need to log in JIRA? Thanks, Kishore On Tue, Jul 16, 2013 at 6:39 AM, Harsh J ha...@cloudera.com wrote: 2013-07-12 11:04:26,002 WARN org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space available on volume 'null' is 0, which is below the configured reserved amount 104857600 This is interesting. It's calling your volume null, which may be more of a superficial bug. What is your dfs.namenode.name.dir set to? From /tmp/hadoop-dsadm/dfs/name I'd expect you haven't set it up and /tmp is being used off of the out-of-box defaults. Could you try to set it to a specific directory that's not on /tmp? On Mon, Jul 15, 2013 at 2:43 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: I don't have it in my hdfs-site.xml, in which case probably the default value is taken.. On Mon, Jul 15, 2013 at 2:29 PM, Azuryy Yu azury...@gmail.com wrote: please check dfs.datanode.du.reserved in the hdfs-site.xml On Jul 15, 2013 4:30 PM, Aditya exalter adityaexal...@gmail.com wrote: Hi Krishna, Can you please send screenshots of the namenode web UI. Thanks Aditya. On Mon, Jul 15, 2013 at 1:54 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: I have had enough space on the disk that is used, around 30 Gigs Thanks, Kishore On Mon, Jul 15, 2013 at 1:30 PM, Venkatarami Netla venkatarami.ne...@cloudwick.com wrote: Hi, please see the available space for the NN storage directory. Thanks Regards Venkat On Mon, Jul 15, 2013 at 12:14 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am doing no activity on my single node cluster which is using 2.1.0-beta, and still observed that it has gone into safe mode by itself after a while. I was looking at the name node log and see many of these kinds of entries.. Can anything be interpreted from these?
2013-07-12 09:06:11,256 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 561 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 561 2013-07-12 09:07:11,291 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 14 2013-07-12 09:07:11,292 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 15 2013-07-12 09:07:11,293 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_561 - /tmp/hadoop-dsadm/dfs/name/current/edits_561-562 2013-07-12 09:07:11,294 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 563 2013-07-12 09:08:11,397 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:08:11,398 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:08:11,398 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 563 2013-07-12 09:08:11,399 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 11 2013-07-12 09:08:11,400 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 12 2013-07-12 09:08:11,402 INFO 
org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_563 - /tmp/hadoop-dsadm/dfs/name/current/edits_563-564 2013-07-12 09:08:11,402 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 565 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 565 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0
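For completeness, the NameNode's safe-mode state can be inspected and cleared from the CLI. These are standard hdfs dfsadmin subcommands, sketched here against a running cluster (they are not a fix: the NameNodeResourceChecker will re-enter safe mode unless the disk-space condition is resolved first):

```shell
# Check whether the NameNode is currently in safe mode
hdfs dfsadmin -safemode get

# Once the underlying disk-space issue is fixed, leave safe mode manually
hdfs dfsadmin -safemode leave
```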
Re: Namenode automatically going to safemode with 2.1.0-beta
Yes Harsh, I haven't set dfs.namenode.name.dir anywhere in the config files. My name node has again gone into safe mode today while it was idle. I shall try setting this value to something other than /tmp On Tue, Jul 16, 2013 at 6:39 AM, Harsh J ha...@cloudera.com wrote: 2013-07-12 11:04:26,002 WARN org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space available on volume 'null' is 0, which is below the configured reserved amount 104857600 This is interesting. It's calling your volume null, which may be more of a superficial bug. What is your dfs.namenode.name.dir set to? From /tmp/hadoop-dsadm/dfs/name I'd expect you haven't set it up and /tmp is being used off of the out-of-box defaults. Could you try to set it to a specific directory that's not on /tmp? On Mon, Jul 15, 2013 at 2:43 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: I don't have it in my hdfs-site.xml, in which case probably the default value is taken.. On Mon, Jul 15, 2013 at 2:29 PM, Azuryy Yu azury...@gmail.com wrote: please check dfs.datanode.du.reserved in the hdfs-site.xml On Jul 15, 2013 4:30 PM, Aditya exalter adityaexal...@gmail.com wrote: Hi Krishna, Can you please send screenshots of the namenode web UI. Thanks Aditya. On Mon, Jul 15, 2013 at 1:54 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: I have had enough space on the disk that is used, around 30 Gigs Thanks, Kishore On Mon, Jul 15, 2013 at 1:30 PM, Venkatarami Netla venkatarami.ne...@cloudwick.com wrote: Hi, please see the available space for the NN storage directory. Thanks Regards Venkat On Mon, Jul 15, 2013 at 12:14 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am doing no activity on my single node cluster which is using 2.1.0-beta, and still observed that it has gone into safe mode by itself after a while. I was looking at the name node log and see many of these kinds of entries.. Can anything be interpreted from these?
2013-07-12 09:06:11,256 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 561 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 561 2013-07-12 09:07:11,291 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 14 2013-07-12 09:07:11,292 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 15 2013-07-12 09:07:11,293 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_561 - /tmp/hadoop-dsadm/dfs/name/current/edits_561-562 2013-07-12 09:07:11,294 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 563 2013-07-12 09:08:11,397 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:08:11,398 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:08:11,398 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 563 2013-07-12 09:08:11,399 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 11 2013-07-12 09:08:11,400 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 12 2013-07-12 09:08:11,402 INFO 
org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_563 - /tmp/hadoop-dsadm/dfs/name/current/edits_563-564 2013-07-12 09:08:11,402 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 565 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 565 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time
Namenode automatically going to safemode with 2.1.0-beta
Hi, I am doing no activity on my single-node cluster, which is running 2.1.0-beta, and still observed that it went into safe mode by itself after a while. I was looking at the namenode log and see many entries of this kind. Can anything be interpreted from these?

2013-07-12 09:06:11,256 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 561
2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114
2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 561
2013-07-12 09:07:11,291 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 14
2013-07-12 09:07:11,292 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 15
2013-07-12 09:07:11,293 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_561 - /tmp/hadoop-dsadm/dfs/name/current/edits_561-562
2013-07-12 09:07:11,294 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 563
2013-07-12 09:08:11,397 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114
2013-07-12 09:08:11,398 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
2013-07-12 09:08:11,398 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 563
2013-07-12 09:08:11,399 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 11
2013-07-12 09:08:11,400 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 12
2013-07-12 09:08:11,402 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_563 - /tmp/hadoop-dsadm/dfs/name/current/edits_563-564
2013-07-12 09:08:11,402 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 565
2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114
2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 565
2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 13
2013-07-12 09:09:11,441 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 13

And after some time it said:

2013-07-12 11:03:19,799 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 795
2013-07-12 11:04:19,826 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114
2013-07-12 11:04:19,826 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
2013-07-12 11:04:19,827 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 795
2013-07-12 11:04:19,827 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 12
2013-07-12 11:04:19,827 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 12
2013-07-12 11:04:19,829 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_795 - /tmp/hadoop-dsadm/dfs/name/current/edits_795-796
2013-07-12 11:04:19,829 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 797
2013-07-12 11:04:26,002 WARN org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space available on volume 'null' is 0, which is below the configured reserved amount 104857600
2013-07-12 11:04:26,003 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on available disk space. Entering safe mode. 2013-07-12 11:04:26,004
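The two WARN lines point at the NameNode resource checker: the name directory is under /tmp (the volume is reported as 'null'), and free space fell below the reserved threshold of 104857600 bytes (100 MB). A hedged sketch of the relevant hdfs-site.xml settings follows — the directory path is illustrative; dfs.namenode.resource.du.reserved is the property the checker compares free space against:

```xml
<!-- Sketch only: keep NameNode metadata off /tmp (where it can be cleaned
     up or fill quickly) and make the resource-checker threshold explicit.
     The directory path is illustrative. -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/var/hadoop/dfs/name</value>
</property>
<property>
  <name>dfs.namenode.resource.du.reserved</name>
  <!-- Bytes of free space the NameNode requires on its storage volumes
       before it enters safe mode; 104857600 (100 MB) is the default. -->
  <value>104857600</value>
</property>
```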
Re: Namenode automatically going to safemode with 2.1.0-beta
I have enough space on the disk that is being used, around 30 GB. Thanks, Kishore

On Mon, Jul 15, 2013 at 1:30 PM, Venkatarami Netla venkatarami.ne...@cloudwick.com wrote: Hi, please see the available space for the NN storage directory. Thanks & Regards, Venkat

On Mon, Jul 15, 2013 at 12:14 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am doing no activity on my single-node cluster, which is running 2.1.0-beta, and still observed that it went into safe mode by itself after a while. I was looking at the namenode log and see many entries of this kind. Can anything be interpreted from these? [quoted log snipped]
Re: Namenode automatically going to safemode with 2.1.0-beta
Hi, I have restarted my cluster after removing the data directory and formatting the namenode. So, is this screenshot still useful for you, or do you want it only after I reproduce the issue? Thanks, Kishore

On Mon, Jul 15, 2013 at 1:59 PM, Venkatarami Netla venkatarami.ne...@cloudwick.com wrote: Hi, please send the namenode web UI page. What is your total hard disk size?

On Mon, Jul 15, 2013 at 1:54 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: I have enough space on the disk that is being used, around 30 GB. Thanks, Kishore

[earlier quoted messages and log snipped]
Re: Namenode automatically going to safemode with 2.1.0-beta
I don't have it in my hdfs-site.xml, in which case the default value is probably taken.

On Mon, Jul 15, 2013 at 2:29 PM, Azuryy Yu azury...@gmail.com wrote: Please check dfs.datanode.du.reserved in hdfs-site.xml. On Jul 15, 2013 4:30 PM, Aditya exalter adityaexal...@gmail.com wrote: Hi Krishna, can you please send screenshots of the namenode web UI? Thanks, Aditya.

[earlier quoted messages and log snipped]
Re: whitelist feature of YARN
Hi Sandy, Yes, I have been using the AMRMClient APIs. I am planning to shift to whichever API the whitelist feature is supported through, but I am not sure what is meant by submitting ResourceRequests directly to the RM. Can you please elaborate on this, or give me a pointer to some example code showing how to do it? Thanks for the reply, -Kishore

On Mon, Jul 8, 2013 at 10:53 PM, Sandy Ryza sandy.r...@cloudera.com wrote: Hi Krishna, From your previous email, it looks like you are using the AMRMClient APIs. Whitelisting is not yet supported through them. I am working on this in YARN-521, which should be included in the next release after 2.1.0-beta. If you are submitting ResourceRequests directly to the RM, you can whitelist a node by:
* setting the relaxLocality flag on the node-level ResourceRequest to true
* setting the relaxLocality flag on the corresponding rack-level ResourceRequest to false
* setting the relaxLocality flag on the corresponding any-level ResourceRequest to false
-Sandy

On Mon, Jul 8, 2013 at 6:48 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, Can someone please point me to some example code for using the whitelist feature of YARN? I have recently got RC1 of hadoop-2.1.0-beta and want to use this feature. It would be great if you could point me to a description of what the whitelisting feature is; I have gone through some JIRA logs related to it, but a more concrete explanation would be helpful. Thanks, Kishore
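The three relaxLocality settings Sandy lists could be sketched roughly like this (a non-authoritative sketch against the 2.1.0-beta records API; the host name "node1", the rack name, and the resource sizes are illustrative assumptions):

```java
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class WhitelistRequests {
  // Build the node/rack/any ResourceRequest triple that whitelists one node.
  public static List<ResourceRequest> forNode(String host, String rack) {
    Priority pri = Priority.newInstance(0);
    Resource cap = Resource.newInstance(1024, 1); // 1 GB, 1 vcore (illustrative)

    // Node-level request: relaxLocality = true
    ResourceRequest nodeReq = ResourceRequest.newInstance(pri, host, cap, 1);
    nodeReq.setRelaxLocality(true);

    // Rack-level request: relaxLocality = false blocks falling back
    // to other nodes in the same rack
    ResourceRequest rackReq = ResourceRequest.newInstance(pri, rack, cap, 1);
    rackReq.setRelaxLocality(false);

    // ANY-level request: relaxLocality = false blocks falling back
    // to arbitrary nodes in the cluster
    ResourceRequest anyReq =
        ResourceRequest.newInstance(pri, ResourceRequest.ANY, cap, 1);
    anyReq.setRelaxLocality(false);

    return Arrays.asList(nodeReq, rackReq, anyReq);
  }
}
```

All three requests would then go into the allocate call to the RM together; with only the node-level flag left true, the scheduler has nowhere else to place the container.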
Re: Requesting containers on a specific host
I could resolve this error with a simple hostname change to a fully qualified one including the domain name. But while running some of my old example code against this release, I am seeing at least the following changes:
1) ClientRMProtocol and AMRMProtocol are removed.
2) ContainerManager is removed.
3) YarnRemoteException is removed.
4) ContainerRequest is removed.
It looks like it is now compulsory to modify my application accordingly, whereas on earlier versions I could have run the same application without any modifications. Thanks, Kishore

On Fri, Jul 5, 2013 at 7:37 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Devaraj, Thanks for pointing me to this RC. I am trying it out, and getting this error when the NM starts. My RM is running fine, but the NM is failing, saying that it is disallowed by the RM and received a SHUTDOWN message. Please give me a clue to resolve this.

2013-07-05 09:49:20,043 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Unexpected error starting NodeStatusUpdater
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager from isredeng.swg.usma.ibm.com, Sending SHUTDOWN signal to the NodeManager.
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:290)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:156)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:213)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:401)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:447)

Thanks, Kishore

On Fri, Jul 5, 2013 at 8:52 AM, Devaraj k devara...@huawei.com wrote: Hi Kishore, the hadoop-2.1.0-beta release is in the voting process now. You can try the hadoop-2.1.0-beta RC from http://people.apache.org/~acmurthy/hadoop-2.1.0-beta-rc0/ or check the same with a trunk build. Thanks, Devaraj k

[earlier quoted messages snipped]
whitelist feature of YARN
Hi, Can someone please point me to some example code for using the whitelist feature of YARN? I have recently got RC1 of hadoop-2.1.0-beta and want to use this feature. It would be great if you could point me to a description of what the whitelisting feature is; I have gone through some JIRA logs related to it, but a more concrete explanation would be helpful. Thanks, Kishore
Re: Requesting containers on a specific host
Hi Devaraj, Thanks for pointing me to this RC. I am trying it out, and getting this error when the NM starts. My RM is running fine, but the NM is failing, saying that it is disallowed by the RM and received a SHUTDOWN message. Please give me a clue to resolve this.

2013-07-05 09:49:20,043 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Unexpected error starting NodeStatusUpdater
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager from isredeng.swg.usma.ibm.com, Sending SHUTDOWN signal to the NodeManager.
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:290)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:156)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:213)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:401)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:447)

Thanks, Kishore

On Fri, Jul 5, 2013 at 8:52 AM, Devaraj k devara...@huawei.com wrote: Hi Kishore, the hadoop-2.1.0-beta release is in the voting process now. You can try the hadoop-2.1.0-beta RC from http://people.apache.org/~acmurthy/hadoop-2.1.0-beta-rc0/ or check the same with a trunk build. Thanks, Devaraj k

[earlier quoted messages snipped]
Re: Requesting containers on a specific host
I could get containers on specific nodes using addContainerRequest() on AMRMClient, but there are issues with it. I have two nodes, node1 and node2, in my cluster. And my ApplicationMaster is trying to get 3 containers on node1 and 3 containers on node2, in that order. While requesting on node1, it sometimes gives me containers on node2, and vice versa. When I get a container on a different node than the one I need, I release it and make a fresh request. I am having to do that repeatedly to get a container on the node I need. Though the node I am requesting has enough resources, why does it keep giving me containers on the other node? How can I make sure I get a container on the node I want? Note: I am using the default scheduler, i.e. the Capacity Scheduler. Thanks, Kishore

On Fri, Jun 21, 2013 at 7:25 PM, Arun C Murthy a...@hortonworks.com wrote: Check if the hostname you are setting is the same in the RM logs… On Jun 21, 2013, at 2:15 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am trying to get a container on a specific host using the setHostName() call on ResourceRequest, but could not get anything allocated, while it works fine when I change the node name to *. I am working on a single-node cluster, and I am giving the name of the single node I have in my cluster. Is there any specific format that I need to use for setHostName()? Why is it not working? Thanks, Kishore -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/
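For reference, a node-specific request through AMRMClient looks roughly like this (a sketch against the 2.1.0-beta client API; the host name and resource sizes are illustrative). As discussed in the whitelist thread, without relaxLocality support in AMRMClient the host is only a preference, which matches the behavior described above:

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class NodeLocalRequest {
  // Build a request that prefers the given host; racks are left null.
  public static ContainerRequest onNode(String host) {
    Resource capability = Resource.newInstance(1024, 1); // 1 GB, 1 vcore (illustrative)
    Priority priority = Priority.newInstance(0);
    return new ContainerRequest(capability, new String[] { host }, null, priority);
  }

  public static void submit(AMRMClient<ContainerRequest> client, String host) {
    // The scheduler treats the host as a locality preference, not a
    // guarantee, until whitelisting lands in AMRMClient (YARN-521).
    client.addContainerRequest(onNode(host));
  }
}
```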
Re: Requesting containers on a specific host
Thanks Arun, it seems to be available with 2.1.0-beta; when will that be released? Or, if I want it now, could I get it from trunk? -Kishore

On Thu, Jul 4, 2013 at 5:58 PM, Arun C Murthy a...@hortonworks.com wrote: To guarantee containers on a specific node you need to use the whitelist feature we added recently: https://issues.apache.org/jira/browse/YARN-398 Arun

[earlier quoted messages snipped]
Re: Requesting containers on a specific host
Hi Arun, I just found that setHostName() doesn't work on 2.0.0-alpha but works on 2.0.4-alpha; I didn't check the intermediate versions. I verified this by starting the daemons of the respective versions, modifying ApplicationMaster.java in the distributed shell example, and running a date command with it. Is this something already known, or have I been doing something wrong? Thanks, Kishore

On Fri, Jun 21, 2013 at 7:25 PM, Arun C Murthy a...@hortonworks.com wrote: Check if the hostname you are setting is the same in the RM logs…

[earlier quoted messages snipped]
Re: Requesting containers on a specific host
Yes Arun, the one I am giving is the same as the one I see in the RM's log as well. Thanks, Kishore On Fri, Jun 21, 2013 at 7:25 PM, Arun C Murthy a...@hortonworks.com wrote: Check if the hostname you are setting is the same in the RM logs… On Jun 21, 2013, at 2:15 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am trying to get a container on a specific host using the setHostName() call on ResourceRequest, but could not get anything allocated, ever, whereas it works fine when I change the node name to *. I am working on a single-node cluster, and I am giving the name of the single node I have in my cluster. Is there any specific format that I need to use for setHostName()? Why is it not working? Thanks, Kishore -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/
Re: Container allocation on the same node
Hi Harsh, What will happen when I specify the local host as the required host? Won't the resource manager give me all the containers on the local host? I don't want to constrain myself to the local host, which might be busy while other nodes in the cluster have enough resources available for me. Thanks, Kishore On Wed, Jun 12, 2013 at 6:45 PM, Harsh J ha...@cloudera.com wrote: You can request containers with the local host name as the required host, and perhaps reject and re-request if they aren't designated to be on that one until you have sufficient. This may take a while though. On Wed, Jun 12, 2013 at 6:25 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I want to get some containers for my application on the same node; is there a way to make such a request? For example, I have an application which needs 10 containers, but with the constraint that a set of those containers needs to be running on the same node. Can I ask my resource manager to give me, let us say, 5 containers on the same node? I know that there is now a way to specify the node name on which I need a container, but I don't care which node in the cluster they get allocated on; I just need them on the same node. Please let me know if this is possible, and how I can do it. Thanks, Kishore -- Harsh J
Application Master getting started very late
Hi, I have been using YARN for quite some time, and recently moved to release 2.0.4. I recently started running a huge number of Application Masters (applications) one after another, and observed that sometimes in that sequence an Application Master takes around a minute, or a little more, to get started from the point the client launches it. Any idea why it (the AM) takes so long? It happens quite randomly for me, not always at the same point. Thanks, Kishore
Re: What else can be built on top of YARN.
Hi Rahul, It is at least for the reasons that Vinod listed that it is easier for me to port my application onto YARN instead of making it work in the MapReduce framework. The main purpose of my using YARN is to exploit its resource management capabilities. Thanks, Kishore On Wed, May 29, 2013 at 11:00 PM, Rahul Bhattacharjee rahul.rec@gmail.com wrote: Thanks for the response Krishna. I was wondering if it were possible to use MR to solve your problem instead of building the whole stack on top of YARN. Most likely it's not possible, and that's why you are building it. I wanted to know why that is. I am just trying to find out the need, or why we might need to write the application on YARN. Rahul On Wed, May 29, 2013 at 8:23 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Rahul, I am porting a distributed application that runs on a fixed set of given resources to YARN, with the aim of being able to run it on dynamically selected resources, whichever are available at the time of running the application. Thanks, Kishore On Wed, May 29, 2013 at 8:04 PM, Rahul Bhattacharjee rahul.rec@gmail.com wrote: Hi all, I was going through the motivation behind YARN. Splitting the responsibility of the JT is the major concern. Ultimately the base (YARN) was built in a generic way for building other generic distributed applications too. I am not able to think of any other parallel-processing use case that would be useful to build on top of YARN. I thought of a lot of use cases that would be beneficial when run in parallel, but again, we can do those using map-only jobs in MR. Can someone tell me a scenario where an application can utilize YARN features, or can be built on top of YARN, and at the same time cannot be done efficiently using MRv2 jobs? thanks, Rahul
Re: What else can be built on top of YARN.
Hi Rahul, I am porting a distributed application that runs on a fixed set of given resources to YARN, with the aim of being able to run it on dynamically selected resources, whichever are available at the time of running the application. Thanks, Kishore On Wed, May 29, 2013 at 8:04 PM, Rahul Bhattacharjee rahul.rec@gmail.com wrote: Hi all, I was going through the motivation behind YARN. Splitting the responsibility of the JT is the major concern. Ultimately the base (YARN) was built in a generic way for building other generic distributed applications too. I am not able to think of any other parallel-processing use case that would be useful to build on top of YARN. I thought of a lot of use cases that would be beneficial when run in parallel, but again, we can do those using map-only jobs in MR. Can someone tell me a scenario where an application can utilize YARN features, or can be built on top of YARN, and at the same time cannot be done efficiently using MRv2 jobs? thanks, Rahul
Out of memory error by Node Manager, and shut down
Hi, I got the following error in the node manager's log, and it shut down, after about 1 application was run after it was started. Any clue why this occurs... or is this a bug? 2013-05-22 11:53:34,456 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process reaper,5,main] threw an Error. Shutting down now... java.lang.OutOfMemoryError: Failed to create a thread: retVal -1073741830, errno 11 at java.lang.Thread.startImpl(Native Method) at java.lang.Thread.start(Thread.java:887) at java.lang.ProcessInputStream.init(UNIXProcess.java:472) at java.lang.UNIXProcess$1$1$1.run(UNIXProcess.java:157) at java.security.AccessController.doPrivileged(AccessController.java:202) at java.lang.UNIXProcess$1$1.run(UNIXProcess.java:137) Thanks, Kishore
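A note on the error above: errno 11 on Linux is EAGAIN, so "Failed to create a thread" usually means the JVM hit the operating system's per-user process/thread cap rather than exhausting the Java heap. One quick check is what limit the node manager's JVM actually sees; a small self-contained sketch using plain JDK and the standard Linux procfs (nothing Hadoop-specific; Linux-only, and the interpretation of errno 11 as a ulimit problem is a common diagnosis, not something confirmed in this thread):

```java
// Print the "max processes" (nproc) limit visible to this JVM.
// On Linux, pthread_create failing with EAGAIN (errno 11) typically
// means this limit -- shared by all threads of the user -- was reached.
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ThreadLimitCheck {
    public static void main(String[] args) throws Exception {
        List<String> limits =
            Files.readAllLines(Paths.get("/proc/self/limits"));
        for (String line : limits) {
            if (line.startsWith("Max processes")) {
                // e.g. "Max processes   4096   4096   processes"
                System.out.println(line);
            }
        }
    }
}
```

If the soft limit printed is low (a few thousand), raising it for the user running the node manager (e.g. via limits.conf) is a common remedy.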
Namenode going to safe mode on YARN
Hi, I have been running applications on my YARN cluster for around 20 days, about 5000 applications a day. I got the following error today. Please let me know how I can avoid this; is it happening because of a bug? org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create file/1066/AppMaster.jar. Name node is in safe mode. The reported blocks 4775 needs additional 880 blocks to reach the threshold 0.9990 of total blocks 5660. Safe mode will be turned off automatically. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1786) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1737) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1719) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:429) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:271) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40732) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735) Thanks, Kishore
Re: Namenode going to safe mode on YARN
Hi Nitin, Ted, Thanks for the replies. I don't know what my replication factor is; I don't seem to have set anything in my configuration files. I run on a single-node cluster. My data node has not gone down and come back, and I also didn't delete any of the HDFS blocks. I know that the name node enters safe mode when HDFS is restarted, and leaves it soon after. Is it safe to execute the command to leave safe mode? I mean, can something go wrong if we do it ourselves, given that it wouldn't have collected all the needed data and could not leave safe mode by itself? And does the error I gave above give some clue as to what I could do better? Thanks, Kishore On Mon, May 6, 2013 at 2:56 PM, Ted Xu t...@gopivotal.com wrote: Hi Kishore, It should not be a bug. After restarting HDFS, the namenode will stay in safe mode until all needed data is collected. During safe mode, all update operations will fail. In some cases, as Nitin mentioned, the namenode will never leave safe mode because it can't get enough data. In that case you may need to force the namenode to leave safe mode. For more information, see http://hadoop.apache.org/docs/r2.0.4-alpha/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Safemode . On Mon, May 6, 2013 at 5:00 PM, Nitin Pawar nitinpawar...@gmail.com wrote: What is your replication factor on hdfs? Did any of your datanodes go down recently and not come back into rotation? Did you delete any hdfs blocks directly from datanodes? On May 6, 2013 2:28 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I have been running applications on my YARN cluster for around 20 days, about 5000 applications a day. I got the following error today. Please let me know how I can avoid this; is it happening because of a bug? org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create file/1066/AppMaster.jar. Name node is in safe mode. 
The reported blocks 4775 needs additional 880 blocks to reach the threshold 0.9990 of total blocks 5660. Safe mode will be turned off automatically. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1786) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1737) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1719) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:429) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:271) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40732) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735) Thanks, Kishore -- Regards, Ted Xu
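For anyone following the "force it out of safe mode" suggestion: the operational route is the dfsadmin CLI (`hdfs dfsadmin -safemode leave`), but it can also be done programmatically. A hedged sketch against the 2.x HDFS client API (assumes a 2.x hadoop-hdfs client on the classpath and a reachable NameNode, so it cannot run standalone; and, as the thread cautions, only force an exit when the missing blocks are known to be gone for good):

```java
// Hedged sketch: query safe mode and, if stuck, force the NameNode out.
// Requires a 2.x Hadoop client on the classpath and a live NameNode
// reachable via the configs in core-site.xml / hdfs-site.xml.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

public class SafeModeCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // reads *-site.xml
        DistributedFileSystem dfs =
            (DistributedFileSystem) FileSystem.get(conf);
        // SAFEMODE_GET only queries: returns true while in safe mode.
        boolean inSafeMode = dfs.setSafeMode(SafeModeAction.SAFEMODE_GET);
        if (inSafeMode) {
            // Forcing an exit is safe for reads, but files whose blocks are
            // genuinely missing will still error when accessed.
            dfs.setSafeMode(SafeModeAction.SAFEMODE_LEAVE);
        }
    }
}
```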
Re: POLL: Using YARN or pre-YARN?
I have been using YARN, i.e. hadoop-2.0.0-alpha through hadoop-2.0.4-alpha; I don't know what you meant by pre-YARN. Thanks, Kishore On Wed, Apr 24, 2013 at 10:41 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, Quick poll, would be great to know how many people are using YARN vs. pre-YARN: http://www.linkedin.com/groups/YARN-preYARN-which-version-Hadoop-988957.S.234719475 Thanks, Otis -- Hadoop Performance Monitoring - http://sematext.com/spm/index.html
Re: Differences hadoop-2.0.0-alpha Vs hadoop-2.0.3-alpha
Hi Arun, Thanks for the reply. I have just compiled and run my ApplicationMaster.java and Client.java against hadoop-2.0.3-alpha and am getting this exception. The same runs fine on 2.0.0. Please suggest what the issue could be... 2013-03-28 00:50:47,576 FATAL [main] Client (Client.java:main(151)) - Error running CLient java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:128) at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getQueueInfo(ClientRMProtocolPBClientImpl.java:215) at Client.dumpClusterInfo(Client.java:263) at Client.launchAndMonitorAM(Client.java:471) at Client.main(Client.java:149) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) at java.lang.reflect.Method.invoke(Method.java:611) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:779) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueInfo(CapacityScheduler.java:542) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:420) at org.apache.hadoop.yarn.api.impl.pb.service.ClientRMProtocolPBServiceImpl.getQueueInfo(ClientRMProtocolPBServiceImpl.java:191) at org.apache.hadoop.yarn.proto.ClientRMProtocol$ClientRMProtocolService$2.callBlockingMethod(ClientRMProtocol.java:214) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1735) at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1731) at java.security.AccessController.doPrivileged(AccessController.java:284) at javax.security.auth.Subject.doAs(Subject.java:573) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1441) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1729) at org.apache.hadoop.ipc.Client.call(Client.java:1235) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at $Proxy4.getQueueInfo(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getQueueInfo(ClientRMProtocolPBClientImpl.java:212) ... 8 more Thanks, Kishore On Thu, Mar 28, 2013 at 2:39 AM, Arun C Murthy a...@hortonworks.com wrote: YarnClient etc. is just a bunch of helper libs to make it easier to write new applications. OTOH, your existing application should continue to work. hth, Arun On Mar 26, 2013, at 3:21 AM, Krishna Kishore Bonagiri wrote: Hi, I have a YARN application written and running properly against hadoop-2.0.0-alpha, but when I recently downloaded and started using hadoop-2.0.3-alpha it doesn't work. I think the original one I wrote was based on the Client.java and ApplicationMaster.java in the DistributedShell example. It looks like this example code has also changed in the new version; it now has the Client extending YarnClientImpl, among many other changes. Is there any guide on how I should modify my old application to work with the new version? Thanks, Kishore -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/