
Using libhdfspp

2019-05-20 Thread Krishna Kishore Bonagiri
Hi,


   I have downloaded libhdfspp from the following link, and compiled it.


https://github.com/apache/hadoop


I found that some functions like hdfsWrite and hdfsHSync are not defined in
this library. Also, when I try to replace the old libhdfs.so with this new
library, I see exceptions and hangs. Is anyone using libhdfspp? Please let
me know.


Thanks,

Kishore


Using compression support available in native libraries

2018-07-30 Thread Krishna Kishore Bonagiri
Hi,

  I would like to use the compression support available in the native
libraries. I have tried to search on Google, but couldn't figure out how to
use the LZ4 compression algorithm for compressing the data I am writing to
HDFS files from my C++ code.

Is it possible to have the data compressed automatically, without having to
compress it in our code? That is, can we select some compression option
while opening or creating an HDFS file from our code, just write to it
using hdfsWrite(), and have it take care underneath of compressing the data
before writing it to the disk/file system?
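
As far as I know, HDFS does not compress data transparently on write;
compression is applied by the client through a codec, and I am not aware of
an equivalent option in the libhdfs C API. A minimal Java sketch of the
codec approach, assuming the native LZ4 library is loaded (the path is
illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Lz4Codec;
import org.apache.hadoop.util.ReflectionUtils;

public class Lz4Write {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // The codec wraps the plain HDFS output stream; HDFS itself stores
    // whatever bytes the client hands over, compressed or not.
    CompressionCodec codec = ReflectionUtils.newInstance(Lz4Codec.class, conf);
    CompressionOutputStream out =
        codec.createOutputStream(fs.create(new Path("/tmp/data.lz4")));
    out.write("some data".getBytes("UTF-8"));
    out.close();
    fs.close();
  }
}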


Thanks,
Kishore


Socket Timeout Exception while multiple concurrent applications are reading HDFS data through WebHDFS interface

2015-12-09 Thread Krishna Kishore Bonagiri
Hi,
  We are seeing this SocketTimeoutException while a number of concurrent
applications (around 50 of them) are trying to read HDFS data through the
WebHDFS interface. Are there any parameters we can tune so that it doesn't
happen?

An exception occurred: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.read(SocketInputStream.java:163)
at java.net.SocketInputStream.read(SocketInputStream.java:133)
at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
at org.apache.http.impl.conn.LoggingSessionInputBuffer.readLine(LoggingSessionInputBuffer.java:115)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:715)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:520)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at com.ibm.iis.cc.filesystem.impl.webhdfs.WebHDFS.appendFromBuffer(WebHDFS.java:306)
at com.ibm.iis.cc.filesystem.impl.webhdfs.WebHDFS.writeFromStream(WebHDFS.java:198)
at com.ibm.iis.cc.filesystem.AbstractFileSystem.writeFromStream(AbstractFileSystem.java:45)
at com.ibm.iis.cc.filesystem.FileSystem$Uploader.call(FileSystem.java:3393)
at com.ibm.iis.cc.filesystem.FileSystem$Uploader.call(FileSystem.java:3358)
at java.util.concurrent.FutureTask.run(FutureTask.java:273)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1176)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.lang.Thread.run(Thread.java:853)


We have tried increasing the values of these parameters, but there is no
change.

1) dfs.datanode.handler.count
2) dfs.client.socket-timeout (the new parameter to define the socket
timeout)
3) dfs.socket.timeout (the deprecated parameter)
4) dfs.datanode.socket.write.timeout
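
One thing worth noting from the trace: the read timeout fires inside Apache
HttpClient in the IBM client layer, not inside an HDFS server thread, so
the client-side HTTP socket timeout may be the knob that matters. A hedged
sketch of where that timeout lives, assuming the 4.2-era HttpClient API the
trace suggests (whether the IBM connector exposes this setting is an
assumption):

import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.params.HttpConnectionParams;

// Raise the HTTP socket read timeout and connection timeout (milliseconds).
DefaultHttpClient client = new DefaultHttpClient();
HttpConnectionParams.setSoTimeout(client.getParams(), 120000);
HttpConnectionParams.setConnectionTimeout(client.getParams(), 30000);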

Thanks,
Kishore


Re: ETL/DW to Hadoop migrations

2015-09-07 Thread Krishna Kishore Bonagiri
Abhishek,

   Are you looking to load your data into Hadoop? If yes, IBM DataStage
has a stage called BDFS that loads/writes your data into Hadoop.

Thanks,
Kishore

On Tue, Sep 8, 2015 at 1:29 AM, <23singhabhis...@gmail.com> wrote:

> Hi guys,
>
> I am looking for pointers on migrating an existing data warehouse to
> Hadoop. Currently, we are using IBM DataStage as an ETL tool and loading
> into Teradata staging/maintenance tables. Please suggest an architecture
> that reduces cost without much degradation in performance. Have any of you
> been part of such a migration before? If yes, please provide some inputs,
> especially on what aspects we should take care of. As for the source data,
> it is mainly in the form of flat files and databases.
>
> Thanks in advance.
>
> Regards,
>
> Abhishek Singh
>


Re: Question about Block size configuration

2015-05-11 Thread Krishna Kishore Bonagiri
The default HDFS block size of 64 MB is the maximum size of a block of
data written to HDFS. So, if you write 4 MB files, each will still occupy
only one block of 4 MB, not more than that. If your file is larger than 64
MB, it gets split into multiple blocks.

If you set the HDFS block size to 2 MB, then your 4 MB file will get split
into two blocks.
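
For what it's worth, the block size can also be set per file at create
time instead of cluster-wide. A minimal Java sketch using the FileSystem
API (the path, replication factor, and sizes are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateWithBlockSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // create(path, overwrite, bufferSize, replication, blockSize):
    // a 2 MB block size for this file only.
    FSDataOutputStream out = fs.create(
        new Path("/images/photo.jpg"), true, 4096, (short) 3, 2L * 1024 * 1024);
    out.close();
    fs.close();
  }
}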

On Tue, May 12, 2015 at 8:38 AM, Himawan Mahardianto mahardia...@ugm.ac.id
wrote:

 Hi guys, I have a couple of questions about HDFS block size:

 What is going to happen if I set my HDFS block size from the default 64 MB
 to 2 MB per block?

 I am decreasing the block size because I want to store image files (jpeg,
 png, etc.) of about 4 MB each; what is your opinion or suggestion?

 What will happen if I don't change the default block size and then store a
 4 MB image file: will Hadoop use a full 64 MB block, or will it create a 4
 MB block instead of a 64 MB one?

 How much RAM is used to store each block if my block size is 64 MB, or if
 it is 4 MB?

 Does anyone have experience with this? Any suggestions are welcome.
 Thank you



Apache Slider stop function not working

2015-03-13 Thread Krishna Kishore Bonagiri
Hi,

  I am not aware of any Slider-specific group, so I am posting it here.

  We are using Apache Slider 0.60 and have implemented the management
operations start, status, stop, etc. in a Python script. Everything else is
working, but the stop function is not getting invoked when the container is
stopped. Is this a known issue already, or is there some trick to make it
work?


Thanks,
Kishore


Re: 100% CPU consumption by Resource Manager process

2014-08-18 Thread Krishna Kishore Bonagiri
Thanks Wangda, I think I reduced this value when I was trying to reduce
the container allocation time.

-Kishore
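
For reference, that interval is set in yarn-site.xml; a minimal snippet
with the value Wangda suggests below:

<property>
  <name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
  <value>1000</value>
</property>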


On Tue, Aug 19, 2014 at 7:39 AM, Wangda Tan wheele...@gmail.com wrote:

 Hi Krishna,

 4) What's the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms in
 your configuration?
 50

 I think this config is problematic; too small a heartbeat interval will
 cause the NMs to contact the RM too often. I would suggest setting this
 value larger, e.g. 1000.

 Thanks,
 Wangda



 On Wed, Aug 13, 2014 at 4:42 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi Wangda,
  Thanks for the reply; here are the details, please see if you could
 suggest anything.

 1) Number of nodes and running app in the cluster
 2 nodes, and I am running my own application that keeps asking for
 containers,
 a) running something on the containers,
 b) releasing the containers,
 c) asking for more containers with an incremented priority value, and
 repeating the same process

 2) What's the version of your Hadoop?
 apache hadoop-2.4.0

 3) Have you set
 yarn.scheduler.capacity.schedule-asynchronously.enable=true?
 No

 4) What's the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms
 in your configuration?
 50




 On Tue, Aug 12, 2014 at 12:44 PM, Wangda Tan wheele...@gmail.com wrote:

 Hi Krishna,
 To get more understanding about the problem, could you please share
 following information:
 1) Number of nodes and running app in the cluster
 2) What's the version of your Hadoop?
 3) Have you set
 yarn.scheduler.capacity.schedule-asynchronously.enable=true?
 4) What's the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms
 in your configuration?

 Thanks,
 Wangda Tan



 On Sun, Aug 10, 2014 at 11:29 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,
  My YARN resource manager is consuming 100% CPU when I am running an
 application that runs for about 10 hours, requesting as many as 27000
 containers. The CPU consumption was very low at the start of my
 application, and it gradually went up to over 100%. Is this a known issue,
 or are we doing something wrong?

 Every dump of the Event Processor thread shows it running
 LeafQueue::assignContainers(), specifically the for loop below from
 LeafQueue.java, and it seems to be looping through some priority list.

 // Try to assign containers to applications in order
 for (FiCaSchedulerApp application : activeApplications) {
 ...
 // Schedule in priority order
 for (Priority priority : application.getPriorities()) {

 3XMTHREADINFO  ResourceManager Event Processor
 J9VMThread:0x01D08600, j9thread_t:0x7F032D2FAA00,
 java/lang/Thread:0x8341D9A0, state:CW, prio=5
 3XMJAVALTHREAD(java/lang/Thread getId:0x1E, isDaemon:false)
 3XMTHREADINFO1(native thread ID:0x4B64, native
 priority:0x5, native policy:UNKNOWN)
 3XMTHREADINFO2(native stack address range
 from:0x7F0313DF8000, to:0x7F0313E39000, size:0x41000)
 3XMCPUTIME   *CPU usage total: 42334.614623696 secs*
 3XMHEAPALLOC Heap bytes allocated since last GC cycle=20456
 (0x4FE8)
 3XMTHREADINFO3   Java callstack:
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:850(Compiled
 Code))
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x8360DFE0,
 entry count: 1)
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x833B9280,
 entry count: 1)
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
 Code))
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x83360A80,
 entry count: 2)
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
 Code))
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x83360A80,
 entry count: 1)
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
 Code))
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x834037C8,
 entry count: 1)
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
 Code))
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle

Re: 100% CPU consumption by Resource Manager process

2014-08-13 Thread Krishna Kishore Bonagiri
Hi Wangda,
  Thanks for the reply; here are the details, please see if you could
suggest anything.

1) Number of nodes and running app in the cluster
2 nodes, and I am running my own application that keeps asking for
containers,
a) running something on the containers,
b) releasing the containers,
c) ask for more containers with incremented priority value, and repeat the
same process

2) What's the version of your Hadoop?
apache hadoop-2.4.0

3) Have you set
yarn.scheduler.capacity.schedule-asynchronously.enable=true?
No

4) What's the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms in
your configuration?
50




On Tue, Aug 12, 2014 at 12:44 PM, Wangda Tan wheele...@gmail.com wrote:

 Hi Krishna,
 To get more understanding about the problem, could you please share
 following information:
 1) Number of nodes and running app in the cluster
 2) What's the version of your Hadoop?
 3) Have you set
 yarn.scheduler.capacity.schedule-asynchronously.enable=true?
 4) What's the yarn.resourcemanager.nodemanagers.heartbeat-interval-ms in
 your configuration?

 Thanks,
 Wangda Tan



 On Sun, Aug 10, 2014 at 11:29 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,
  My YARN resource manager is consuming 100% CPU when I am running an
 application that runs for about 10 hours, requesting as many as 27000
 containers. The CPU consumption was very low at the start of my
 application, and it gradually went up to over 100%. Is this a known issue,
 or are we doing something wrong?

 Every dump of the Event Processor thread shows it running
 LeafQueue::assignContainers(), specifically the for loop below from
 LeafQueue.java, and it seems to be looping through some priority list.

 // Try to assign containers to applications in order
 for (FiCaSchedulerApp application : activeApplications) {
 ...
 // Schedule in priority order
 for (Priority priority : application.getPriorities()) {

 3XMTHREADINFO  ResourceManager Event Processor
 J9VMThread:0x01D08600, j9thread_t:0x7F032D2FAA00,
 java/lang/Thread:0x8341D9A0, state:CW, prio=5
 3XMJAVALTHREAD(java/lang/Thread getId:0x1E, isDaemon:false)
 3XMTHREADINFO1(native thread ID:0x4B64, native priority:0x5,
 native policy:UNKNOWN)
 3XMTHREADINFO2(native stack address range
 from:0x7F0313DF8000, to:0x7F0313E39000, size:0x41000)
 3XMCPUTIME   *CPU usage total: 42334.614623696 secs*
 3XMHEAPALLOC Heap bytes allocated since last GC cycle=20456
 (0x4FE8)
 3XMTHREADINFO3   Java callstack:
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:850(Compiled
 Code))
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x8360DFE0,
 entry count: 1)
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x833B9280,
 entry count: 1)
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
 Code))
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x83360A80,
 entry count: 2)
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
 Code))
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x83360A80,
 entry count: 1)
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
 Code))
 5XESTACKTRACE   (entered lock:
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x834037C8,
 entry count: 1)
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
 Code))
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
 Code))
 4XESTACKTRACEat
 org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
 4XESTACKTRACEat java/lang/Thread.run(Thread.java:853)

 3XMTHREADINFO  ResourceManager Event Processor
 J9VMThread:0x01D08600, j9thread_t:0x7F032D2FAA00,
 java/lang/Thread:0x8341D9A0, state:CW, prio=5
 3XMJAVALTHREAD(java/lang/Thread getId:0x1E, isDaemon:false)
 3XMTHREADINFO1(native thread ID:0x4B64, native priority

Re: Negative value given by getVirtualCores() or getAvailableResources()

2014-08-13 Thread Krishna Kishore Bonagiri
Hi Wangda,

  I was actually wondering why it gives me a negative value for vcores
when I call getAvailableResources().

Thanks,
Kishore


On Tue, Aug 12, 2014 at 12:50 PM, Wangda Tan wheele...@gmail.com wrote:

 By default, vcore = 1 for each resource request. If you don't like this
 behavior, you can set yarn.scheduler.minimum-allocation-vcores=0

 Hope this helps,
 Wangda Tan



 On Thu, Aug 7, 2014 at 7:13 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,
   I am calling getAvailableResources() on AMRMClientAsync and getting a
  negative value for the number of virtual cores, as below. Is there
  something wrong?

 memory:16110, vCores:-2.

  I have set the vcores in my yarn-site.xml like this, and just ran an
  application that requires two containers other than the Application
  Master's container. In the ContainerRequest setup from my
  ApplicationMaster, I haven't set anything for virtual cores, meaning I
  didn't call setVirtualCores() at all.

  So, I think it shouldn't be showing a negative value for the vcores when
  I call getAvailableResources(); am I wrong?


  <property>
    <description>Number of CPU cores that can be allocated for containers.</description>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
  </property>

 Thanks,
 Kishore





100% CPU consumption by Resource Manager process

2014-08-10 Thread Krishna Kishore Bonagiri
Hi,
  My YARN resource manager is consuming 100% CPU when I am running an
application that runs for about 10 hours, requesting as many as 27000
containers. The CPU consumption was very low at the start of my
application, and it gradually went up to over 100%. Is this a known issue,
or are we doing something wrong?

Every dump of the Event Processor thread shows it running
LeafQueue::assignContainers(), specifically the for loop below from
LeafQueue.java, and it seems to be looping through some priority list.

// Try to assign containers to applications in order
for (FiCaSchedulerApp application : activeApplications) {
...
// Schedule in priority order
for (Priority priority : application.getPriorities()) {

3XMTHREADINFO  ResourceManager Event Processor
J9VMThread:0x01D08600, j9thread_t:0x7F032D2FAA00,
java/lang/Thread:0x8341D9A0, state:CW, prio=5
3XMJAVALTHREAD(java/lang/Thread getId:0x1E, isDaemon:false)
3XMTHREADINFO1(native thread ID:0x4B64, native priority:0x5,
native policy:UNKNOWN)
3XMTHREADINFO2(native stack address range
from:0x7F0313DF8000, to:0x7F0313E39000, size:0x41000)
3XMCPUTIME   *CPU usage total: 42334.614623696 secs*
3XMHEAPALLOC Heap bytes allocated since last GC cycle=20456
(0x4FE8)
3XMTHREADINFO3   Java callstack:
4XESTACKTRACEat
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:850(Compiled
Code))
5XESTACKTRACE   (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x8360DFE0,
entry count: 1)
5XESTACKTRACE   (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x833B9280,
entry count: 1)
4XESTACKTRACEat
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
Code))
5XESTACKTRACE   (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x83360A80,
entry count: 2)
4XESTACKTRACEat
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainers(ParentQueue.java:569(Compiled
Code))
5XESTACKTRACE   (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x83360A80,
entry count: 1)
4XESTACKTRACEat
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:831(Compiled
Code))
5XESTACKTRACE   (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler@0x834037C8,
entry count: 1)
4XESTACKTRACEat
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:878(Compiled
Code))
4XESTACKTRACEat
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.handle(CapacityScheduler.java:100(Compiled
Code))
4XESTACKTRACEat
org/apache/hadoop/yarn/server/resourcemanager/ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:591)
4XESTACKTRACEat java/lang/Thread.run(Thread.java:853)

3XMTHREADINFO  ResourceManager Event Processor
J9VMThread:0x01D08600, j9thread_t:0x7F032D2FAA00,
java/lang/Thread:0x8341D9A0, state:CW, prio=5
3XMJAVALTHREAD(java/lang/Thread getId:0x1E, isDaemon:false)
3XMTHREADINFO1(native thread ID:0x4B64, native priority:0x5,
native policy:UNKNOWN)
3XMTHREADINFO2(native stack address range
from:0x7F0313DF8000, to:0x7F0313E39000, size:0x41000)
3XMCPUTIME   CPU usage total: 42379.604203548 secs
3XMHEAPALLOC Heap bytes allocated since last GC cycle=57280
(0xDFC0)
3XMTHREADINFO3   Java callstack:
4XESTACKTRACEat
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.assignContainers(LeafQueue.java:841(Compiled
Code))
5XESTACKTRACE   (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp@0x8360DFE0,
entry count: 1)
5XESTACKTRACE   (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue@0x833B9280,
entry count: 1)
4XESTACKTRACEat
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.assignContainersToChildQueues(ParentQueue.java:655(Compiled
Code))
5XESTACKTRACE   (entered lock:
org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue@0x83360A80,
entry count: 2)
4XESTACKTRACEat

Negative value given by getVirtualCores() or getAvailableResources()

2014-08-07 Thread Krishna Kishore Bonagiri
Hi,
  I am calling getAvailableResources() on AMRMClientAsync and getting a
negative value for the number of virtual cores, as below. Is there
something wrong?

memory:16110, vCores:-2.

I have set the vcores in my yarn-site.xml like this, and just ran an
application that requires two containers other than the Application
Master's container. In the ContainerRequest setup from my
ApplicationMaster, I haven't set anything for virtual cores, meaning I
didn't call setVirtualCores() at all.

So, I think it shouldn't be showing a negative value for the vcores when I
call getAvailableResources(); am I wrong?


<property>
  <description>Number of CPU cores that can be allocated for containers.</description>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
</property>
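
A minimal sketch of setting the vcores explicitly on the request, so the
accounting matches what is asked; this assumes an initialized
AMRMClient/AMRMClientAsync in the AM (variable names are illustrative):

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.util.Records;

// Ask for 1 vcore explicitly instead of relying on the default accounting.
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(256);
capability.setVirtualCores(1);
Priority priority = Records.newRecord(Priority.class);
priority.setPriority(0);
ContainerRequest request = new ContainerRequest(capability, null, null, priority);
amrmClient.addContainerRequest(request);   // amrmClient: the AM's client handle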

Thanks,
Kishore


How to check what is the log directory for container logs

2014-07-31 Thread Krishna Kishore Bonagiri
Hi,

  Is there a way to check what is the log directory for container logs in
my currently running instance of YARN from the command line, I mean using
the yarn command or hadoop command or so?

Thanks,
Kishore


Re: priority in the container request

2014-06-10 Thread Krishna Kishore Bonagiri
Thanks Vinod for the quick answer. It seems to work when I am requesting
all containers with the same specification, but not when I have multiple
container requests with different host names specified. Is this expected
behavior?
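
As I understand it, requests at the same priority with the same capability
are aggregated by the scheduler, so node-specific asks are easier to tell
apart when each gets its own priority. A hedged sketch of that workaround
(the host names and the amrmClient handle are illustrative):

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.util.Records;

// One request per target host, each at its own priority, so allocations
// can be matched back to the host they were asked for.
int p = 0;
for (String host : new String[] { "node1", "node2" }) {
  Resource capability = Records.newRecord(Resource.class);
  capability.setMemory(256);
  Priority priority = Records.newRecord(Priority.class);
  priority.setPriority(p++);
  ContainerRequest request = new ContainerRequest(
      capability, new String[] { host }, null, priority, false); // strict locality
  amrmClient.addContainerRequest(request);
}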

Kishore


On Mon, Jun 9, 2014 at 10:51 PM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 Yes, priorities are assigned to ResourceRequests and you can ask multiple
 containers at the same priority level. You may not get all the containers
 together as today's scheduler lacks gang functionality.

 +Vinod

 On Jun 9, 2014, at 12:08 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

  Hi,
 
Can we give the same value for priority when requesting multiple
 containers from the Application Master? Basically, I need all of those
 containers at the same time, and I am requesting them at the same time. So,
 I am thinking if we can do that?
 
  Thanks,
  Kishore





priority in the container request

2014-06-09 Thread Krishna Kishore Bonagiri
Hi,

  Can we give the same value for priority when requesting multiple
containers from the Application Master? Basically, I need all of those
containers at the same time, and I am requesting them at the same time. So,
I am wondering if we can do that.

Thanks,
Kishore


Getting the name of the host on which Application Master is launched

2014-05-15 Thread Krishna Kishore Bonagiri
Hi,

  Is there a way to get the name of the host where the AM is launched? I
have seen that there is a method getHost() in the ApplicationReport that we
get in the YarnClient, but it is giving null. Is there a way to make it
work, or is there any other way to get the host name?

2014-05-09 04:36:05 YC00016 INFO PXYarnClient: Got application report from
ASM for , appId: 3 , appDiagnostics: null , appMasterHost:
null,clientToAMToken: null ,appQueue: default, appMasterRpcPort: 0
 appStartTime: 1,399,628,163,358 , yarnAppState: ACCEPTED
,DistributedFinalState: UNDEFINED , appTrackingUrl:
isredeng:8088/proxy/application_1399626434313_0003/, appUser: kbonagir
(monitorApplication)
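
One note on the log above: the report was taken while the application was
still in the ACCEPTED state, i.e. before the AM had registered, and as far
as I know the host field is only filled in after registration. A hedged
sketch that waits for the state to change before reading it (yarnClient and
appId are assumed to exist already):

import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;

// Poll until the AM has registered; getHost() is not set before that.
ApplicationReport report = yarnClient.getApplicationReport(appId);
while (report.getYarnApplicationState() == YarnApplicationState.ACCEPTED) {
  Thread.sleep(1000);
  report = yarnClient.getApplicationReport(appId);
}
String amHost = report.getHost();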


Thanks,
Kishore


Specifying a node/host name on which the Application Master should run

2014-05-05 Thread Krishna Kishore Bonagiri
Hi,

  Is there a way to specify a host name on which we want to run our
application master? Can we do this when it is being launched from the
YarnClient?

Thanks,
Kishore


Re: Cleanup activity on YARN containers

2014-04-08 Thread Krishna Kishore Bonagiri
Hi Rohith,

   Thanks for the reply.

  Mine is a YARN application. I have some files that are local to the
nodes where the containers run, and I want to clean them up at the end of
the container execution. So, I want to do this cleanup on the same node my
container ran on. With what you are suggesting, I can't delete the files
local to the container.

   Is there any other way?

Thanks,
Kishore


On Tue, Apr 8, 2014 at 8:55 AM, Rohith Sharma K S rohithsharm...@huawei.com
 wrote:

  Hi Kishore,



Are the jobs submitted through MapReduce, or is it a YARN application?



 1.  For the MapReduce framework, the framework itself provides a facility
 to clean up at the per-task level.

 Is there any callback kind of facility, in which I can write
 some code to be executed on my container at the end of my application or *at
 the end of that particular container execution?*

   You can override setup() and cleanup() for doing initialization and
 cleanup of your task. This facility is provided by the MapReduce framework.



 The call flow of task execution is

  The framework first calls
 setup(org.apache.hadoop.mapreduce.Mapper.Context), followed by map(Object,
 Object, Context) / reduce(Object, Iterable, Context)  for each key/value
 pair. Finally cleanup(Context) is called.



 Note: In cleanup, do not hold the container for more than
 mapreduce.task.timeout, because once map/reduce is completed, progress
 will not be sent to the ApplicationMaster (a ping is not considered a
 status update). If your cleanup takes more than the value configured for
 mapreduce.task.timeout, the ApplicationMaster considers the task timed
 out. In such a case, you need to increase mapreduce.task.timeout based on
 your cleanup time.





 2.  For a YARN application, the list of completed containers is sent to
 the ApplicationMaster in the heartbeat. There you can do cleanup
 activities for the containers.



 Hope this will help you. :-)





 Thanks & Regards

 Rohith Sharma K S



 *From:* Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com]
 *Sent:* 07 April 2014 16:41
 *To:* user@hadoop.apache.org
 *Subject:* Cleanup activity on YARN containers



 Hi,



   Is there any callback kind of facility, in which I can write some code
 to be executed on my container at the end of my application or at the end
 of that particular container execution?



  I want to do some cleanup activities at the end of my application, and
 the clean up is not related to the localized resources that are downloaded
 from HDFS.



 Thanks,

 Kishore



Reuse of YARN container

2014-04-08 Thread Krishna Kishore Bonagiri
Hi,

   Does this JIRA issue mean that we can't currently reuse a container for
running/launching two different processes one after another?

https://issues.apache.org/jira/browse/YARN-373


  If that is true, are there any plans for making that possible?

Thanks,
Kishore


Re: Cleanup activity on YARN containers

2014-04-08 Thread Krishna Kishore Bonagiri
Hi Rohith,

  Is there something like a shutdown hook for containers? Could you please
also tell me how to use it?

Thanks,
Kishore
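
A minimal sketch of the JVM shutdown-hook approach Rohith mentions below,
assuming the container process is Java; note the hook runs on normal exit
or SIGTERM, but not if the process is killed with SIGKILL (the directory
name is illustrative):

import java.io.File;

// Register cleanup that runs when the container JVM shuts down.
final File scratchDir = new File("scratch");   // hypothetical local files
Runtime.getRuntime().addShutdownHook(new Thread() {
  @Override
  public void run() {
    File[] files = scratchDir.listFiles();
    if (files != null) {
      for (File f : files) {
        f.delete();
      }
    }
  }
});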


On Wed, Apr 9, 2014 at 8:34 AM, Rohith Sharma K S rohithsharm...@huawei.com
 wrote:

  For local container cleanup, you can clean up in a ShutdownHook. !!??



 Thanks & Regards

 Rohith Sharma K S



 *From:* Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com]
 *Sent:* 08 April 2014 20:01
 *To:* user@hadoop.apache.org
 *Subject:* Re: Cleanup activity on YARN containers



 Hi Rohith,



Thanks for the reply.



   Mine is a YARN application. I have some files that are local to the
 nodes where the containers run, and I want to clean them up at the end of
 the container execution. So, I want to do this cleanup on the same node my
 container ran on. With what you are suggesting, I can't delete the files
 local to the container.



Is there any other way?



 Thanks,

 Kishore



 On Tue, Apr 8, 2014 at 8:55 AM, Rohith Sharma K S 
 rohithsharm...@huawei.com wrote:

 Hi Kishore,



   Are the jobs submitted through MapReduce, or is it a YARN application?



 1.  For the MapReduce framework, the framework itself provides a facility
 to clean up at the per-task level.

 Is there any callback kind of facility, in which I can write
 some code to be executed on my container at the end of my application or *at
 the end of that particular container execution?*

   You can override setup() and cleanup() for doing initialization and
 cleanup of your task. This facility is provided by the MapReduce framework.



 The call flow of task execution is

  The framework first calls
 setup(org.apache.hadoop.mapreduce.Mapper.Context), followed by map(Object,
 Object, Context) / reduce(Object, Iterable, Context)  for each key/value
 pair. Finally cleanup(Context) is called.



 Note: In cleanup, do not hold the container for more than
 mapreduce.task.timeout, because once map/reduce is completed, progress
 will not be sent to the ApplicationMaster (a ping is not considered a
 status update). If your cleanup takes more than the value configured for
 mapreduce.task.timeout, the ApplicationMaster considers the task timed
 out. In such a case, you need to increase mapreduce.task.timeout based on
 your cleanup time.





 2.  For a YARN application, the list of completed containers is sent to
 the ApplicationMaster in the heartbeat. There you can do cleanup
 activities for the containers.



 Hope this will help you. :-)





 Thanks & Regards

 Rohith Sharma K S



 *From:* Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com]
 *Sent:* 07 April 2014 16:41
 *To:* user@hadoop.apache.org
 *Subject:* Cleanup activity on YARN containers



 Hi,



   Is there any callback kind of facility, in which I can write some code
 to be executed on my container at the end of my application or at the end
 of that particular container execution?



  I want to do some cleanup activities at the end of my application, and
 the clean up is not related to the localized resources that are downloaded
 from HDFS.



 Thanks,

 Kishore





Cleanup activity on YARN containers

2014-04-07 Thread Krishna Kishore Bonagiri
Hi,

  Is there any callback kind of facility, in which I can write some code to
be executed on my container at the end of my application or at the end of
that particular container execution?

 I want to do some cleanup activities at the end of my application, and the
clean up is not related to the localized resources that are downloaded from
HDFS.

Thanks,
Kishore


Value for yarn.nodemanager.address in configuration file

2014-04-03 Thread Krishna Kishore Bonagiri
Hi,

  This is regarding a single-node cluster setup.

  If I have a value of 0.0.0.0:8050 for yarn.nodemanager.address in the
configuration file (yarn-site.xml/yarn-default.xml), is it a mandatory
requirement that "ssh 0.0.0.0" works on my machine for being able to start
YARN? Or will I be able to start the daemons without "ssh 0.0.0.0" working
as well?

Thanks,
Kishore


Re: Node manager or Resource Manager crash

2014-03-05 Thread Krishna Kishore Bonagiri
Vinod,

  One more observation I can share is that every time the NM or RM gets
killed, I see the following kind of messages in the NM's log:

2014-03-05 05:33:23,824 DEBUG
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's
health-status : true,
2014-03-05 05:33:23,824 DEBUG org.apache.hadoop.ipc.Client: IPC Client
(2132631259) connection to isredeng/9.70.137.184:8031 from kbonagir sending
#5391
2014-03-05 05:33:23,826 DEBUG org.apache.hadoop.ipc.Client: IPC Client
(2132631259) connection to isredeng/9.70.137.184:8031 from kbonagir got
value #5391
2014-03-05 05:33:23,826 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine:
Call: nodeHeartbeat took 2ms


Does that give any clue? Is something going wrong while it is getting the
node's health?

Thanks,
Kishore



On Tue, Mar 4, 2014 at 10:51 PM, Vinod Kumar Vavilapalli vino...@apache.org
 wrote:

 I remember you asking this question before. Check if your OS' OOM killer
 is killing it.

 +Vinod

 On Mar 4, 2014, at 6:53 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,
   I am running an application on a 2-node cluster; the application tries
 to acquire all the containers that are available on one of those nodes,
 and the remaining containers from the other node in the cluster. When I
 run this application continuously in a loop, the NM or the RM gets killed
 at a random point. There is no corresponding message in the log files.

 One of the times the NM got killed today, the tail of its log looked like
 this:

 2014-03-04 02:42:44,386 DEBUG
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
 isredeng:52867 sending out status for 16 containers
 2014-03-04 02:42:44,386 DEBUG
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's
 health-status : true,


 And at the time of NM's crash, the RM's log has the following entries:

 2014-03-04 02:42:40,371 DEBUG
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing
 isredeng:52867 of type STATUS_UPDATE
 2014-03-04 02:42:40,371 DEBUG
 org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
 NODE_UPDATE
 2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.ipc.Server: IPC Server
 Responder: responding to
 org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from
 9.70.137.184:33696 Call#14060 Retry#0 Wrote 40 bytes.
 2014-03-04 02:42:40,371 DEBUG
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 nodeUpdate: isredeng:52867 clusterResources:
 memory:16384, vCores:16
 2014-03-04 02:42:40,371 DEBUG
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Node being looked for scheduling isredeng:52867
 availableResource: memory:0, vCores:-8
 2014-03-04 02:42:40,393 DEBUG org.apache.hadoop.ipc.Server:  got #151


 Note: the name of the node on which the NM got killed is isredeng; do the
 above messages indicate anything about why it got killed?

 Thanks,
 Kishore







Node manager or Resource Manager crash

2014-03-04 Thread Krishna Kishore Bonagiri
Hi,
  I am running an application on a 2-node cluster; the application tries
to acquire all the containers that are available on one of those nodes,
and the remaining containers from the other node in the cluster. When I
run this application continuously in a loop, the NM or the RM gets killed
at a random point. There is no corresponding message in the log files.

One of the times the NM got killed today, the tail of its log looked like
this:

2014-03-04 02:42:44,386 DEBUG
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
isredeng:52867 sending out status for 16 containers
2014-03-04 02:42:44,386 DEBUG
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's
health-status : true,


And at the time of NM's crash, the RM's log has the following entries:

2014-03-04 02:42:40,371 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing
isredeng:52867 of type STATUS_UPDATE
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event
org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
NODE_UPDATE
2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.ipc.Server: IPC Server
Responder: responding to
org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from
9.70.137.184:33696 Call#14060 Retry#0 Wrote 40 bytes.
2014-03-04 02:42:40,371 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
nodeUpdate: isredeng:52867 clusterResources:
memory:16384, vCores:16
2014-03-04 02:42:40,371 DEBUG
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Node being looked for scheduling isredeng:52867
availableResource: memory:0, vCores:-8
2014-03-04 02:42:40,393 DEBUG org.apache.hadoop.ipc.Server:  got #151


Note: the name of the node on which the NM got killed is isredeng; do the
above messages indicate anything about why it got killed?

Thanks,
Kishore


Re: Node manager or Resource Manager crash

2014-03-04 Thread Krishna Kishore Bonagiri
Yes Vinod, I asked this question some time back, and I have come back to
resolving the issue again.

I tried to see if the OOM killer is killing them, but it is not. I checked
the free swap space on my box while my test was going on, and it doesn't
seem to be the issue. I also verified whether the OOM score goes high for
any of these processes, because that is when the OOM killer kills them, but
it does not go high either.

Thanks,
Kishore


On Tue, Mar 4, 2014 at 10:51 PM, Vinod Kumar Vavilapalli vino...@apache.org
 wrote:

 I remember you asking this question before. Check if your OS' OOM killer
 is killing it.

 +Vinod

 On Mar 4, 2014, at 6:53 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,
   I am running an application on a 2-node cluster; the application tries
 to acquire all the containers that are available on one of those nodes,
 and the remaining containers from the other node in the cluster. When I
 run this application continuously in a loop, the NM or the RM gets killed
 at a random point. There is no corresponding message in the log files.

 One of the times the NM got killed today, the tail of its log looked like
 this:

 2014-03-04 02:42:44,386 DEBUG
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
 isredeng:52867 sending out status for 16 containers
 2014-03-04 02:42:44,386 DEBUG
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node's
 health-status : true,


 And at the time of NM's crash, the RM's log has the following entries:

 2014-03-04 02:42:40,371 DEBUG
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing
 isredeng:52867 of type STATUS_UPDATE
 2014-03-04 02:42:40,371 DEBUG
 org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
 NODE_UPDATE
 2014-03-04 02:42:40,371 DEBUG org.apache.hadoop.ipc.Server: IPC Server
 Responder: responding to
 org.apache.hadoop.yarn.server.api.ResourceTrackerPB.nodeHeartbeat from
 9.70.137.184:33696 Call#14060 Retry#0 Wrote 40 bytes.
 2014-03-04 02:42:40,371 DEBUG
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 nodeUpdate: isredeng:52867 clusterResources:
 memory:16384, vCores:16
 2014-03-04 02:42:40,371 DEBUG
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Node being looked for scheduling isredeng:52867
 availableResource: memory:0, vCores:-8
 2014-03-04 02:42:40,393 DEBUG org.apache.hadoop.ipc.Server:  got #151


 Note: the name of the node on which the NM got killed is isredeng; do the
 above messages indicate anything about why it got killed?

 Thanks,
 Kishore







YARN -- Debug messages in logs

2014-02-28 Thread Krishna Kishore Bonagiri
Hi,

  How can I get the debug log messages from the RM and other daemons?

For example,

   Currently I could see messages from LOG.info() only, i.e. something like
this:

LOG.info(event.getContainerId() + " Container Transitioned from " +
oldState + " to " + getState());

How can I get those from LOG.debug() ? I mean the following kind of
messages ...

LOG.debug("Processing " + event.getContainerId() + " of type " +
event.getType());
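
A hedged pointer on how to do this: the daemons log through log4j, so
raising the level in etc/hadoop/log4j.properties (or exporting
HADOOP_ROOT_LOGGER / YARN_ROOT_LOGGER as DEBUG,console) before starting the
daemon should surface the LOG.debug() output; for example:

# log4j.properties: DEBUG for the YARN server classes only
log4j.logger.org.apache.hadoop.yarn=DEBUG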


Thanks,
Kishore


Re: Yarn - specify hosts in ContainerRequest

2014-02-16 Thread Krishna Kishore Bonagiri
Hi Anand,

  Which version of Hadoop are you using? It works from 2.2.0

Try like this, and it should work. I am using this feature on 2.2.0

   String[] hosts = new String[] { node_name };
   // racks = null; relaxLocality = false keeps the request on the given host
   ContainerRequest request = new ContainerRequest(capability, hosts,
       null, p, false);


Thanks,
Kishore


On Fri, Feb 14, 2014 at 11:43 PM, Anand Mundada anandmund...@ymail.com wrote:

 Hi All,

  How can I launch a container on a particular host?
  I tried specifying the host name in
  *new ContainerRequest()*

 Thanks,
 Anand



Re: Can we avoid restarting of AM when it fails?

2014-02-10 Thread Krishna Kishore Bonagiri
Thanks Harsh, I got it.


On Sat, Feb 8, 2014 at 7:33 PM, Harsh J ha...@cloudera.com wrote:

 Correction: Set it to 1 (For 1 max attempt), not 0.

 On Sat, Feb 8, 2014 at 7:31 PM, Harsh J ha...@cloudera.com wrote:
  You can set
 http://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.html#setMaxAppAttempts(int)
  to 0, at a per-app level, to prevent any reattempts/recovery of your
  AM.
 
  For a cluster-wide effect instead, you can limit by overriding the
  default value of the RM property yarn.resourcemanager.am.max-retries
  in the RM's YarnConfiguration or yarn-site.xml.
 
  On Fri, Feb 7, 2014 at 5:24 PM, Krishna Kishore Bonagiri
  write2kish...@gmail.com wrote:
  Hi,
 
 I am having some failure test cases where my Application Master is
  supposed to fail. But when it fails it is again started with appID_02
 . Is
  there a way for me to avoid the second instance of the Application
 Master
  getting started? Is it re-started automatically by the RM after the
 first
  one failed?
 
  Thanks,
  Kishore
 
 
 
  --
  Harsh J



 --
 Harsh J
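
A minimal sketch of the per-app setting Harsh describes above, assuming the
usual YarnClient submission flow:

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SingleAttemptSubmit {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    appContext.setMaxAppAttempts(1);  // one attempt only: no AM restart on failure
    // ... fill in the AM container launch context, resource, queue, etc., then:
    // yarnClient.submitApplication(appContext);
  }
}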



Can we avoid restarting of AM when it fails?

2014-02-07 Thread Krishna Kishore Bonagiri
Hi,

   I am having some failure test cases where my Application Master is
supposed to fail. But when it fails it is again started with appID_02 .
Is there a way for me to avoid the second instance of the Application
Master getting started? Is it re-started automatically by the RM after the
first one failed?

Thanks,
Kishore


Re: Command line tools for YARN

2013-12-26 Thread Krishna Kishore Bonagiri
Hi Jian,

 Thanks for the suggestions; I could get the required information with the
yarn node command.

Kishore
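
For anyone who wants the same information programmatically rather than
from the CLI, one way is a minimal sketch using YarnClient, which reports
the node count, node names, and per-node capacity:

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListNodes {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();
    // One report per RUNNING node: id, total capability, current usage.
    for (NodeReport node : yarnClient.getNodeReports(NodeState.RUNNING)) {
      System.out.println(node.getNodeId() + " capability=" + node.getCapability()
          + " used=" + node.getUsed());
    }
    yarnClient.stop();
  }
}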


On Fri, Dec 27, 2013 at 12:58 AM, Jian He j...@hortonworks.com wrote:

 1) checking how many nodes are in my cluster
 3) what are the names of nodes in my cluster

 you can use yarn node.

 2) what are the cluster resources total or available at the time of
 running the command
 Not quite sure, you can search possible options in the yarn command menu.
 And you can always see the resources usage via web UI though.


 On Mon, Dec 23, 2013 at 10:08 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,

   Are there any command line tools for things like,

 1) checking how many nodes are in my cluster
 2) what are the cluster resources total or available at the time of
 running the command
 3) what are the names of nodes in my cluster

 etc..

 Thanks,
 Kishore





Command line tools for YARN

2013-12-23 Thread Krishna Kishore Bonagiri
Hi,

  Are there any command line tools for things like,

1) checking how many nodes are in my cluster
2) what are the cluster resources total or available at the time of running
the command
3) what are the names of nodes in my cluster

etc..

Thanks,
Kishore


Container exit status for released container

2013-12-23 Thread Krishna Kishore Bonagiri
Hi,
 I am seeing the exit status of a released container (released through a
call to releaseAssignedContainer()) to be -100. Can my code assume that
-100 will always be given as the exit status for a released container by
YARN?
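
For what it's worth, that value corresponds to the
ContainerExitStatus.ABORTED constant in the YARN API, so comparing against
the constant rather than the literal -100 seems safer. A minimal sketch
inside an AMRMClientAsync callback handler:

import java.util.List;
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

// Inside the AMRMClientAsync.CallbackHandler implementation:
public void onContainersCompleted(List<ContainerStatus> statuses) {
  for (ContainerStatus status : statuses) {
    if (status.getExitStatus() == ContainerExitStatus.ABORTED) {
      // Released or aborted by the framework, not an application failure.
    }
  }
}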

Thanks,
Kishore


Re: Yarn -- one of the daemons getting killed

2013-12-18 Thread Krishna Kishore Bonagiri
Hi Vinod,
 Thanks for the link. I went through it, and it looks like the OOM killer
picks the process that has the highest oom_score. I have tried to capture
the oom_score for all the YARN daemon processes after each run of my
application. The first time I captured these details, I saw that the name
node was killed whereas the Node Manager had the highest score. So, I don't
know if it is really the OOM killer that killed it!

 Please see the output of my run attached, which also has the output of the
free command after each run. The output of free doesn't show any exhaustion
of system memory either.

Also, one more thing I did today: I added audit rules for each of the
daemons to capture all the system calls. In the audit log, I see the
futex() system call occurring in the killed daemon processes. I don't know
whether it causes the daemon to die, or why that call happens...


Thanks,
Kishore


On Wed, Dec 18, 2013 at 12:31 AM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

  That's good info. It is more than likely that it is the OOM killer. See
  http://stackoverflow.com/questions/726690/who-killed-my-process-and-why
  for example.

 Thanks,
 +Vinod

 On Dec 17, 2013, at 1:26 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi Jeff,

   I have run the resource manager in the foreground without nohup, and
 here are the messages from when it was killed; it says "Killed" but
 doesn't say why!

 13/12/17 03:14:54 INFO capacity.CapacityScheduler: Application
 appattempt_1387266015651_0258_01 released container
 container_1387266015651_0258_01_03 on node: host: isredeng:36576
 #containers=2 available=7936 used=256 with event: FINISHED
 13/12/17 03:14:54 INFO rmcontainer.RMContainerImpl:
 container_1387266015651_0258_01_05 Container Transitioned from ACQUIRED
 to RUNNING
 Killed


 Thanks,
 Kishore


 On Mon, Dec 16, 2013 at 11:10 PM, Jeff Stuckman stuck...@umd.edu wrote:

  What if you open the daemons in a screen session rather than running
 them in the background -- for example, run yarn resourcemanager. Then you
 can see exactly when they terminate, and hopefully why.

*From: *Krishna Kishore Bonagiri
 *Sent: *Monday, December 16, 2013 6:20 AM
 *To: *user@hadoop.apache.org
 *Reply To: *user@hadoop.apache.org
 *Subject: *Re: Yarn -- one of the daemons getting killed

  Hi Vinod,

   Yes, I am running on Linux.

  I was actually searching for a corresponding message in
 /var/log/messages to confirm that the OOM killer killed my daemons, but
 could not find any corresponding messages there! According to the
 following link, it looks like if it is a memory issue, I should see a
 message even if OOM is disabled, but I don't see one.

  http://www.redhat.com/archives/taroon-list/2007-August/msg6.html

   And, is memory consumption higher in the case of a two-node cluster than
 a single-node one? Also, I see this problem only when I give * as the node
 name.

   One other thing I suspected was the allowed number of user processes; I
 increased that from 1024 to 31000, but that also didn't help.

  Thanks,
 Kishore


 On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:

  Yes, that is what I suspect. That is why I asked if everything is on a
 single node. If you are running Linux, the Linux OOM killer may be
 shooting things down. When it happens, you will see something like
 'killed process' in the system's syslog.

Thanks,
 +Vinod

  On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

  Vinod,

   One more thing I observed is that my Client, which submits Application
 Masters one after another continuously, also gets killed sometimes. So, it
 is always one of the Java processes that gets killed. Does it indicate
 some excessive memory usage by them, or something like that, that is
 causing them to die? If so, how can we resolve this kind of issue?

  Thanks,
 Kishore


 On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 No, I am running on 2 node cluster.


 On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:

 Is all of this on a single node?

   Thanks,
 +Vinod

  On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

  Hi,
   I am running a small application on YARN (2.2.0) in a loop 500 times,
 and while doing so one of the daemons (node manager, resource manager, or
 data node) gets killed (I mean, disappears) at a random point. I see no
 information in the corresponding log files. How can I find out why this is
 happening?

   And, one more observation is that this is happening only when I am
 using * for the node name in the container requests; when I use a specific
 node name, everything is fine.

  Thanks,
 Kishore




Re: Yarn -- one of the daemons getting killed

2013-12-17 Thread Krishna Kishore Bonagiri
Hi Jeff,

  I have run the resource manager in the foreground without nohup, and here
are the messages from when it was killed; it says "Killed" but doesn't say
why!

13/12/17 03:14:54 INFO capacity.CapacityScheduler: Application
appattempt_1387266015651_0258_01 released container
container_1387266015651_0258_01_03 on node: host: isredeng:36576
#containers=2 available=7936 used=256 with event: FINISHED
13/12/17 03:14:54 INFO rmcontainer.RMContainerImpl:
container_1387266015651_0258_01_05 Container Transitioned from ACQUIRED
to RUNNING
Killed


Thanks,
Kishore


On Mon, Dec 16, 2013 at 11:10 PM, Jeff Stuckman stuck...@umd.edu wrote:

  What if you open the daemons in a screen session rather than running
 them in the background -- for example, run yarn resourcemanager. Then you
 can see exactly when they terminate, and hopefully why.

*From: *Krishna Kishore Bonagiri
 *Sent: *Monday, December 16, 2013 6:20 AM
 *To: *user@hadoop.apache.org
 *Reply To: *user@hadoop.apache.org
 *Subject: *Re: Yarn -- one of the daemons getting killed

  Hi Vinod,

   Yes, I am running on Linux.

  I was actually searching for a corresponding message in /var/log/messages
 to confirm that the OOM killer killed my daemons, but could not find any
 corresponding messages there! According to the following link, it looks
 like if it is a memory issue, I should see a message even if OOM is
 disabled, but I don't see one.

  http://www.redhat.com/archives/taroon-list/2007-August/msg6.html

   And, is memory consumption higher in the case of a two-node cluster than
 a single-node one? Also, I see this problem only when I give * as the node
 name.

   One other thing I suspected was the allowed number of user processes; I
 increased that from 1024 to 31000, but that also didn't help.

  Thanks,
 Kishore


 On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:

  Yes, that is what I suspect. That is why I asked if everything is on a
 single node. If you are running Linux, the Linux OOM killer may be
 shooting things down. When it happens, you will see something like
 'killed process' in the system's syslog.

Thanks,
 +Vinod

  On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

  Vinod,

   One more thing I observed is that my Client, which submits Application
 Masters one after another continuously, also gets killed sometimes. So, it
 is always one of the Java processes that gets killed. Does it indicate
 some excessive memory usage by them, or something like that, that is
 causing them to die? If so, how can we resolve this kind of issue?

  Thanks,
 Kishore


 On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 No, I am running on 2 node cluster.


 On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:

 Is all of this on a single node?

   Thanks,
 +Vinod

  On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

  Hi,
   I am running a small application on YARN (2.2.0) in a loop 500 times,
 and while doing so one of the daemons (node manager, resource manager, or
 data node) gets killed (I mean, disappears) at a random point. I see no
 information in the corresponding log files. How can I find out why this is
 happening?

   And, one more observation is that this is happening only when I am
 using * for the node name in the container requests; when I use a specific
 node name, everything is fine.

  Thanks,
 Kishore













Re: Yarn -- one of the daemons getting killed

2013-12-16 Thread Krishna Kishore Bonagiri
Hi Vinod,

 Yes, I am running on Linux.

 I was actually searching for a corresponding message in /var/log/messages
to confirm that the OOM killer killed my daemons, but could not find any such
messages there! According to the following link, if it were a memory issue,
I should see a message even if the OOM killer is disabled, but I don't see
one.

http://www.redhat.com/archives/taroon-list/2007-August/msg6.html

  And, is memory consumption higher in the case of a two-node cluster than a
single-node one? Also, I see this problem only when I give * as the node name.

  One other thing I suspected was the allowed number of user processes; I
increased that from 1024 to 31000, but that also didn't help.

Thanks,
Kishore


On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 Yes, that is what I suspect. That is why I asked if everything is on a
 single node. If you are running Linux, the Linux OOM killer may be shooting
 things down. When it happens, you will see something like 'killed process'
 in the system's syslog.

 Thanks,
 +Vinod

 On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Vinod,

   One more thing I observed is that my Client, which submits Application
 Masters one after another continuously, also gets killed sometimes. So, it is
 always one of the Java processes that is getting killed. Does it indicate
 some excessive memory usage by them, or something like that, that is causing
 them to die? If so, how can we resolve this kind of issue?

 Thanks,
 Kishore


 On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 No, I am running on a 2-node cluster.


 On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:

 Is all of this on a single node?

  Thanks,
 +Vinod

 On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,
   I am running a small application on YARN (2.2.0) in a loop of 500
 times, and while doing so one of the daemons (node manager, resource
 manager, or data node) is getting killed (I mean disappearing) at a random
 point. I see no information in the corresponding log files. How can I find
 out why this is happening?

  And, one more observation: this is happening only when I am
 using * for the node name in the container requests; when I used a
 specific node name, everything was fine.

 Thanks,
 Kishore











Re: Yarn -- one of the daemons getting killed

2013-12-16 Thread Krishna Kishore Bonagiri
Hi Vinay,

  In the .out files I could see nothing other than the output of 'ulimit
-all'. Do I need to enable any other kind of logging to get more information?

Thanks,
Kishore


On Mon, Dec 16, 2013 at 5:41 PM, Vinayakumar B vinayakuma...@huawei.com wrote:

  Hi Krishna,



 Please check the out files as well for daemons. You may find something.





 Cheers,

 Vinayakumar B



 From: Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com]
 Sent: 16 December 2013 16:50
 To: user@hadoop.apache.org
 Subject: Re: Yarn -- one of the daemons getting killed



 Hi Vinod,



  Yes, I am running on Linux.



  I was actually searching for a corresponding message in /var/log/messages
 to confirm that the OOM killer killed my daemons, but could not find any such
 messages there! According to the following link, if it were a memory issue,
 I should see a message even if the OOM killer is disabled, but I don't see
 one.

 http://www.redhat.com/archives/taroon-list/2007-August/msg6.html

   And, is memory consumption higher in the case of a two-node cluster than a
 single-node one? Also, I see this problem only when I give * as the node
 name.

   One other thing I suspected was the allowed number of user processes; I
 increased that from 1024 to 31000, but that also didn't help.



 Thanks,

 Kishore



 On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:

 Yes, that is what I suspect. That is why I asked if everything is on a
 single node. If you are running Linux, the Linux OOM killer may be shooting
 things down. When it happens, you will see something like 'killed process'
 in the system's syslog.



 Thanks,

 +Vinod



 On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:



  Vinod,



   One more thing I observed is that my Client, which submits Application
 Masters one after another continuously, also gets killed sometimes. So, it is
 always one of the Java processes that is getting killed. Does it indicate
 some excessive memory usage by them, or something like that, that is causing
 them to die? If so, how can we resolve this kind of issue?



 Thanks,

 Kishore



 On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 No, I am running on a 2-node cluster.



 On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:

 Is all of this on a single node?



 Thanks,

 +Vinod



 On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:



  Hi,

   I am running a small application on YARN (2.2.0) in a loop of 500 times,
 and while doing so one of the daemons (node manager, resource manager, or
 data node) is getting killed (I mean disappearing) at a random point. I see
 no information in the corresponding log files. How can I find out why this
 is happening?

  And, one more observation: this is happening only when I am using
 * for the node name in the container requests; when I used a
 specific node name, everything was fine.



 Thanks,

 Kishore

















Re: Yarn -- one of the daemons getting killed

2013-12-13 Thread Krishna Kishore Bonagiri
Vinod,

  One more thing I observed is that my Client, which submits Application
Masters one after another continuously, also gets killed sometimes. So, it is
always one of the Java processes that is getting killed. Does it indicate
some excessive memory usage by them, or something like that, that is causing
them to die? If so, how can we resolve this kind of issue?

Thanks,
Kishore


On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri 
write2kish...@gmail.com wrote:

 No, I am running on a 2-node cluster.


 On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:

 Is all of this on a single node?

  Thanks,
 +Vinod

 On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,
   I am running a small application on YARN (2.2.0) in a loop of 500
 times, and while doing so one of the daemons (node manager, resource
 manager, or data node) is getting killed (I mean disappearing) at a random
 point. I see no information in the corresponding log files. How can I find
 out why this is happening?

  And, one more observation: this is happening only when I am
 using * for the node name in the container requests; when I used a
 specific node name, everything was fine.

 Thanks,
 Kishore








Yarn -- one of the daemons getting killed

2013-12-12 Thread Krishna Kishore Bonagiri
Hi,
  I am running a small application on YARN (2.2.0) in a loop of 500 times,
and while doing so one of the daemons (node manager, resource manager, or
data node) is getting killed (I mean disappearing) at a random point. I see
no information in the corresponding log files. How can I find out why this
is happening?

 And, one more observation: this is happening only when I am using
* for the node name in the container requests; when I used a
specific node name, everything was fine.

Thanks,
Kishore


Re: Yarn -- one of the daemons getting killed

2013-12-12 Thread Krishna Kishore Bonagiri
No, I am running on a 2-node cluster.


On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 Is all of this on a single node?

 Thanks,
 +Vinod

 On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,
   I am running a small application on YARN (2.2.0) in a loop of 500 times,
 and while doing so one of the daemons (node manager, resource manager, or
 data node) is getting killed (I mean disappearing) at a random point. I see
 no information in the corresponding log files. How can I find out why this
 is happening?

   And, one more observation: this is happening only when I am using
 * for the node name in the container requests; when I used a
 specific node name, everything was fine.

 Thanks,
 Kishore





How to find which version of Hadoop or YARN

2013-12-09 Thread Krishna Kishore Bonagiri
Hi,

 Is there any command for finding which version of Hadoop or YARN we are
running? If so, what is that?

Thanks,
Kishore


Re: How to find which version of Hadoop or YARN

2013-12-09 Thread Krishna Kishore Bonagiri
Yes, thank you.


On Tue, Dec 10, 2013 at 11:15 AM, Brahma Reddy Battula 
brahmareddy.batt...@huawei.com wrote:

  Hi kishore,



 Hope the following will help you.



 /home/hadoop-2.0.5-alpha/bin # ./hadoop version

 Hadoop 2.0.5-alpha
 Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1488459
 Compiled by jenkins on 2013-06-01T04:05Z
 From source with checksum c8f4bd45ac25c31b815f311b32ef17
 This command was run using
 /home/hadtest/Opensource/hadoop-2.0.5-alpha/share/hadoop/common/hadoop-common-2.0.5-alpha.jar







 ---Brahma




  --
 From: Krishna Kishore Bonagiri [write2kish...@gmail.com]
 Sent: Tuesday, December 10, 2013 1:30 PM
 To: user@hadoop.apache.org
 Subject: How to find which version of Hadoop or YARN

   Hi,

  Is there any command for finding which version of Hadoop or YARN we are
 running? If so, what is that?

  Thanks,
 Kishore



Re: YARN: LocalResources and file distribution

2013-12-05 Thread Krishna Kishore Bonagiri
Hi Arun,

  I have copied a shell script to HDFS and am trying to execute it on
containers. How do I specify my shell script's path in the setCommands() call
on ContainerLaunchContext? I am doing it this way:

  String shellScriptPath =
      "hdfs://isredeng:8020/user/kbonagir/KKDummy/list.ksh";
  commands.add(shellScriptPath);

But my container execution is failing, saying that there is no such file or
directory:

org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash:
hdfs://isredeng:8020/user/kbonagir/KKDummy/list.ksh: No such file or
directory

at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)

I could see this file with the hadoop fs command, and also saw messages in
the Node Manager's log saying that the resource was downloaded and localized.
So, how do I run the downloaded shell script on a container?
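A minimal sketch of the usual pattern (only the HDFS path and the file name
come from the message above; the FileSystem handle fs and everything else are
illustrative assumptions, not the actual application code): the script is
registered as a LocalResource, whose map key becomes a symlink name in the
container's working directory, and the launch command then refers to that
local name rather than to the hdfs:// URI.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.LocalResource;
    import org.apache.hadoop.yarn.api.records.LocalResourceType;
    import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
    import org.apache.hadoop.yarn.util.ConverterUtils;
    import org.apache.hadoop.yarn.util.Records;

    // fs is an already-initialized FileSystem for the cluster (assumed).
    Path scriptPath = new Path("hdfs://isredeng:8020/user/kbonagir/KKDummy/list.ksh");
    FileStatus stat = fs.getFileStatus(scriptPath);

    // Describe the file so the NodeManager can download and localize it.
    LocalResource scriptRsrc = Records.newRecord(LocalResource.class);
    scriptRsrc.setResource(ConverterUtils.getYarnUrlFromPath(scriptPath));
    scriptRsrc.setSize(stat.getLen());
    scriptRsrc.setTimestamp(stat.getModificationTime());
    scriptRsrc.setType(LocalResourceType.FILE);
    scriptRsrc.setVisibility(LocalResourceVisibility.APPLICATION);

    // The map key ("list.ksh") is the symlink name created in the
    // container's working directory.
    Map<String, LocalResource> localResources = new HashMap<String, LocalResource>();
    localResources.put("list.ksh", scriptRsrc);

    ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
    ctx.setLocalResources(localResources);
    // Run the localized copy, not the hdfs:// URI.
    ctx.setCommands(Collections.singletonList("/bin/bash ./list.ksh"));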

Thanks,
Kishore



On Tue, Dec 3, 2013 at 4:57 AM, Arun C Murthy a...@hortonworks.com wrote:

 Robert,

  YARN, by default, will only download resources from a shared namespace
 (e.g. HDFS).

  If /home/hadoop/robert/large_jar.jar is available on each node then you
 can specify path as file:///home/hadoop/robert/large_jar.jar and it should
 work.

  Else, you'll need to copy /home/hadoop/robert/large_jar.jar to HDFS and
 then specify hdfs://host:port/path/to/large_jar.jar.

 hth,
 Arun

 On Dec 1, 2013, at 12:03 PM, Robert Metzger metrob...@gmail.com wrote:

 Hello,

 I'm currently writing code to run my application using Yarn (Hadoop 2.2.0).
 I used this code as a skeleton:
 https://github.com/hortonworks/simple-yarn-app

 Everything works fine on my local machine or on a cluster with the shared
 directories, but when I want to access resources outside of commonly
 accessible locations, my application fails.

 I have my application in a large jar file, containing everything
 (Submission Client, Application Master, and Workers).
 The submission client registers the large jar file as a local resource for
 the Application master's context.

 In my understanding, Yarn takes care of transferring the client-local
 resources to the application master's container.
 This is also stated here:
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html

 You can use the LocalResource to add resources to your application
 request. This will cause YARN to distribute the resource to the
 ApplicationMaster node.


 If I'm starting my jar from the dir /home/hadoop/robert/large_jar.jar,
 I'll get the following error from the nodemanager (another node in the
 cluster):

 2013-12-01 20:13:00,810 INFO
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Failed to download rsrc { { file:/home/hadoop/robert/large_jar.jar, ..


 So it seems as this node tries to access the file from its local file
 system.

 Do I have to use another protocol for the file, something like 
 file://host:port/home/blabla ?

 Is it true that Yarn is able to distribute files (not using HDFS,
 obviously)?


 The distributedshell-example suggests that I have to use HDFS:
 https://github.com/apache/hadoop-common/blob/50f0de14e377091c308c3a74ed089a7e4a7f0bfe/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java


 Sincerely,
 Robert






 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/





Re: Time taken for starting AMRMClientAsync

2013-11-25 Thread Krishna Kishore Bonagiri
Hi Alejandro,

  I don't start all the AMs from the same JVM. How can I do that? Also,
if I do that, will it save me the time taken to get the AM started? That
would be a good improvement to see as well. Please let me know how I can do
that. And, would this also save me the time taken for connecting from the AM
to the Resource Manager?

Thanks,
Kishore




On Tue, Nov 26, 2013 at 3:45 AM, Alejandro Abdelnur t...@cloudera.com wrote:

 Hi Krishna,

 Are you starting all AMs from the same JVM? Mind sharing the code you are
 using for your time testing?

 Thx


 On Thu, Nov 21, 2013 at 6:11 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi Alejandro,

  I have modified the code in


 hadoop-2.2.0-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java

 to submit multiple application masters one after another, and I am still
 seeing 800 to 900 ms being taken by the start() call on AMRMClientAsync in
 all of those applications.

 Please suggest if you think I am missing something else.

 Thanks,
 Kishore


 On Tue, Nov 19, 2013 at 6:07 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi Alejandro,

   I don't know what managed and unmanaged AMs are; can you please
 explain the difference and how each of them is launched?

  I tried to google these terms and came
 across hadoop-yarn-applications-unmanaged-am-launcher-2.2.0.jar; is it
 related to that?

 Thanks,
 Kishore


 On Tue, Nov 19, 2013 at 12:15 AM, Alejandro Abdelnur 
 t...@cloudera.com wrote:

 Kishore,

 Also, please specify if you are using managed or unmanaged AMs (the
 numbers I've mentioned before are using unmanaged AMs).

 thx


 On Sun, Nov 17, 2013 at 11:16 AM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:

 It is just creating a connection to RM and shouldn't take that long.
 Can you please file a ticket so that we can look at it?

 JVM class loading overhead is one possibility but 1 sec is a bit too
 much.

  Thanks,
 +Vinod

 On Oct 21, 2013, at 7:16 AM, Krishna Kishore Bonagiri wrote:

 Hi,
   I am seeing the following call to start() on AMRMClientAsync taking
 from 0.9 to 1 second. Why does it take that long? Is there a way to reduce
 it? I mean, does it depend on any of the interval parameters in the
 configuration files? I have tried reducing the value of the first argument
 below from 1000 to 100 milliseconds as well, but that doesn't help.

 AMRMClientAsync.CallbackHandler allocListener = new
 RMCallbackHandler();
 amRMClient = AMRMClientAsync.createAMRMClientAsync(1000,
 allocListener);
 amRMClient.init(conf);
 amRMClient.start();


 Thanks,
 Kishore







 --
 Alejandro






 --
 Alejandro



Re: Time taken for starting AMRMClientAsync

2013-11-21 Thread Krishna Kishore Bonagiri
Vinod,

  Do you expect managed AMs also not to take as much time as a second? Or,
as Alejandro was saying, only unmanaged AMs?

  I think I am using managed AMs. If managed AMs are also not expected to
take that much time, I shall raise a ticket.

Thanks,
Kishore


On Mon, Nov 18, 2013 at 12:46 AM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 It is just creating a connection to RM and shouldn't take that long. Can
 you please file a ticket so that we can look at it?

 JVM class loading overhead is one possibility but 1 sec is a bit too much.

 Thanks,
 +Vinod

 On Oct 21, 2013, at 7:16 AM, Krishna Kishore Bonagiri wrote:

 Hi,
   I am seeing the following call to start() on AMRMClientAsync taking from
 0.9 to 1 second. Why does it take that long? Is there a way to reduce it? I
 mean, does it depend on any of the interval parameters in the
 configuration files? I have tried reducing the value of the first argument
 below from 1000 to 100 milliseconds as well, but that doesn't help.

 AMRMClientAsync.CallbackHandler allocListener = new
 RMCallbackHandler();
 amRMClient = AMRMClientAsync.createAMRMClientAsync(1000,
 allocListener);
 amRMClient.init(conf);
 amRMClient.start();


 Thanks,
 Kishore





Re: Time taken for starting AMRMClientAsync

2013-11-21 Thread Krishna Kishore Bonagiri
Hi Alejandro,

 I have modified the code in

hadoop-2.2.0-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java

to submit multiple application masters one after another, and I am still
seeing 800 to 900 ms being taken by the start() call on AMRMClientAsync in
all of those applications.

Please suggest if you think I am missing something else.

Thanks,
Kishore


On Tue, Nov 19, 2013 at 6:07 PM, Krishna Kishore Bonagiri 
write2kish...@gmail.com wrote:

 Hi Alejandro,

   I don't know what managed and unmanaged AMs are; can you please explain
 the difference and how each of them is launched?

  I tried to google these terms and came
 across hadoop-yarn-applications-unmanaged-am-launcher-2.2.0.jar; is it
 related to that?

 Thanks,
 Kishore


 On Tue, Nov 19, 2013 at 12:15 AM, Alejandro Abdelnur t...@cloudera.com wrote:

 Kishore,

 Also, please specify if you are using managed or unmanaged AMs (the
 numbers I've mentioned before are using unmanaged AMs).

 thx


 On Sun, Nov 17, 2013 at 11:16 AM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:

 It is just creating a connection to RM and shouldn't take that long. Can
 you please file a ticket so that we can look at it?

 JVM class loading overhead is one possibility but 1 sec is a bit too
 much.

  Thanks,
 +Vinod

 On Oct 21, 2013, at 7:16 AM, Krishna Kishore Bonagiri wrote:

 Hi,
   I am seeing the following call to start() on AMRMClientAsync taking
 from 0.9 to 1 second. Why does it take that long? Is there a way to reduce
 it? I mean, does it depend on any of the interval parameters in the
 configuration files? I have tried reducing the value of the first argument
 below from 1000 to 100 milliseconds as well, but that doesn't help.

 AMRMClientAsync.CallbackHandler allocListener = new
 RMCallbackHandler();
 amRMClient = AMRMClientAsync.createAMRMClientAsync(1000,
 allocListener);
 amRMClient.init(conf);
 amRMClient.start();


 Thanks,
 Kishore







 --
 Alejandro





Unmanaged AMs

2013-11-21 Thread Krishna Kishore Bonagiri
Hi,

  I have seen in the comments for the code in UnmanagedAMLauncher.java that
the AM can be in any language. What does that mean? Can an AM be written in
C++? If so, how would I be able to connect to the RM, and how would I be able
to request containers? I mean, what is the interface for doing these things?
Is there sample code or an example somewhere to get an idea of how to do it?

Thanks,
Kishore


Running multiple applications from the same Application Master

2013-11-21 Thread Krishna Kishore Bonagiri
Hi,

  I was reading on this link

http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/

that we can implement an Application Master to manage multiple
applications. The text reads like this:

It’s useful to remember that, in reality, every application has its own
instance of an ApplicationMaster. However, it’s completely feasible to
implement an ApplicationMaster to manage a set of applications (e.g.
ApplicationMaster for Pig or Hive to manage a set of MapReduce jobs).

How is it possible? Do they have different application IDs and are they
treated as different applications on the cluster? What are the
positive/negative implications of it? Is it a recommended way?

Thanks,
Kishore


Re: Time taken for starting AMRMClientAsync

2013-11-19 Thread Krishna Kishore Bonagiri
Hi Alejandro,

  I don't know what managed and unmanaged AMs are; can you please explain
the difference and how each of them is launched?

 I tried to google these terms and came
across hadoop-yarn-applications-unmanaged-am-launcher-2.2.0.jar; is it
related to that?

Thanks,
Kishore


On Tue, Nov 19, 2013 at 12:15 AM, Alejandro Abdelnur t...@cloudera.com wrote:

 Kishore,

 Also, please specify if you are using managed or unmanaged AMs (the
 numbers I've mentioned before are using unmanaged AMs).

 thx


 On Sun, Nov 17, 2013 at 11:16 AM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:

 It is just creating a connection to RM and shouldn't take that long. Can
 you please file a ticket so that we can look at it?

 JVM class loading overhead is one possibility but 1 sec is a bit too much.

  Thanks,
 +Vinod

 On Oct 21, 2013, at 7:16 AM, Krishna Kishore Bonagiri wrote:

 Hi,
   I am seeing the following call to start() on AMRMClientAsync taking
 from 0.9 to 1 second. Why does it take that long? Is there a way to reduce
 it? I mean, does it depend on any of the interval parameters in the
 configuration files? I have tried reducing the value of the first argument
 below from 1000 to 100 milliseconds as well, but that doesn't help.

 AMRMClientAsync.CallbackHandler allocListener = new
 RMCallbackHandler();
 amRMClient = AMRMClientAsync.createAMRMClientAsync(1000,
 allocListener);
 amRMClient.init(conf);
 amRMClient.start();


 Thanks,
 Kishore







 --
 Alejandro



Re: Time taken for starting AMRMClientAsync

2013-11-17 Thread Krishna Kishore Bonagiri
Hi Alejandro,

  Can you please see if you can answer my question above? I would like to
reduce the time taken by the above calls made by my Application Master, the
way you do.

Thanks,
Kishore


On Tue, Oct 22, 2013 at 3:09 PM, Krishna Kishore Bonagiri 
write2kish...@gmail.com wrote:

 Hi Alejandro,

   I submit all my applications from a single Client, but all of my
 application masters are taking almost the same amount of time to finish
 the above calls. Do you reuse ApplicationMaster instances, or do something
 else to save this time? Otherwise, I would expect a fresh application
 connecting to the resource manager to take the same amount of time,
 although I don't know why it should take that much.

 Thanks,
 Kishore


 On Mon, Oct 21, 2013 at 9:23 PM, Alejandro Abdelnur t...@cloudera.com wrote:

 Hi Krishna,

 Those 900 ms seem consistent with the numbers we found while doing some
 benchmarks in the context of Llama:

 http://cloudera.github.io/llama/

 We found that the first application master created from a client process
 takes around 900 ms to be ready to submit resource requests. Subsequent
 application masters created from the same client process take a mean of 20
 ms. The application master submission throughput (discarding the first
 submission) tops at approximately 100 application masters per second.

 I believe there is room for improvement there.

 Cheers


 On Mon, Oct 21, 2013 at 7:16 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,
   I am seeing the following call to start() on AMRMClientAsync taking
 from 0.9 to 1 second. Why does it take that long? Is there a way to reduce
 it? I mean, does it depend on any of the interval parameters in the
 configuration files? I have tried reducing the value of the first argument
 below from 1000 to 100 milliseconds as well, but that doesn't help.

 AMRMClientAsync.CallbackHandler allocListener = new
 RMCallbackHandler();
 amRMClient = AMRMClientAsync.createAMRMClientAsync(1000,
 allocListener);
 amRMClient.init(conf);
 amRMClient.start();


 Thanks,
 Kishore




 --
 Alejandro





Re: Time taken for starting AMRMClientAsync

2013-10-22 Thread Krishna Kishore Bonagiri
Hi Alejandro,

  I submit all my applications from a single Client, but all of my
application masters are taking almost the same amount of time to finish
the above calls. Do you reuse ApplicationMaster instances, or do something
else to save this time? Otherwise, I would expect a fresh application
connecting to the resource manager to take the same amount of time,
although I don't know why it should take that much.

Thanks,
Kishore


On Mon, Oct 21, 2013 at 9:23 PM, Alejandro Abdelnur t...@cloudera.com wrote:

 Hi Krishna,

 Those 900 ms seem consistent with the numbers we found while doing some
 benchmarks in the context of Llama:

 http://cloudera.github.io/llama/

 We found that the first application master created from a client process
 takes around 900 ms to be ready to submit resource requests. Subsequent
 application masters created from the same client process take a mean of 20
 ms. The application master submission throughput (discarding the first
 submission) tops at approximately 100 application masters per second.

 I believe there is room for improvement there.

 Cheers


 On Mon, Oct 21, 2013 at 7:16 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,
   I am seeing the following call to start() on AMRMClientAsync taking
 from 0.9 to 1 second. Why does it take that long? Is there a way to reduce
 it? I mean, does it depend on any of the interval parameters in the
 configuration files? I have tried reducing the value of the first argument
 below from 1000 to 100 milliseconds as well, but that doesn't help.

 AMRMClientAsync.CallbackHandler allocListener = new
 RMCallbackHandler();
 amRMClient = AMRMClientAsync.createAMRMClientAsync(1000,
 allocListener);
 amRMClient.init(conf);
 amRMClient.start();


 Thanks,
 Kishore




 --
 Alejandro



Time taken for starting AMRMClientAsync

2013-10-21 Thread Krishna Kishore Bonagiri
Hi,
  I am seeing the following call to start() on AMRMClientAsync taking from
0.9 to 1 second. Why does it take that long? Is there a way to reduce it? I
mean, does it depend on any of the interval parameters in the
configuration files? I have tried reducing the value of the first argument
below from 1000 to 100 milliseconds as well, but that doesn't help.

AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler();
amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener);
amRMClient.init(conf);
amRMClient.start();


Thanks,
Kishore


Re: Hadoop-2.0.1 log files deletion

2013-10-10 Thread Krishna Kishore Bonagiri
Hi Reyane,

   Did you try yarn.nodemanager.log.retain-seconds? Increasing that might
help. The default value is 10800 seconds, i.e. 3 hours.
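For example, to keep the logs for 24 hours instead, something like this in
yarn-site.xml should work (the value is in seconds):

    <property>
      <name>yarn.nodemanager.log.retain-seconds</name>
      <value>86400</value>
    </property>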

Thanks,
Kishore


On Thu, Oct 10, 2013 at 8:27 PM, Reyane Oukpedjo oukped...@gmail.com wrote:

 Hi there,

 I was running some mapreduce jobs on hadoop-2.1.0-beta. These are
 multiple unit tests that can take more than a day to finish running.
 However, I realized the logs for the jobs are being deleted more quickly
 than the default 24-hour setting of the mapreduce.job.userlog.retain.hours
 property in mapred-site.xml. Some of the job logs were deleted after 4
 hours. Can this be a bug, or if not, is there any other property that
 overrides this?


 Thank you.

 Reyane OUKPEDJO









Single Yarn Client -- multiple applications?

2013-10-03 Thread Krishna Kishore Bonagiri
Hi,

  Can we submit multiple applications from the same Client class? It seems
to be allowed now; I just tried it with the Distributed Shell example...

  Is it OK to do so, or does it have any negative implications?
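For reference, a rough sketch of what this looks like with the YarnClient
API (the application name and the elided AM setup are placeholders, not
tested code); one client instance can create and submit several applications,
each getting its own ApplicationId:

    import org.apache.hadoop.yarn.api.records.ApplicationId;
    import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.client.api.YarnClientApplication;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    for (int i = 0; i < 3; i++) {
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("my-app-" + i);
        // ... set the AM ContainerLaunchContext, resource capability, queue, etc. ...
        ApplicationId appId = yarnClient.submitApplication(appContext);
        System.out.println("Submitted " + appId);
    }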

Thanks,
Kishore


Can container requests be made in parallel from multiple threads

2013-09-27 Thread Krishna Kishore Bonagiri
Hi,

  Can we submit container requests from multiple threads in parallel to the
Resource Manager?

Thanks,
Kishore


Re: Can container requests be made in parallel from multiple threads

2013-09-27 Thread Krishna Kishore Bonagiri
Hi Omkar,

  Thanks for the quick reply. I have a requirement for sets of containers
depending on some of my business logic. I found that each of the request
allocations is taking around 2 seconds, so I am thinking of making the
requests at the same time from multiple threads.

Kishore


On Fri, Sep 27, 2013 at 11:27 PM, Omkar Joshi ojo...@hortonworks.com wrote:

 Hi,

 I suggest you not do that. After YARN-744 goes in, this will be
 prevented on the RM side. May I know why you want to do this? Any advantage
 or use case?

 Thanks,
 Omkar Joshi
 *Hortonworks Inc.* http://www.hortonworks.com


 On Fri, Sep 27, 2013 at 8:31 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,

   Can we submit container requests from multiple threads in parallel to
 the Resource Manager?

 Thanks,
 Kishore





Re: Container allocation fails randomly

2013-09-18 Thread Krishna Kishore Bonagiri
Hi Omkar,

  It is my own custom AM that I am using, not the MR-AM. But I am still not
able to see how a negative value can come from the getProgress() call,
which is always calculated by dividing positive numbers; but it might be
a floating-point computation problem, as you say.

Thanks,
Kishore



On Thu, Sep 19, 2013 at 5:52 AM, Omkar Joshi ojo...@hortonworks.com wrote:

 This is clearly an AM bug. Are you using the MR-AM or a custom AM? You
 should check the AM code which is computing progress. I suspect there must
 be some float computation problem. If it is an MR-AM problem, then please
 file a MapReduce bug.

 Thanks,
 Omkar Joshi
 *Hortonworks Inc.* http://www.hortonworks.com


 On Tue, Sep 17, 2013 at 2:47 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi Omkar,

   Thanks for the quick reply, and sorry for not being able to get the
 required logs that you asked for.

   But in the meantime I just wanted to check if you can get a clue from
 the information I have now. I am seeing the following kind of error message
 in AppMaster.stderr whenever this failure happens. I don't know why
 it happens; the getProgress() call that I have implemented
 in RMCallbackHandler could never return a negative value! Doesn't this
 error mean that getProgress() is giving a negative value?

 Exception in thread AMRM Heartbeater thread
 java.lang.IllegalArgumentException: Progress indicator should not be
 negative
 at
 com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
 at
 org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:199)
 at
 org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)

 Thanks,
 Kishore


 On Fri, Sep 13, 2013 at 2:59 AM, Omkar Joshi ojo...@hortonworks.com wrote:

 Can you give more information? Complete logs will help a lot around
 this time frame. Are the containers getting assigned via the scheduler? Is
 it failing when the node manager tries to start the container? I clearly see
 the diagnostic message is empty, but do you see anything in the NM logs?
 Also, if there were running containers on the machine before launching new
 ones, are they killed? Or are they still hanging around? Can you also try
 applying the patch from https://issues.apache.org/jira/browse/YARN-1053 and
 check if you can see any message?

 Thanks,
 Omkar Joshi
 *Hortonworks Inc.* http://www.hortonworks.com


 On Thu, Sep 12, 2013 at 6:15 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,
    I am using 2.1.0-beta and have seen container allocation failing
  randomly, even when running the same application in a loop. I know that the
  cluster has enough resources to give, because it gave the resources for the
  same application all the other times in the loop and ran it successfully.

 I have observed a lot of messages of the following kind in the node
  manager's log whenever such a failure happens. Any clues as to why it
  happens?

 2013-09-12 08:54:36,204 INFO
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
 out status for container: container_id { app_attempt_id { application_id {
 id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
 C_RUNNING diagnostics:  exit_status: -1000
 2013-09-12 08:54:37,220 INFO
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
 out status for container: container_id { app_attempt_id { application_id {
 id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
 C_RUNNING diagnostics:  exit_status: -1000
 2013-09-12 08:54:38,231 INFO
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
 out status for container: container_id { app_attempt_id { application_id {
 id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
 C_RUNNING diagnostics:  exit_status: -1000
 2013-09-12 08:54:39,239 INFO
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
 out status for container: container_id { app_attempt_id { application_id {
 id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
 C_RUNNING diagnostics:  exit_status: -1000
 2013-09-12 08:54:40,267 INFO
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
 out status for container: container_id { app_attempt_id { application_id {
 id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
 C_RUNNING diagnostics:  exit_status: -1000
 2013-09-12 08:54:41,275 INFO
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
 out status for container: container_id { app_attempt_id { application_id {
 id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
 C_RUNNING diagnostics:  exit_status: -1000
 2013-09-12 08:54:42,283 INFO
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
 out status for container: container_id { app_attempt_id { application_id {
 id

Re: Container allocation fails randomly

2013-09-17 Thread Krishna Kishore Bonagiri
Hi Omkar,

  Thanks for the quick reply, and sorry for not being able to get the
required logs that you asked for.

  But in the meantime I just wanted to check if you can get a clue from
the information I have now. I am seeing the following kind of error message
in AppMaster.stderr whenever this failure happens. I don't know why
it happens; the getProgress() call that I have implemented
in RMCallbackHandler could never return a negative value! Doesn't this
error mean that getProgress() is giving a negative value?

Exception in thread AMRM Heartbeater thread
java.lang.IllegalArgumentException: Progress indicator should not be
negative
at
com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
at
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:199)
at
org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
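For what it is worth, a defensive sketch (the counter names are made up, not
the actual RMCallbackHandler) that makes a negative value impossible even if
the underlying counters are briefly inconsistent across threads:

    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;
    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.ContainerStatus;
    import org.apache.hadoop.yarn.api.records.NodeReport;
    import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

    class RMCallbackHandler implements AMRMClientAsync.CallbackHandler {
        private final AtomicInteger completed = new AtomicInteger(0);
        private final int total;

        RMCallbackHandler(int total) { this.total = total; }

        @Override
        public float getProgress() {
            if (total <= 0) {
                return 0.0f;
            }
            float p = (float) completed.get() / total;
            // Clamp to [0, 1] so that rounding or racy updates can never
            // produce a negative (or greater-than-1) progress value.
            return Math.max(0.0f, Math.min(1.0f, p));
        }

        @Override
        public void onContainersCompleted(List<ContainerStatus> statuses) {
            completed.addAndGet(statuses.size());
        }

        @Override public void onContainersAllocated(List<Container> containers) { }
        @Override public void onShutdownRequest() { }
        @Override public void onNodesUpdated(List<NodeReport> nodes) { }
        @Override public void onError(Throwable e) { }
    }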

Thanks,
Kishore


On Fri, Sep 13, 2013 at 2:59 AM, Omkar Joshi ojo...@hortonworks.com wrote:

 Can you give more information? Complete logs will help a lot around this
 time frame. Are the containers getting assigned via the scheduler? Is it
 failing when the node manager tries to start the container? I clearly see
 the diagnostic message is empty, but do you see anything in the NM logs?
 Also, if there were running containers on the machine before launching new
 ones, are they killed? Or are they still hanging around? Can you also try
 applying the patch from https://issues.apache.org/jira/browse/YARN-1053 and
 check if you can see any message?

 Thanks,
 Omkar Joshi
 *Hortonworks Inc.* http://www.hortonworks.com


 On Thu, Sep 12, 2013 at 6:15 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,
    I am using 2.1.0-beta and have seen container allocation failing
  randomly, even when running the same application in a loop. I know that the
  cluster has enough resources to give, because it gave the resources for the
  same application all the other times in the loop and ran it successfully.

 I have observed a lot of messages of the following kind in the node
  manager's log whenever such a failure happens. Any clues as to why it happens?

 2013-09-12 08:54:36,204 INFO
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
 out status for container: container_id { app_attempt_id { application_id {
 id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
 C_RUNNING diagnostics:  exit_status: -1000
 2013-09-12 08:54:37,220 INFO
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
 out status for container: container_id { app_attempt_id { application_id {
 id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
 C_RUNNING diagnostics:  exit_status: -1000
 2013-09-12 08:54:38,231 INFO
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
 out status for container: container_id { app_attempt_id { application_id {
 id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
 C_RUNNING diagnostics:  exit_status: -1000
 2013-09-12 08:54:39,239 INFO
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
 out status for container: container_id { app_attempt_id { application_id {
 id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
 C_RUNNING diagnostics:  exit_status: -1000
 2013-09-12 08:54:40,267 INFO
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
 out status for container: container_id { app_attempt_id { application_id {
 id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
 C_RUNNING diagnostics:  exit_status: -1000
 2013-09-12 08:54:41,275 INFO
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
 out status for container: container_id { app_attempt_id { application_id {
 id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
 C_RUNNING diagnostics:  exit_status: -1000
 2013-09-12 08:54:42,283 INFO
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
 out status for container: container_id { app_attempt_id { application_id {
 id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
 C_RUNNING diagnostics:  exit_status: -1000
 2013-09-12 08:54:43,289 INFO
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
 out status for container: container_id { app_attempt_id { application_id {
 id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
 C_RUNNING diagnostics:  exit_status: -1000


 Thanks,
 Kishore




Container allocation fails randomly

2013-09-12 Thread Krishna Kishore Bonagiri
Hi,
  I am using 2.1.0-beta and have seen container allocation failing randomly,
even when running the same application in a loop. I know that the cluster
has enough resources to give, because it gave the resources for the same
application all the other times in the loop and ran it successfully.

   I have observed a lot of messages of the following kind in the node
manager's log whenever such a failure happens. Any clues as to why it happens?

2013-09-12 08:54:36,204 INFO
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
out status for container: container_id { app_attempt_id { application_id {
id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
C_RUNNING diagnostics:  exit_status: -1000
2013-09-12 08:54:37,220 INFO
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
out status for container: container_id { app_attempt_id { application_id {
id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
C_RUNNING diagnostics:  exit_status: -1000
2013-09-12 08:54:38,231 INFO
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
out status for container: container_id { app_attempt_id { application_id {
id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
C_RUNNING diagnostics:  exit_status: -1000
2013-09-12 08:54:39,239 INFO
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
out status for container: container_id { app_attempt_id { application_id {
id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
C_RUNNING diagnostics:  exit_status: -1000
2013-09-12 08:54:40,267 INFO
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
out status for container: container_id { app_attempt_id { application_id {
id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
C_RUNNING diagnostics:  exit_status: -1000
2013-09-12 08:54:41,275 INFO
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
out status for container: container_id { app_attempt_id { application_id {
id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
C_RUNNING diagnostics:  exit_status: -1000
2013-09-12 08:54:42,283 INFO
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
out status for container: container_id { app_attempt_id { application_id {
id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
C_RUNNING diagnostics:  exit_status: -1000
2013-09-12 08:54:43,289 INFO
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
out status for container: container_id { app_attempt_id { application_id {
id: 2 cluster_timestamp: 1378990400253 } attemptId: 1 } id: 1 } state:
C_RUNNING diagnostics:  exit_status: -1000


Thanks,
Kishore


Requesting set of containers on a single node

2013-08-13 Thread Krishna Kishore Bonagiri
Hi,
  My application has a group of processes that need to communicate with
each other either through shared memory or TCP/IP, depending on whether the
containers are allocated on the same machine or on different machines.

 Obviously I would like to get them allocated on the same node
whenever possible, which requires all of the containers to be on the same
node. But I don't want to specify a node name in my request, because I don't
care where they run in the cluster as long as all of them are on the
same node. Is there a way to make such a request for containers currently?
Or if not, I think this would be good to have, because many applications
could have this kind of requirement.
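One workaround that the 2.1.0-beta relaxLocality API (discussed elsewhere in
this archive) does allow, sketched below with made-up variable names: ask for
the first container anywhere, and once it has been allocated (for example, in
onContainersAllocated()), pin the remaining requests to that container's node
with relaxLocality set to false, so they are not relaxed to the rack or *
level:

    // amRMClient, capability, priority, remaining and firstContainer are
    // assumed to exist in the surrounding AM code.
    String chosenNode = firstContainer.getNodeId().getHost();
    for (int i = 0; i < remaining; i++) {
        amRMClient.addContainerRequest(new AMRMClient.ContainerRequest(
                capability,
                new String[] { chosenNode },  // only this node
                null,                         // no rack fallback
                priority,
                false));                      // relaxLocality = false
    }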

Thanks,
Kishore


Re: whitelist feature of YARN

2013-08-07 Thread Krishna Kishore Bonagiri
Hi Sandy,

  Thanks for the reply, and it is good to know YARN-521 is done! Please
answer the following questions:

1) When is 2.1.0-beta going to be released? Is it soon, or do you suggest I
take it from trunk, or is there a recent release candidate available?

2) I have recently changed my application to use the new asynchronous
interfaces. I am hoping it works with those too; correct me if I am wrong.

3) Change in interface:

The old interface for ContainerRequest constructor used to be this:

 public ContainerRequest(Resource capability, String[] nodes,
String[] racks, Priority priority, int containerCount);

where as now it is changed to

a) public ContainerRequest(Resource capability, String[] nodes,
String[] racks, Priority priority)


b) public ContainerRequest(Resource capability, String[] nodes,
String[] racks, Priority priority, boolean relaxLocality)

That means the old containerCount argument is gone! How would I be able to
specify how many containers I need?

-Kishore




On Wed, Aug 7, 2013 at 11:37 AM, Sandy Ryza sandy.r...@cloudera.com wrote:

 YARN-521, which brings whitelisting to the AMRMClient APIs, is now
 included in 2.1.0-beta.  Check out the doc for the relaxLocality paramater
 in ContainerRequest in AMRMClient:
 https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java
 and I can help clarify here if anything's confusing.
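As a concrete illustration (the node name and sizes are made up), a
whitelisted request through that API looks roughly like this; passing
relaxLocality = false keeps the request from being relaxed to the rack or *
level:

    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient;

    Priority priority = Priority.newInstance(0);
    Resource capability = Resource.newInstance(1024, 1);  // 1 GB, 1 vcore

    // Whitelist: this request may only be satisfied on node1.
    AMRMClient.ContainerRequest req = new AMRMClient.ContainerRequest(
            capability,
            new String[] { "node1" },  // nodes
            null,                      // racks
            priority,
            false);                    // relaxLocality
    amRMClient.addContainerRequest(req);  // amRMClient assumed initialized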

 -Sandy


 On Tue, Jul 9, 2013 at 2:54 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi Sandy,

   Yes, I have been using the AMRMClient APIs. I am planning to shift to
 whatever way this whitelist feature is supported. But I am not sure
 what is meant by submitting ResourceRequests directly to the RM. Can you
 please elaborate on this or give me a pointer to some example code on how
 to do it...

Thanks for the reply,

 -Kishore


 On Mon, Jul 8, 2013 at 10:53 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

 Hi Krishna,

 From your previous email, it looks like you are using the AMRMClient
 APIs.  Support for whitelisting is not yet supported through them.  I am
 working on this in YARN-521, which should be included in the next release
 after 2.1.0-beta.  If you are submitting ResourceRequests directly to the
 RM, you can whitelist a node by
 * setting the relaxLocality flag on the node-level ResourceRequest to
 true
 * setting the relaxLocality flag on the corresponding rack-level
 ResourceRequest to false
 * setting the relaxLocality flag on the corresponding any-level
 ResourceRequest to false

 -Sandy


 On Mon, Jul 8, 2013 at 6:48 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,

  Can someone please point to some example code for how to use the
 whitelist feature of YARN? I have recently got RC1 for hadoop-2.1.0-beta
 and want to use this feature.

   It would be great if you could point me to some description of what
 this whitelisting feature is. I have gone through some JIRA logs related
 to it, but a more concrete explanation would be helpful.

 Thanks,
 Kishore







Re: whitelist feature of YARN

2013-08-07 Thread Krishna Kishore Bonagiri
Sandy,
  Thanks again. I found RC1 for 2.1.0-beta available at
http://people.apache.org/~acmurthy/hadoop-2.1.0-beta-rc1/
   Would this have the fix for YARN-521, and can I use that?

-Kishore


On Wed, Aug 7, 2013 at 12:35 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

 Responses inline:


 On Tue, Aug 6, 2013 at 11:55 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi Sandy,

   Thanks for the reply and it is good to know YARN-521 is done! Please
 answer my following questions

  1) When is 2.1.0-beta going to be released? Is it soon, or do you suggest I
  take it from trunk, or is there a recent release candidate available?

 We're very close and my guess would be no later than the end of the
 month (don't hold me to this).


  2) I have recently changed my application to use the new asynchronous
  interfaces. I am hoping it works with those too; correct me if I am wrong.

 ContainerRequest is shared by the async interfaces as well so it should
 work here.


 3) Change in interface:

 The old interface for ContainerRequest constructor used to be this:

  public ContainerRequest(Resource capability, String[] nodes,
 String[] racks, Priority priority, int containerCount);

 where as now it is changed to

 a) public ContainerRequest(Resource capability, String[] nodes,
 String[] racks, Priority priority)
 

 b) public ContainerRequest(Resource capability, String[] nodes,
 String[] racks, Priority priority, boolean relaxLocality)

  That means the old containerCount argument is gone! How would I be able
  to specify how many containers I need?

 We now expect that you submit a ContainerRequest for each container you
 want.
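So a request for N identical containers becomes a small loop, roughly like
this (numContainers, capability, priority and amRMClient are placeholders):

    // One ContainerRequest per container wanted; the old containerCount
    // argument is replaced by repeated addContainerRequest() calls.
    for (int i = 0; i < numContainers; i++) {
        amRMClient.addContainerRequest(
                new AMRMClient.ContainerRequest(capability, null, null, priority));
    }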


 -Kishore




 On Wed, Aug 7, 2013 at 11:37 AM, Sandy Ryza sandy.r...@cloudera.com wrote:

 YARN-521, which brings whitelisting to the AMRMClient APIs, is now
 included in 2.1.0-beta.  Check out the doc for the relaxLocality paramater
 in ContainerRequest in AMRMClient:
  https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java
  and I can help clarify here if anything's confusing.

 -Sandy


 On Tue, Jul 9, 2013 at 2:54 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi Sandy,

   Yes, I have been using the AMRMClient APIs. I am planning to shift to
  whatever way this whitelist feature is supported. But I am not sure
  what is meant by submitting ResourceRequests directly to the RM. Can you
  please elaborate on this or give me a pointer to some example code on how
  to do it...

Thanks for the reply,

 -Kishore


 On Mon, Jul 8, 2013 at 10:53 PM, Sandy Ryza sandy.r...@cloudera.comwrote:

 Hi Krishna,

 From your previous email, it looks like you are using the AMRMClient
 APIs.  Whitelisting is not yet supported through them.  I am
 working on this in YARN-521, which should be included in the next release
 after 2.1.0-beta.  If you are submitting ResourceRequests directly to the
 RM, you can whitelist a node by
 * setting the relaxLocality flag on the node-level ResourceRequest to
 true
 * setting the relaxLocality flag on the corresponding rack-level
 ResourceRequest to false
 * setting the relaxLocality flag on the corresponding any-level
 ResourceRequest to false

 -Sandy


 On Mon, Jul 8, 2013 at 6:48 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,

   Can someone please point me to some example code showing how to use the
 whitelist feature of YARN? I have recently got RC1 for hadoop-2.1.0-beta
 and want to use this feature.

   It would be great if you can point me to some description of what
 this whitelisting feature is; I have gone through some JIRA logs related
 to it, but a more concrete explanation would be helpful.

 Thanks,
 Kishore









Re: Extra start-up overhead with hadoop-2.1.0-beta

2013-08-07 Thread Krishna Kishore Bonagiri
Hi Omkar,

 Can you please see if you can answer my question with this info or if you
need anything else from me?

 Also, does resource localization improve or hurt performance in any way?

Thanks,
Kishore


On Thu, Aug 1, 2013 at 11:20 PM, Omkar Joshi ojo...@hortonworks.com wrote:

 How are you making these measurements? Can you elaborate more? Is it on a
 best-case basis, on an average, or worst case? How many resources are you
 sending for localization? Were the sizes and number of these resources
 consistent across tests? Were these resources public/private/application
 specific? Apart from this, is the other load on the node manager the same? Is
 the load on HDFS the same? Did you see any network bottleneck?

 More information will help a lot.


 Thanks,
 Omkar Joshi
 Hortonworks Inc. http://www.hortonworks.com


 On Thu, Aug 1, 2013 at 2:19 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,
   Please share with me if anyone has an answer or clues to my
 question regarding the start-up performance.

 Also, one more thing I have observed today is the time taken to run a
 command on a container went up by more than a second in this latest version.

 When using 2.0.4-alpha, it used to take 0.3 to 0.5 seconds from the point
 I call startContainer() to the  point the command is started on the
 container.

 where as

 When using 2.1.0-beta, it is taking around 1.5 seconds from the point it
 came to the call back onContainerStarted() to the point the command is seen
 started running on the container.

 Thanks,
 Kishore


 On Thu, Jul 25, 2013 at 8:38 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,

   I have been using the hadoop-2.1.0-beta release candidate and observed
 that it is slower in running my simple application that runs on 2
 containers. I have tried to find out which parts of it really have
 this extra overhead (compared to hadoop-2.0.4-alpha), and here is what I
 found.

 1) From the point my Client has submitted the Application Master to RM,
 it is taking 2  seconds extra
 2) From the point my container request are set up by Application Master,
 till the containers are allocated, it is taking 2 seconds extra

 Is this overhead expected with the changes that went into the new
 version? Or is there a way to improve it by changing something in the
 configuration?

 Thanks,
 Kishore






Re: Extra start-up overhead with hadoop-2.1.0-beta

2013-08-07 Thread Krishna Kishore Bonagiri
No Ravi, I am not running any MR job. Also, my configuration files are not
big.


On Wed, Aug 7, 2013 at 11:12 PM, Ravi Prakash ravi...@ymail.com wrote:

 I believe https://issues.apache.org/jira/browse/MAPREDUCE-5399 causes
 performance degradation in cases where there are a lot of reducers. I can
 imagine it causing degradation if the configuration files are super big /
 some other weird cases.


   --
 From: Krishna Kishore Bonagiri write2kish...@gmail.com
 To: user@hadoop.apache.org
 Sent: Wednesday, August 7, 2013 10:03 AM
 Subject: Re: Extra start-up overhead with hadoop-2.1.0-beta

 Hi Omkar,

  Can you please see if you can answer my question with this info or if you
 need anything else from me?

  Also, does resource localization improve or hurt performance in any way?

 Thanks,
 Kishore


 On Thu, Aug 1, 2013 at 11:20 PM, Omkar Joshi ojo...@hortonworks.comwrote:

 How are you making these measurements? Can you elaborate more? Is it on a
 best-case basis, on an average, or worst case? How many resources are you
 sending for localization? Were the sizes and number of these resources
 consistent across tests? Were these resources public/private/application
 specific? Apart from this, is the other load on the node manager the same? Is
 the load on HDFS the same? Did you see any network bottleneck?

 More information will help a lot.


 Thanks,
 Omkar Joshi
 Hortonworks Inc. http://www.hortonworks.com/


 On Thu, Aug 1, 2013 at 2:19 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,
   Please share with me if anyone has an answer or clues to my question
 regarding the start-up performance.

 Also, one more thing I have observed today is the time taken to run a
 command on a container went up by more than a second in this latest version.

 When using 2.0.4-alpha, it used to take 0.3 to 0.5 seconds from the point
 I call startContainer() to the  point the command is started on the
 container.

 where as

 When using 2.1.0-beta, it is taking around 1.5 seconds from the point it
 came to the call back onContainerStarted() to the point the command is seen
 started running on the container.

 Thanks,
 Kishore


 On Thu, Jul 25, 2013 at 8:38 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,

   I have been using the hadoop-2.1.0-beta release candidate and observed
 that it is slower in running my simple application that runs on 2
 containers. I have tried to find out which parts of it really have
 this extra overhead (compared to hadoop-2.0.4-alpha), and here is what I
 found.

 1) From the point my Client has submitted the Application Master to RM, it
 is taking 2  seconds extra
 2) From the point my container request are set up by Application Master,
 till the containers are allocated, it is taking 2 seconds extra

 Is this overhead expected with the changes that went into the new version?
 Or is there a way to improve it by changing something in the configuration?

 Thanks,
 Kishore









Re: setLocalResources() on ContainerLaunchContext

2013-08-07 Thread Krishna Kishore Bonagiri
Hi Omkar,

  I will try that. I might have put in the two '/' characters wrongly while
trying it in different ways to make it work. The file kishore/kk.ksh is
accessible to the same user that is running the AM container.

  And another question: what are the exact benefits of using this resource
localization? Can you please explain briefly, or point me to some online
documentation talking about it?

Thanks,
Kishore


On Wed, Aug 7, 2013 at 11:49 PM, Omkar Joshi ojo...@hortonworks.com wrote:

 Good that your timestamp worked... Now for hdfs try this:
 hdfs://<hdfs-host-name>:<hdfs-host-port><absolute-path>
 Now verify that your absolute path is correct. I hope it will work:
 bin/hadoop fs -ls <absolute-path>


 hdfs://isredeng:8020//kishore/kk.ksh... why "//"? You have the hdfs file
 at absolute location /kishore/kk.ksh? Are /kishore and /kishore/kk.ksh
 accessible to the user who is making the startContainer call, or the one
 running the AM container?

 Thanks,
 Omkar Joshi
 *Hortonworks Inc.* http://www.hortonworks.com


 On Tue, Aug 6, 2013 at 10:43 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi Harsh, Hitesh & Omkar,

   Thanks for the replies.

 I tried getting the last-modified timestamp like this and it works. Is
 this the right thing to do?

   File file = new File("/home_/dsadm/kishore/kk.ksh");
   shellRsrc.setTimestamp(file.lastModified());


 And when I tried using an hdfs file, qualifying it with both node name and
 port, it didn't work; I got a similar error to the earlier one.

   String shellScriptPath = "hdfs://isredeng:8020//kishore/kk.ksh";


 13/08/07 01:36:28 INFO ApplicationMaster: Got container status for
 containerID= container_1375853431091_0005_01_02, state=COMPLETE,
 exitStatus=-1000, diagnostics=File does not exist:
 hdfs://isredeng:8020/kishore/kk.ksh

 13/08/07 01:36:28 INFO ApplicationMaster: Got failure status for a
 container : -1000



 On Wed, Aug 7, 2013 at 7:45 AM, Harsh J ha...@cloudera.com wrote:

 Thanks Hitesh!

 P.s. Port isn't a requirement (and with HA URIs, you shouldn't add a
 port), but isredeng has to be the authority component.

 On Wed, Aug 7, 2013 at 7:37 AM, Hitesh Shah hit...@apache.org wrote:
  @Krishna, your logs showed the file error for
 hdfs://isredeng/kishore/kk.ksh
 
  I am assuming you have tried dfs -ls /kishore/kk.ksh and confirmed
 that the file exists? Also the qualified path seems to be missing the
 namenode port. I need to go back and check if a path without the port works
 by assuming the default namenode port.
 
  @Harsh, adding a helper function seems like a good idea. Let me file a
 jira to have the above added to one of the helper/client libraries.
 
  thanks
  -- Hitesh
 
  On Aug 6, 2013, at 6:47 PM, Harsh J wrote:
 
  It is kinda unnecessary to be asking developers to load in timestamps
 and
  length themselves. Why not provide a java.io.File, or perhaps a Path
  accepting API, that gets it automatically on their behalf using the
  FileSystem API internally?
 
  P.s. A HDFS file gave him a FNF, while a Local file gave him a proper
  TS/Len error. I'm guessing there's a bug here w.r.t. handling HDFS
  paths.
 
  On Wed, Aug 7, 2013 at 12:35 AM, Hitesh Shah hit...@apache.org
 wrote:
  Hi Krishna,
 
  YARN downloads a specified local resource on the container's node
 from the url specified. In all situations, the remote url needs to be a
 fully qualified path. To verify that the file at the remote url is still
 valid, YARN expects you to provide the length and last-modified timestamp
 of that file.
 
  If you use an hdfs path such as hdfs://<namenode>:<port>/<absolute path
 to file>, you will need to get the length and timestamp from HDFS.
  If you use file:///, the file should exist on all nodes and all
 nodes should have the file with the same length and timestamp for
 localization to work. (For a single-node setup this works, but it is tougher
 to get right on a multi-node setup; deploying the file via an rpm should
 likely work.)
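 
  In code, getting those from HDFS might look like this (a sketch; the path
 is the one from this thread, conf is your Configuration, and the usual
 FileSystem/FileStatus/ConverterUtils imports are assumed):
 
  Path scriptPath = new Path("hdfs://isredeng:8020/kishore/kk.ksh");
  FileStatus status = scriptPath.getFileSystem(conf).getFileStatus(scriptPath);
  shellRsrc.setResource(ConverterUtils.getYarnUrlFromPath(scriptPath));
  shellRsrc.setSize(status.getLen());                    // real length, not 0
  shellRsrc.setTimestamp(status.getModificationTime()); // real timestamp, not 0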
 
  -- Hitesh
 
  On Aug 6, 2013, at 11:11 AM, Omkar Joshi wrote:
 
  Hi,
 
  You need to match the timestamp. Probably get the timestamp locally
 before adding it. This is explicitly done to ensure that the file is not
 updated after the user makes the call, to avoid possible errors.
 
 
  Thanks,
  Omkar Joshi
  Hortonworks Inc.
 
 
  On Tue, Aug 6, 2013 at 5:25 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:
  I tried the following and it works!
  String shellScriptPath = "file:///home_/dsadm/kishore/kk.ksh";
 
  But now getting a timestamp error like below, when I passed 0 to
 setTimestamp()
 
  13/08/06 08:23:48 INFO ApplicationMaster: Got container status for
 containerID= container_1375784329048_0017_01_02, state=COMPLETE,
 exitStatus=-1000, diagnostics=Resource file:/home_/dsadm/kishore/kk.ksh
 changed on src filesystem (expected 0, was 136758058
 
 
 
 
  On Tue, Aug 6, 2013 at 5:24 PM, Harsh J ha...@cloudera.com wrote:
  Can you try passing a fully qualified local path? That is,
 including

Re: setLocalResources() on ContainerLaunchContext

2013-08-06 Thread Krishna Kishore Bonagiri
Hi Harsh,
   The setResource() call on LocalResource expects an argument of
type org.apache.hadoop.yarn.api.records.URL, which is converted from a
string in the form of a URI. This happens in the following call of the
Distributed Shell example:

shellRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI(
shellScriptPath)));

So, when I gave a local file I got a parsing error like the one below, which
is why I changed it to an HDFS file, thinking it had to be given that
way. Could you please give an example of how else it could be used, using
a local file as you are saying?

2013-08-06 06:23:12,942 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Failed to parse resource-request
java.net.URISyntaxException: Expected scheme name at index 0:
:///home_/dsadm/kishore/kk.ksh
at java.net.URI$Parser.fail(URI.java:2820)
at java.net.URI$Parser.failExpecting(URI.java:2826)
at java.net.URI$Parser.parse(URI.java:3015)
at java.net.URI.<init>(URI.java:747)
at
org.apache.hadoop.yarn.util.ConverterUtils.getPathFromYarnURL(ConverterUtils.java:77)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourceRequest.<init>(LocalResourceRequest.java:46)



On Tue, Aug 6, 2013 at 3:36 PM, Harsh J ha...@cloudera.com wrote:

 To be honest, I've never tried loading a HDFS file onto the
 LocalResource this way. I usually just pass a local file and that
 works just fine. There may be something in the URI transformation
 possibly breaking a HDFS source, but try passing a local file - does
 that fail too? The Shell example uses a local file.

 On Tue, Aug 6, 2013 at 10:54 AM, Krishna Kishore Bonagiri
 write2kish...@gmail.com wrote:
  Hi Harsh,
 
Please see if this is useful, I got a stack trace after the error has
  occurred
 
  2013-08-06 00:55:30,559 INFO
  org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: CWD
 set
  to
 /tmp/nm-local-dir/usercache/dsadm/appcache/application_1375716148174_0004
  =
 
 file:/tmp/nm-local-dir/usercache/dsadm/appcache/application_1375716148174_0004
  2013-08-06 00:55:31,017 ERROR
  org.apache.hadoop.security.UserGroupInformation:
 PriviledgedActionException
  as:dsadm (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not
  exist: hdfs://isredeng/kishore/kk.ksh
  2013-08-06 00:55:31,029 INFO
 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  DEBUG: FAILED { hdfs://isredeng/kishore/kk.ksh, 0, FILE, null }, File
 does
  not exist: hdfs://isredeng/kishore/kk.ksh
  2013-08-06 00:55:31,031 INFO
 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
  Resource hdfs://isredeng/kishore/kk.ksh transitioned from DOWNLOADING to
  FAILED
  2013-08-06 00:55:31,034 INFO
 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1375716148174_0004_01_02 transitioned from
  LOCALIZING to LOCALIZATION_FAILED
  2013-08-06 00:55:31,035 INFO
 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl:
  Container container_1375716148174_0004_01_02 sent RELEASE event on a
  resource request { hdfs://isredeng/kishore/kk.ksh, 0, FILE, null } not
  present in cache.
  2013-08-06 00:55:31,036 WARN org.apache.hadoop.ipc.Client: interrupted
  waiting to send rpc request to server
  java.lang.InterruptedException
  at
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1290)
  at
  java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:229)
  at java.util.concurrent.FutureTask.get(FutureTask.java:94)
  at
  org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:930)
  at org.apache.hadoop.ipc.Client.call(Client.java:1285)
  at org.apache.hadoop.ipc.Client.call(Client.java:1264)
  at
 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
  at $Proxy22.heartbeat(Unknown Source)
  at
 
 org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
  at
 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:249)
  at
 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:163)
  at
 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:106)
  at
 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:979)
 
 
 
  And here is my code snippet:
 
ContainerLaunchContext ctx =
  Records.newRecord

Re: setLocalResources() on ContainerLaunchContext

2013-08-06 Thread Krishna Kishore Bonagiri
I tried the following and it works!
String shellScriptPath = "file:///home_/dsadm/kishore/kk.ksh";

But now getting a timestamp error like below, when I passed 0 to
setTimestamp()

13/08/06 08:23:48 INFO ApplicationMaster: Got container status for
containerID= container_1375784329048_0017_01_02, state=COMPLETE,
exitStatus=-1000, diagnostics=Resource file:/home_/dsadm/kishore/kk.ksh
changed on src filesystem (expected 0, was 136758058




On Tue, Aug 6, 2013 at 5:24 PM, Harsh J ha...@cloudera.com wrote:

 Can you try passing a fully qualified local path? That is, including the
 file:/ scheme
 On Aug 6, 2013 4:05 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi Harsh,
    The setResource() call on LocalResource expects an argument of
 type org.apache.hadoop.yarn.api.records.URL, which is converted from a
 string in the form of a URI. This happens in the following call of the
 Distributed Shell example:

 shellRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI(
 shellScriptPath)));

 So, when I gave a local file I got a parsing error like the one below,
 which is why I changed it to an HDFS file, thinking it had to be given
 that way. Could you please give an example of how else it could be used,
 using a local file as you are saying?

 2013-08-06 06:23:12,942 WARN
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
 Failed to parse resource-request
 java.net.URISyntaxException: Expected scheme name at index 0:
 :///home_/dsadm/kishore/kk.ksh
 at java.net.URI$Parser.fail(URI.java:2820)
 at java.net.URI$Parser.failExpecting(URI.java:2826)
 at java.net.URI$Parser.parse(URI.java:3015)
 at java.net.URI.<init>(URI.java:747)
 at
 org.apache.hadoop.yarn.util.ConverterUtils.getPathFromYarnURL(ConverterUtils.java:77)
 at
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourceRequest.<init>(LocalResourceRequest.java:46)



 On Tue, Aug 6, 2013 at 3:36 PM, Harsh J ha...@cloudera.com wrote:

 To be honest, I've never tried loading a HDFS file onto the
 LocalResource this way. I usually just pass a local file and that
 works just fine. There may be something in the URI transformation
 possibly breaking a HDFS source, but try passing a local file - does
 that fail too? The Shell example uses a local file.

 On Tue, Aug 6, 2013 at 10:54 AM, Krishna Kishore Bonagiri
 write2kish...@gmail.com wrote:
  Hi Harsh,
 
Please see if this is useful, I got a stack trace after the error has
  occurred
 
  2013-08-06 00:55:30,559 INFO
  org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
 CWD set
  to
 /tmp/nm-local-dir/usercache/dsadm/appcache/application_1375716148174_0004
  =
 
 file:/tmp/nm-local-dir/usercache/dsadm/appcache/application_1375716148174_0004
  2013-08-06 00:55:31,017 ERROR
  org.apache.hadoop.security.UserGroupInformation:
 PriviledgedActionException
  as:dsadm (auth:SIMPLE) cause:java.io.FileNotFoundException: File does
 not
  exist: hdfs://isredeng/kishore/kk.ksh
  2013-08-06 00:55:31,029 INFO
 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  DEBUG: FAILED { hdfs://isredeng/kishore/kk.ksh, 0, FILE, null }, File
 does
  not exist: hdfs://isredeng/kishore/kk.ksh
  2013-08-06 00:55:31,031 INFO
 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
  Resource hdfs://isredeng/kishore/kk.ksh transitioned from DOWNLOADING
 to
  FAILED
  2013-08-06 00:55:31,034 INFO
 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1375716148174_0004_01_02 transitioned from
  LOCALIZING to LOCALIZATION_FAILED
  2013-08-06 00:55:31,035 INFO
 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl:
  Container container_1375716148174_0004_01_02 sent RELEASE event on
 a
  resource request { hdfs://isredeng/kishore/kk.ksh, 0, FILE, null } not
  present in cache.
  2013-08-06 00:55:31,036 WARN org.apache.hadoop.ipc.Client: interrupted
  waiting to send rpc request to server
  java.lang.InterruptedException
  at
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1290)
  at
  java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:229)
  at java.util.concurrent.FutureTask.get(FutureTask.java:94)
  at
  org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:930)
  at org.apache.hadoop.ipc.Client.call(Client.java:1285)
  at org.apache.hadoop.ipc.Client.call(Client.java:1264)
  at
 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
  at $Proxy22.heartbeat(Unknown Source)
  at
 
 org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62

Re: setLocalResources() on ContainerLaunchContext

2013-08-06 Thread Krishna Kishore Bonagiri
Hi Harsh, Hitesh & Omkar,

  Thanks for the replies.

I tried getting the last-modified timestamp like this and it works. Is this
the right thing to do?

  File file = new File("/home_/dsadm/kishore/kk.ksh");
  shellRsrc.setTimestamp(file.lastModified());


And when I tried using an hdfs file, qualifying it with both node name and
port, it didn't work; I got a similar error to the earlier one.

  String shellScriptPath = "hdfs://isredeng:8020//kishore/kk.ksh";


13/08/07 01:36:28 INFO ApplicationMaster: Got container status for
containerID= container_1375853431091_0005_01_02, state=COMPLETE,
exitStatus=-1000, diagnostics=File does not exist:
hdfs://isredeng:8020/kishore/kk.ksh

13/08/07 01:36:28 INFO ApplicationMaster: Got failure status for a
container : -1000



On Wed, Aug 7, 2013 at 7:45 AM, Harsh J ha...@cloudera.com wrote:

 Thanks Hitesh!

 P.s. Port isn't a requirement (and with HA URIs, you shouldn't add a
 port), but isredeng has to be the authority component.

 On Wed, Aug 7, 2013 at 7:37 AM, Hitesh Shah hit...@apache.org wrote:
  @Krishna, your logs showed the file error for
 hdfs://isredeng/kishore/kk.ksh
 
  I am assuming you have tried dfs -ls /kishore/kk.ksh and confirmed that
 the file exists? Also the qualified path seems to be missing the namenode
 port. I need to go back and check if a path without the port works by
 assuming the default namenode port.
 
  @Harsh, adding a helper function seems like a good idea. Let me file a
 jira to have the above added to one of the helper/client libraries.
 
  thanks
  -- Hitesh
 
  On Aug 6, 2013, at 6:47 PM, Harsh J wrote:
 
  It is kinda unnecessary to be asking developers to load in timestamps
 and
  length themselves. Why not provide a java.io.File, or perhaps a Path
  accepting API, that gets it automatically on their behalf using the
  FileSystem API internally?
 
  P.s. A HDFS file gave him a FNF, while a Local file gave him a proper
  TS/Len error. I'm guessing there's a bug here w.r.t. handling HDFS
  paths.
 
  On Wed, Aug 7, 2013 at 12:35 AM, Hitesh Shah hit...@apache.org wrote:
  Hi Krishna,
 
  YARN downloads a specified local resource on the container's node from
 the url specified. In all situations, the remote url needs to be a fully
 qualified path. To verify that the file at the remote url is still valid,
 YARN expects you to provide the length and last-modified timestamp of that
 file.
 
  If you use an hdfs path such as hdfs://<namenode>:<port>/<absolute path to
 file>, you will need to get the length and timestamp from HDFS.
  If you use file:///, the file should exist on all nodes and all nodes
 should have the file with the same length and timestamp for localization to
 work. (For a single-node setup this works, but it is tougher to get right on a
 multi-node setup; deploying the file via an rpm should likely work.)
 
  -- Hitesh
 
  On Aug 6, 2013, at 11:11 AM, Omkar Joshi wrote:
 
  Hi,
 
  You need to match the timestamp. Probably get the timestamp locally
 before adding it. This is explicitly done to ensure that the file is not
 updated after the user makes the call, to avoid possible errors.
 
 
  Thanks,
  Omkar Joshi
  Hortonworks Inc.
 
 
  On Tue, Aug 6, 2013 at 5:25 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:
  I tried the following and it works!
  String shellScriptPath = "file:///home_/dsadm/kishore/kk.ksh";
 
  But now getting a timestamp error like below, when I passed 0 to
 setTimestamp()
 
  13/08/06 08:23:48 INFO ApplicationMaster: Got container status for
 containerID= container_1375784329048_0017_01_02, state=COMPLETE,
 exitStatus=-1000, diagnostics=Resource file:/home_/dsadm/kishore/kk.ksh
 changed on src filesystem (expected 0, was 136758058
 
 
 
 
  On Tue, Aug 6, 2013 at 5:24 PM, Harsh J ha...@cloudera.com wrote:
  Can you try passing a fully qualified local path? That is, including
 the file:/ scheme
 
  On Aug 6, 2013 4:05 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:
  Hi Harsh,
   The setResource() call on LocalResource expects an argument
 of type org.apache.hadoop.yarn.api.records.URL, which is converted from a
 string in the form of a URI. This happens in the following call of the
 Distributed Shell example:
 
  shellRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI(
 shellScriptPath)));
 
  So, when I gave a local file I got a parsing error like the one below,
 which is why I changed it to an HDFS file, thinking it had to be given
 that way. Could you please give an example of how else it could be used,
 using a local file as you are saying?
 
  2013-08-06 06:23:12,942 WARN
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
 Failed to parse resource-request
  java.net.URISyntaxException: Expected scheme name at index 0:
 :///home_/dsadm/kishore/kk.ksh
 at java.net.URI$Parser.fail(URI.java:2820)
 at java.net.URI$Parser.failExpecting(URI.java:2826)
 at java.net.URI$Parser.parse(URI.java:3015

setLocalResources() on ContainerLaunchContext

2013-08-05 Thread Krishna Kishore Bonagiri
Hi,

  Can someone please tell me what is the use of calling setLocalResources()
on ContainerLaunchContext?

  And, also an example of how to use this will help...

 I couldn't work out what the String in the map that is passed to
setLocalResources() is for, as below:

  // Set the local resources
  Map<String, LocalResource> localResources = new HashMap<String,
LocalResource>();

Thanks,
Kishore


Re: setLocalResources() on ContainerLaunchContext

2013-08-05 Thread Krishna Kishore Bonagiri
Hi Harsh,
 Thanks for the quick and detailed reply, it really helps. I am trying to
use it and getting this error in node manager's log:

2013-08-05 08:57:28,867 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:dsadm (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not
exist: hdfs://isredeng/kishore/kk.ksh


This file is present on the machine named isredeng; I could do an ls on
that file as below:

-bash-4.1$ hadoop fs -ls kishore/kk.ksh
13/08/05 09:01:03 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   3 dsadm supergroup   1046 2013-08-05 08:48 kishore/kk.ksh

Note: I am using a single node cluster

Thanks,
Kishore




On Mon, Aug 5, 2013 at 3:00 PM, Harsh J ha...@cloudera.com wrote:

 The string for each LocalResource in the map can be anything that
 serves as a common identifier name for your application. At execution
 time, the passed resource filename will be aliased to the name you've
 mapped it to, so that the application code need not track special
 names. The behavior is very similar to how you can, in MR, define a
 symlink name for a DistributedCache entry (e.g. foo.jar#bar.jar).

 For an example, check out the DistributedShell app sources.

 Over [1], you can see we take a user provided file path to a shell
 script. This can be named anything as it is user-supplied.
 Onto [2], we define this as a local resource [2.1] and embed it with a
 different name (the string you ask about) [2.2], as defined at [3] as
 an application reference-able constant.
 Note that in [4], we add to the Container arguments the aliased name
 we mapped it to (i.e. [3]) and not the original filename we received
 from the user. The resource is placed on the container with this name
 instead, so thats what we choose to execute.

 [1] -
 https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java#L390

 [2] - [2.1]
 https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java#L764
 and [2.2]
 https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java#L780

 [3] -
 https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java#L205

 [4] -
 https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java#L791
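
 Boiled down, the aliasing looks like this (a sketch; shellRsrc is a
 LocalResource set up as elsewhere in this thread, and ctx is the
 ContainerLaunchContext):

 Map<String, LocalResource> localResources = new HashMap<String, LocalResource>();
 // The map key is the name the file will have in the container's working
 // directory, whatever it was called at its source.
 localResources.put("ExecShellScript.sh", shellRsrc);
 ctx.setLocalResources(localResources);
 // The launch command can then refer to ./ExecShellScript.sh.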

 On Mon, Aug 5, 2013 at 2:44 PM, Krishna Kishore Bonagiri
 write2kish...@gmail.com wrote:
  Hi,
 
Can someone please tell me what is the use of calling
 setLocalResources()
  on ContainerLaunchContext?
 
And, also an example of how to use this will help...
 
    I couldn't work out what the String in the map that is passed to
   setLocalResources() is for, as below:
 
// Set the local resources
    Map<String, LocalResource> localResources = new HashMap<String,
   LocalResource>();
 
  Thanks,
  Kishore
 



 --
 Harsh J



Re: setLocalResources() on ContainerLaunchContext

2013-08-05 Thread Krishna Kishore Bonagiri
Hi Harsh,

  Please see if this is useful, I got a stack trace after the error has
occurred

2013-08-06 00:55:30,559 INFO
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: CWD set
to
/tmp/nm-local-dir/usercache/dsadm/appcache/application_1375716148174_0004 =
file:/tmp/nm-local-dir/usercache/dsadm/appcache/application_1375716148174_0004
2013-08-06 00:55:31,017 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:dsadm (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not
exist: hdfs://isredeng/kishore/kk.ksh
2013-08-06 00:55:31,029 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
DEBUG: FAILED { hdfs://isredeng/kishore/kk.ksh, 0, FILE, null }, File does
not exist: hdfs://isredeng/kishore/kk.ksh
2013-08-06 00:55:31,031 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource hdfs://isredeng/kishore/kk.ksh transitioned from DOWNLOADING to
FAILED
2013-08-06 00:55:31,034 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1375716148174_0004_01_02 transitioned from
LOCALIZING to LOCALIZATION_FAILED
2013-08-06 00:55:31,035 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl:
Container container_1375716148174_0004_01_02 sent RELEASE event on a
resource request { hdfs://isredeng/kishore/kk.ksh, 0, FILE, null } not
present in cache.
2013-08-06 00:55:31,036 WARN org.apache.hadoop.ipc.Client: interrupted
waiting to send rpc request to server
java.lang.InterruptedException
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1290)
at
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:229)
at java.util.concurrent.FutureTask.get(FutureTask.java:94)
at
org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:930)
at org.apache.hadoop.ipc.Client.call(Client.java:1285)
at org.apache.hadoop.ipc.Client.call(Client.java:1264)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at $Proxy22.heartbeat(Unknown Source)
at
org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:249)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:163)
at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:106)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:979)



And here is my code snippet:

  ContainerLaunchContext ctx =
      Records.newRecord(ContainerLaunchContext.class);

  ctx.setEnvironment(oshEnv);

  // Set the local resources
  Map<String, LocalResource> localResources =
      new HashMap<String, LocalResource>();

  LocalResource shellRsrc = Records.newRecord(LocalResource.class);
  shellRsrc.setType(LocalResourceType.FILE);
  shellRsrc.setVisibility(LocalResourceVisibility.APPLICATION);
  String shellScriptPath = "hdfs://isredeng//kishore/kk.ksh";
  try {
    shellRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new
        URI(shellScriptPath)));
  } catch (URISyntaxException e) {
    LOG.error("Error when trying to use shell script path specified"
        + " in env, path=" + shellScriptPath);
    e.printStackTrace();
  }

  shellRsrc.setTimestamp(0/*shellScriptPathTimestamp*/);
  shellRsrc.setSize(0/*shellScriptPathLen*/);
  String ExecShellStringPath = "ExecShellScript.sh";
  localResources.put(ExecShellStringPath, shellRsrc);

  ctx.setLocalResources(localResources);


Please let me know if you need anything else.

Thanks,
Kishore



On Tue, Aug 6, 2013 at 12:05 AM, Harsh J ha...@cloudera.com wrote:

 The detail is insufficient to answer why. You should also have gotten
 a trace after it, can you post that? If possible, also the relevant
 snippets of code.

 On Mon, Aug 5, 2013 at 6:36 PM, Krishna Kishore Bonagiri
 write2kish...@gmail.com wrote:
  Hi Harsh,
   Thanks for the quick and detailed reply, it really helps. I am trying to
  use it and getting this error in node manager's log:
 
  2013-08-05 08:57:28,867 ERROR
  org.apache.hadoop.security.UserGroupInformation:
 PriviledgedActionException
  as:dsadm (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not
  exist: hdfs://isredeng/kishore/kk.ksh
 
 
   This file is present on the machine named isredeng; I could do an ls on
   that file as below:
 
  -bash-4.1$ hadoop fs -ls

Re: Extra start-up overhead with hadoop-2.1.0-beta

2013-08-02 Thread Krishna Kishore Bonagiri
Hi Omkar,

  I have got these numbers by running a simple C program on the containers
that fetches the timestamp in microseconds and exits. The times mentioned
are low and high; they are not varying drastically within a version, but
there are huge differences (like a second) between the two versions,
2.0.4-alpha and 2.1.0-beta, as I mentioned.

  I am using a single node cluster, and there is absolutely no other
load on the machine/node. My single node cluster is just used for my own
development work and testing.

  I am not aware of what resource localization is; I am not doing anything
special for it.

  Please let me know if you need any other info.

Thanks,
Kishore


On Thu, Aug 1, 2013 at 11:20 PM, Omkar Joshi ojo...@hortonworks.com wrote:

 How are you making these measurements? Can you elaborate more? Is it on a
 best-case basis, on an average, or worst case? How many resources are you
 sending for localization? Were the sizes and number of these resources
 consistent across tests? Were these resources public/private/application
 specific? Apart from this, is the other load on the node manager the same? Is
 the load on HDFS the same? Did you see any network bottleneck?

 More information will help a lot.


 Thanks,
 Omkar Joshi
 Hortonworks Inc. http://www.hortonworks.com


 On Thu, Aug 1, 2013 at 2:19 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,
   Please share with me if anyone has an answer or clues to my
 question regarding the start-up performance.

 Also, one more thing I have observed today is the time taken to run a
 command on a container went up by more than a second in this latest version.

 When using 2.0.4-alpha, it used to take 0.3 to 0.5 seconds from the point
 I call startContainer() to the  point the command is started on the
 container.

 where as

 When using 2.1.0-beta, it is taking around 1.5 seconds from the point it
 came to the call back onContainerStarted() to the point the command is seen
 started running on the container.

 Thanks,
 Kishore


 On Thu, Jul 25, 2013 at 8:38 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,

   I have been using the hadoop-2.1.0-beta release candidate and observed
 that it is slower in running my simple application that runs on 2
 containers. I have tried to find out which parts of it really have
 this extra overhead (compared to hadoop-2.0.4-alpha), and here is what I
 found.

 1) From the point my Client has submitted the Application Master to RM,
 it is taking 2  seconds extra
 2) From the point my container request are set up by Application Master,
 till the containers are allocated, it is taking 2 seconds extra

 Is this overhead expected with the changes that went into the new
 version? Or is there a way to improve it by changing something in the
 configuration?

 Thanks,
 Kishore






Re: Extra start-up overhead with hadoop-2.1.0-beta

2013-08-01 Thread Krishna Kishore Bonagiri
Hi,
  Please share with me if anyone has an answer or clues to my question
regarding the start-up performance.

Also, one more thing I have observed today is the time taken to run a
command on a container went up by more than a second in this latest version.

When using 2.0.4-alpha, it used to take 0.3 to 0.5 seconds from the point I
call startContainer() to the  point the command is started on the container.

where as

When using 2.1.0-beta, it is taking around 1.5 seconds from the point it
came to the call back onContainerStarted() to the point the command is seen
started running on the container.

Thanks,
Kishore


On Thu, Jul 25, 2013 at 8:38 PM, Krishna Kishore Bonagiri 
write2kish...@gmail.com wrote:

 Hi,

   I have been using the hadoop-2.1.0-beta release candidate and observed
 that it is slower in running my simple application that runs on 2
 containers. I have tried to find out which parts of it really have
 this extra overhead (compared to hadoop-2.0.4-alpha), and here is what I
 found.

 1) From the point my Client has submitted the Application Master to RM, it
 is taking 2  seconds extra
 2) From the point my container request are set up by Application Master,
 till the containers are allocated, it is taking 2 seconds extra

 Is this overhead expected with the changes that went into the new version?
 Or is there a way to improve it by changing something in the configuration?

 Thanks,
 Kishore



Re: Node manager crashing when running an app requiring 100 containers on hadoop-2.1.0-beta RC0

2013-07-31 Thread Krishna Kishore Bonagiri
Hi Arun,
 I was running on a single node cluster, so all my 100+ containers are on a
single node. And the problem went away when I increased YARN_HEAP_SIZE to
2GB.

Thanks,
Kishore


On Thu, Aug 1, 2013 at 5:01 AM, Arun C Murthy a...@hortonworks.com wrote:

 How many containers are you running per node?

 On Jul 25, 2013, at 5:21 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi Devaraj,

  I used to run this application with the same number of containers
 successfully on the previous version, i.e. hadoop-2.0.4-alpha. Is it failing
 with the new version because YARN itself is adding more threads
 than the previous versions did?

 Thanks,
 Kishore


 On Thu, Jul 25, 2013 at 4:24 PM, Devaraj k devara...@huawei.com wrote:

  Hi Kishore,


 It seems that the system doesn’t have enough resources to launch a new
 thread.


 Could you check whether the system can afford to launch the configured
 containers, and try increasing the native memory available in the system by
 reducing the number of running processes on the system.


 Thanks

 Devaraj k


 From: Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com]
 Sent: 25 July 2013 16:09
 To: user@hadoop.apache.org
 Subject: Node manager crashing when running an app requiring 100
 containers on hadoop-2.1.0-beta RC0


 Hi,


   I am running an application against hadoop-2.1.0-beta RC, and my app
 requires 117 containers. I have got all the containers allocated, but while
 starting those containers, at around the 99th container the node manager
 went down with the following kind of error in its log. Also, I could
 reproduce this error running a "sleep 200; date" command using the
 Distributed Shell example, in which case I got this error at around the 66th
 container.


 2013-07-25 06:07:17,743 FATAL
 org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process
 reaper,5,main] threw an Error.  Shutting down now...

 java.lang.OutOfMemoryError: Failed to create a thread: retVal
 -1073741830, errno 11

 at java.lang.Thread.startImpl(Native Method)

 at java.lang.Thread.start(Thread.java:887)

 at java.lang.ProcessInputStream.<init>(UNIXProcess.java:472)

 at java.lang.UNIXProcess$1$1$1.run(UNIXProcess.java:157)

 at
 java.security.AccessController.doPrivileged(AccessController.java:202)

 at java.lang.UNIXProcess$1$1.run(UNIXProcess.java:137)

 2013-07-25 06:07:17,745 INFO org.apache.hadoop.util.ExitUtil: Halt with
 status -1 Message: HaltException


 Thanks,

 Kishore



 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/





Node manager crashing when running an app requiring 100 containers on hadoop-2.1.0-beta RC0

2013-07-25 Thread Krishna Kishore Bonagiri
Hi,

  I am running an application against hadoop-2.1.0-beta RC, and my app
requires 117 containers. I have got all the containers allocated, but while
starting those containers, at around the 99th container the node manager
went down with the following kind of error in its log. Also, I could
reproduce this error running a "sleep 200; date" command using the
Distributed Shell example, in which case I got this error at around the 66th
container.


2013-07-25 06:07:17,743 FATAL
org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process
reaper,5,main] threw an Error.  Shutting down now...
java.lang.OutOfMemoryError: Failed to create a thread: retVal -1073741830,
errno 11
at java.lang.Thread.startImpl(Native Method)
at java.lang.Thread.start(Thread.java:887)
at java.lang.ProcessInputStream.<init>(UNIXProcess.java:472)
at java.lang.UNIXProcess$1$1$1.run(UNIXProcess.java:157)
at
java.security.AccessController.doPrivileged(AccessController.java:202)
at java.lang.UNIXProcess$1$1.run(UNIXProcess.java:137)
2013-07-25 06:07:17,745 INFO org.apache.hadoop.util.ExitUtil: Halt with
status -1 Message: HaltException

Thanks,
Kishore


Re: Node manager crashing when running an app requiring 100 containers on hadoop-2.1.0-beta RC0

2013-07-25 Thread Krishna Kishore Bonagiri
Hi Devaraj,

 I used to run this application with the same number of containers
successfully on the previous version, i.e. hadoop-2.0.4-alpha. Is it failing
with the new version because YARN itself is adding more threads
than the previous versions did?

Thanks,
Kishore


On Thu, Jul 25, 2013 at 4:24 PM, Devaraj k devara...@huawei.com wrote:

  Hi Kishore,


 It seems that the system doesn’t have enough resources to launch a new thread.
 


 Could you check whether the system can afford to launch the configured
 containers, and try increasing the native memory available in the system by
 reducing the number of running processes on the system.


 Thanks

 Devaraj k


 From: Krishna Kishore Bonagiri [mailto:write2kish...@gmail.com]
 Sent: 25 July 2013 16:09
 To: user@hadoop.apache.org
 Subject: Node manager crashing when running an app requiring 100
 containers on hadoop-2.1.0-beta RC0


 Hi,


   I am running an application against hadoop-2.1.0-beta RC, and my app
 requires 117 containers. I have got all the containers allocated, but while
 starting those containers, at around the 99th container the node manager
 went down with the following kind of error in its log. Also, I could
 reproduce this error running a "sleep 200; date" command using the
 Distributed Shell example, in which case I got this error at around the 66th
 container.


 2013-07-25 06:07:17,743 FATAL
 org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process
 reaper,5,main] threw an Error.  Shutting down now...

 java.lang.OutOfMemoryError: Failed to create a thread: retVal -1073741830,
 errno 11

 at java.lang.Thread.startImpl(Native Method)

 at java.lang.Thread.start(Thread.java:887)

 at java.lang.ProcessInputStream.<init>(UNIXProcess.java:472)

 at java.lang.UNIXProcess$1$1$1.run(UNIXProcess.java:157)

 at
 java.security.AccessController.doPrivileged(AccessController.java:202)

 at java.lang.UNIXProcess$1$1.run(UNIXProcess.java:137)

 2013-07-25 06:07:17,745 INFO org.apache.hadoop.util.ExitUtil: Halt with
 status -1 Message: HaltException


 Thanks,

 Kishore



Extra start-up overhead with hadoop-2.1.0-beta

2013-07-25 Thread Krishna Kishore Bonagiri
Hi,

  I have been using the hadoop-2.1.0-beta release candidate and observed
that it is slower in running my simple application that runs on 2
containers. I have tried to find out which parts of it really have
this extra overhead (compared to hadoop-2.0.4-alpha), and here is what I
found.

1) From the point my Client has submitted the Application Master to RM, it
is taking 2  seconds extra
2) From the point my container request are set up by Application Master,
till the containers are allocated, it is taking 2 seconds extra

Is this overhead expected with the changes that went into the new version?
Or is there a way to improve it by changing something in the configuration?

Thanks,
Kishore


Re: Namenode automatically going to safemode with 2.1.0-beta

2013-07-19 Thread Krishna Kishore Bonagiri
Hi Harsh,

  I have made my dfs.namenode.name.dir point to a subdirectory of my home
directory, and I don't see this issue any more. So, is this a bug that we
need to log in JIRA?

Thanks,
Kishore


On Tue, Jul 16, 2013 at 6:39 AM, Harsh J ha...@cloudera.com wrote:

  2013-07-12 11:04:26,002 WARN
 org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space
 available on volume 'null' is 0, which is below the configured reserved
 amount 104857600

 This is interesting. It's calling your volume 'null', which may be more
 of a superficial bug.

 What is your dfs.namenode.name.dir set to? From
 /tmp/hadoop-dsadm/dfs/name I'd expect you haven't set it up and /tmp
 is being used off of the out-of-box defaults. Could you try to set it
 to a specific directory that's not on /tmp?
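
 For example, in hdfs-site.xml (a sketch; the path here is made up, pick
 any durable local directory):

 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:///home/dsadm/hadoop/dfs/name</value>
 </property>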

 On Mon, Jul 15, 2013 at 2:43 PM, Krishna Kishore Bonagiri
 write2kish...@gmail.com wrote:
  I don't have it in my hdfs-site.xml, in which case probably the default
  value is taken..
 
 
  On Mon, Jul 15, 2013 at 2:29 PM, Azuryy Yu azury...@gmail.com wrote:
 
  please check dfs.datanode.du.reserved in the hdfs-site.xml
 
  On Jul 15, 2013 4:30 PM, Aditya exalter adityaexal...@gmail.com
 wrote:
 
  Hi Krishna,
 
 Can you please send screenshots of namenode web UI.
 
  Thanks Aditya.
 
 
  On Mon, Jul 15, 2013 at 1:54 PM, Krishna Kishore Bonagiri
  write2kish...@gmail.com wrote:
 
  I have had enough space on the disk that is used, like around 30 Gigs
 
  Thanks,
  Kishore
 
 
  On Mon, Jul 15, 2013 at 1:30 PM, Venkatarami Netla
  venkatarami.ne...@cloudwick.com wrote:
 
  Hi,
  pls see the available space for NN storage directory.
 
  Thanks & Regards
 
  Venkat
 
 
  On Mon, Jul 15, 2013 at 12:14 PM, Krishna Kishore Bonagiri
  write2kish...@gmail.com wrote:
 
  Hi,
 
   I am doing no activity on my single node cluster which is using
  2.1.0-beta, and still observed that it has gone to safe mode by
 itself after
  a while. I was looking at the name node log and see many of these
 kinds of
  entries.. Can anything be interpreted from these?
 
  2013-07-12 09:06:11,256 INFO
  org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log
 segment at
  561
  2013-07-12 09:07:11,290 INFO
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log
 from
  9.70.137.114
  2013-07-12 09:07:11,290 INFO
  org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
  2013-07-12 09:07:11,290 INFO
  org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log
 segment 561
  2013-07-12 09:07:11,291 INFO
  org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of
 transactions: 2
  Total time for transactions(ms): 1 Number of transactions batched
 in Syncs:
  0 Number of syncs: 2 SyncTimes(ms): 14
  2013-07-12 09:07:11,292 INFO
  org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of
 transactions: 2
  Total time for transactions(ms): 1 Number of transactions batched
 in Syncs:
  0 Number of syncs: 3 SyncTimes(ms): 15
  2013-07-12 09:07:11,293 INFO
  org.apache.hadoop.hdfs.server.namenode.FileJournalManager:
 Finalizing edits
  file
 /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_561
  -
 
 /tmp/hadoop-dsadm/dfs/name/current/edits_561-562
  2013-07-12 09:07:11,294 INFO
  org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log
 segment at
  563
  2013-07-12 09:08:11,397 INFO
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log
 from
  9.70.137.114
  2013-07-12 09:08:11,398 INFO
  org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
  2013-07-12 09:08:11,398 INFO
  org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log
 segment 563
  2013-07-12 09:08:11,399 INFO
  org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of
 transactions: 2
  Total time for transactions(ms): 2 Number of transactions batched
 in Syncs:
  0 Number of syncs: 2 SyncTimes(ms): 11
  2013-07-12 09:08:11,400 INFO
  org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of
 transactions: 2
  Total time for transactions(ms): 2 Number of transactions batched
 in Syncs:
  0 Number of syncs: 3 SyncTimes(ms): 12
  2013-07-12 09:08:11,402 INFO
  org.apache.hadoop.hdfs.server.namenode.FileJournalManager:
 Finalizing edits
  file
 /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_563
  -
 
 /tmp/hadoop-dsadm/dfs/name/current/edits_563-564
  2013-07-12 09:08:11,402 INFO
  org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log
 segment at
  565
  2013-07-12 09:09:11,440 INFO
  org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log
 from
  9.70.137.114
  2013-07-12 09:09:11,440 INFO
  org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
  2013-07-12 09:09:11,440 INFO
  org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log
 segment 565
  2013-07-12 09:09:11,440 INFO
  org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of
 transactions: 2
  Total time for transactions(ms): 0

Re: Namenode automatically going to safemode with 2.1.0-beta

2013-07-16 Thread Krishna Kishore Bonagiri
Yes Harsh, I haven't set dfs.namenode.name.dir anywhere in config files. My
name node has again gone into safe mode today while it was idle. I shall
try setting this value to something other than /tmp



On Tue, Jul 16, 2013 at 6:39 AM, Harsh J ha...@cloudera.com wrote:

  2013-07-12 11:04:26,002 WARN
 org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space
 available on volume 'null' is 0, which is below the configured reserved
 amount 104857600

 This is interesting. It's calling your volume 'null', which may be more
 of a superficial bug.

 What is your dfs.namenode.name.dir set to? From
 /tmp/hadoop-dsadm/dfs/name I'd expect you haven't set it up and /tmp
 is being used off of the out-of-box defaults. Could you try to set it
 to a specific directory that's not on /tmp?

 On Mon, Jul 15, 2013 at 2:43 PM, Krishna Kishore Bonagiri
 write2kish...@gmail.com wrote:
  I don't have it in my hdfs-site.xml, in which case probably the default
  value is taken..
 
 
  On Mon, Jul 15, 2013 at 2:29 PM, Azuryy Yu azury...@gmail.com wrote:
 
  please check dfs.datanode.du.reserved in the hdfs-site.xml
 
  On Jul 15, 2013 4:30 PM, Aditya exalter adityaexal...@gmail.com
 wrote:
 
  Hi Krishna,
 
 Can you please send screenshots of namenode web UI.
 
  Thanks Aditya.
 
 
  On Mon, Jul 15, 2013 at 1:54 PM, Krishna Kishore Bonagiri
  write2kish...@gmail.com wrote:
 
  I have had enough space on the disk that is used, like around 30 Gigs
 
  Thanks,
  Kishore
 
 
  On Mon, Jul 15, 2013 at 1:30 PM, Venkatarami Netla
  venkatarami.ne...@cloudwick.com wrote:
 
  Hi,
  pls see the available space for NN storage directory.
 
  Thanks & Regards
 
  Venkat
 
 
  On Mon, Jul 15, 2013 at 12:14 PM, Krishna Kishore Bonagiri
  write2kish...@gmail.com wrote:
 
  Hi,
 
   I am doing no activity on my single node cluster which is using
  2.1.0-beta, and still observed that it has gone to safe mode by
 itself after
  a while. I was looking at the name node log and see many of these
 kinds of
  entries.. Can anything be interpreted from these?
 

Namenode automatically going to safemode with 2.1.0-beta

2013-07-15 Thread Krishna Kishore Bonagiri
Hi,

 I am doing no activity on my single node cluster which is using
2.1.0-beta, and still observed that it has gone to safe mode by itself
after a while. I was looking at the name node log and see many of these
kinds of entries.. Can anything be interpreted from these?

2013-07-12 09:06:11,256 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at
561
2013-07-12 09:07:11,290 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from
9.70.137.114
2013-07-12 09:07:11,290 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
2013-07-12 09:07:11,290 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 561
2013-07-12 09:07:11,291 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2
Total time for transactions(ms): 1 Number of transactions batched in Syncs:
0 Number of syncs: 2 SyncTimes(ms): 14
2013-07-12 09:07:11,292 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2
Total time for transactions(ms): 1 Number of transactions batched in Syncs:
0 Number of syncs: 3 SyncTimes(ms): 15
2013-07-12 09:07:11,293 INFO
org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits
file
/tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_561 -
/tmp/hadoop-dsadm/dfs/name/current/edits_561-562
2013-07-12 09:07:11,294 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at
563
2013-07-12 09:08:11,397 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from
9.70.137.114
2013-07-12 09:08:11,398 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
2013-07-12 09:08:11,398 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 563
2013-07-12 09:08:11,399 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2
Total time for transactions(ms): 2 Number of transactions batched in Syncs:
0 Number of syncs: 2 SyncTimes(ms): 11
2013-07-12 09:08:11,400 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2
Total time for transactions(ms): 2 Number of transactions batched in Syncs:
0 Number of syncs: 3 SyncTimes(ms): 12
2013-07-12 09:08:11,402 INFO
org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits
file
/tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_563 -
/tmp/hadoop-dsadm/dfs/name/current/edits_563-564
2013-07-12 09:08:11,402 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at
565
2013-07-12 09:09:11,440 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from
9.70.137.114
2013-07-12 09:09:11,440 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
2013-07-12 09:09:11,440 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 565
2013-07-12 09:09:11,440 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2
Total time for transactions(ms): 0 Number of transactions batched in Syncs:
0 Number of syncs: 2 SyncTimes(ms): 13
2013-07-12 09:09:11,441 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2
Total time for transactions(ms): 0 Number of transactions batched in Syncs:
0 Number of syncs: 3 SyncTimes(ms): 13


And after sometime it said:

2013-07-12 11:03:19,799 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at
795
2013-07-12 11:04:19,826 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from
9.70.137.114
2013-07-12 11:04:19,826 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
2013-07-12 11:04:19,827 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 795
2013-07-12 11:04:19,827 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2
Total time for transactions(ms): 0 Number of transactions batched in Syncs:
0 Number of syncs: 2 SyncTimes(ms): 12
2013-07-12 11:04:19,827 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2
Total time for transactions(ms): 0 Number of transactions batched in Syncs:
0 Number of syncs: 3 SyncTimes(ms): 12
2013-07-12 11:04:19,829 INFO
org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits
file
/tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_795 -
/tmp/hadoop-dsadm/dfs/name/current/edits_795-796
2013-07-12 11:04:19,829 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at
797
2013-07-12 11:04:26,002 WARN
org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space
available on volume 'null' is 0, which is below the configured reserved
amount 104857600
2013-07-12 11:04:26,003 WARN
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on
available disk space. Entering safe mode.
2013-07-12 11:04:26,004 
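The 104857600 in the WARN above is the 100 MB default of
dfs.namenode.resource.du.reserved: the NameNodeResourceChecker moves the
NameNode into safe mode when the volume holding its storage directories
reports less free space than that, and volume 'null' suggests the checker
could not even resolve the /tmp-based directory (a tmp cleaner may have
removed it). Some quick checks, assuming the stock 2.x CLI:

df -h /tmp                    # free space on the volume holding the name/edits dirs
hdfs dfsadmin -safemode get   # confirm the NameNode's state
hdfs dfsadmin -safemode leave # only once the underlying space problem is fixed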

Re: Namenode automatically going to safemode with 2.1.0-beta

2013-07-15 Thread Krishna Kishore Bonagiri
There was enough space on the disk that is used, around 30 Gigs.

Thanks,
Kishore


On Mon, Jul 15, 2013 at 1:30 PM, Venkatarami Netla 
venkatarami.ne...@cloudwick.com wrote:

 Hi,
 pls see the available space for NN storage directory.

 Thanks & Regards

 Venkat


 On Mon, Jul 15, 2013 at 12:14 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,

  I am doing no activity on my single node cluster which is using
 2.1.0-beta, and still observed that it has gone to safe mode by itself
 after a while. I was looking at the name node log and see many of these
 kinds of entries.. Can anything be interpreted from these?


Re: Namenode automatically going to safemode with 2.1.0-beta

2013-07-15 Thread Krishna Kishore Bonagiri
Hi,

  I have restarted my cluster after removing the data directory and
formatting the namenode. So, is this screenshot still useful for you or do
you want it only after I reproduce the issue?

Thanks,
Kishore


On Mon, Jul 15, 2013 at 1:59 PM, Venkatarami Netla 
venkatarami.ne...@cloudwick.com wrote:

 Hi,
 Please send the Web UI namenode page, and what is your total hard disk size?


 On Mon, Jul 15, 2013 at 1:54 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 I have had enough space on the disk that is used, like around 30 Gigs

 Thanks,
 Kishore


 On Mon, Jul 15, 2013 at 1:30 PM, Venkatarami Netla 
 venkatarami.ne...@cloudwick.com wrote:

 Hi,
 pls see the available space for NN storage directory.

 Thanks & Regards

 Venkat


 On Mon, Jul 15, 2013 at 12:14 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,

  I am doing no activity on my single node cluster which is using
 2.1.0-beta, and still observed that it has gone to safe mode by itself
 after a while. I was looking at the name node log and see many of these
 kinds of entries.. Can anything be interpreted from these?


Re: Namenode automatically going to safemode with 2.1.0-beta

2013-07-15 Thread Krishna Kishore Bonagiri
I don't have it in my hdfs-site.xml, in which case probably the default
value is taken..
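One way to confirm which value is actually in effect, assuming the stock 2.x
CLI:

hdfs getconf -confKey dfs.datanode.du.reserved
# NN-side counterpart; its 100 MB default matches the 104857600 in the log:
hdfs getconf -confKey dfs.namenode.resource.du.reserved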


On Mon, Jul 15, 2013 at 2:29 PM, Azuryy Yu azury...@gmail.com wrote:

 please check dfs.datanode.du.reserved in the hdfs-site.xml
 On Jul 15, 2013 4:30 PM, Aditya exalter adityaexal...@gmail.com wrote:

 Hi Krishna,

Can you please send screenshots of namenode web UI.

 Thanks Aditya.


 On Mon, Jul 15, 2013 at 1:54 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 I have had enough space on the disk that is used, like around 30 Gigs

 Thanks,
 Kishore


 On Mon, Jul 15, 2013 at 1:30 PM, Venkatarami Netla 
 venkatarami.ne...@cloudwick.com wrote:

 Hi,
 pls see the available space for NN storage directory.

 Thanks & Regards

 Venkat


 On Mon, Jul 15, 2013 at 12:14 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,

  I am doing no activity on my single node cluster which is using
 2.1.0-beta, and still observed that it has gone to safe mode by itself
 after a while. I was looking at the name node log and see many of these
 kinds of entries.. Can anything be interpreted from these?


Re: whitelist feature of YARN

2013-07-09 Thread Krishna Kishore Bonagiri
Hi Sandy,

  Yes, I have been using the AMRMClient APIs. I am planning to shift to
whatever way this whitelist feature is supported. But I am not sure what is
meant by submitting ResourceRequests directly to the RM. Can you please
elaborate on this, or give me a pointer to some example code on how to do
it...

   Thanks for the reply,

-Kishore


On Mon, Jul 8, 2013 at 10:53 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

 Hi Krishna,

 From your previous email, it looks like you are using the AMRMClient APIs.
  Support for whitelisting is not yet supported through them.  I am working
 on this in YARN-521, which should be included in the next release after
 2.1.0-beta.  If you are submitting ResourceRequests directly to the RM, you
 can whitelist a node by
 * setting the relaxLocality flag on the node-level ResourceRequest to true
 * setting the relaxLocality flag on the corresponding rack-level
 ResourceRequest to false
 * setting the relaxLocality flag on the corresponding any-level
 ResourceRequest to false

 -Sandy
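In other words, something like this, if I read it right (a sketch assuming
the 2.1.0-beta records API, so the exact factory signatures are my
assumption; node1.example.com is a placeholder host name):

// Whitelist a single node with three ResourceRequests, per the
// relaxLocality settings described above. Assumes the usual
// org.apache.hadoop.yarn.api.records imports plus java.util.Arrays.
Priority pri = Priority.newInstance(0);
Resource cap = Resource.newInstance(1024, 1);

// node-level request: relaxLocality = true
ResourceRequest nodeReq =
    ResourceRequest.newInstance(pri, "node1.example.com", cap, 1, true);
// corresponding rack-level request: relaxLocality = false
ResourceRequest rackReq =
    ResourceRequest.newInstance(pri, "/default-rack", cap, 1, false);
// any-level request: relaxLocality = false
ResourceRequest anyReq =
    ResourceRequest.newInstance(pri, ResourceRequest.ANY, cap, 1, false);

allocateRequest.setAskList(Arrays.asList(nodeReq, rackReq, anyReq));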


 On Mon, Jul 8, 2013 at 6:48 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,

   Can someone please point to some example code of how to use the
 whitelist feature of YARN, I have recently got RC1 for hadoop-2.1.0-beta
 and want to use this feature.

   It would be great if you can point me to some description of what this
 white listing feature is, I have gone through some JIRA logs related to
 this but more concrete explanation would be helpful.

 Thanks,
 Kishore





Re: Requesting containers on a specific host

2013-07-08 Thread Krishna Kishore Bonagiri
I could resolve this error by simply changing the host name to a fully
qualified one including the domain name. But while running some of my old
example code against this release, I am observing at least the following
changes (my guess at where these went follows the list):


1) ClientRMProtocol and AMRMProtocol are removed.
2) ContainerManager is removed.
3) YarnRemoteException is removed.
4) ContainerRequest is removed.
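
In case it saves someone else the digging, these look like renames rather
than outright removals, going by the 2.1.0-beta javadocs (please verify
against your build): ClientRMProtocol is now ApplicationClientProtocol,
AMRMProtocol is now ApplicationMasterProtocol, ContainerManager is now
ContainerManagementProtocol, YarnRemoteException is now YarnException, and
ContainerRequest survives as the inner class AMRMClient.ContainerRequest.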

It looks like it is now compulsory to modify my application accordingly,
whereas on earlier versions I could have run the same application without
any modifications.

Thanks,
Kishore


whitelist feature of YARN

2013-07-08 Thread Krishna Kishore Bonagiri
Hi,

  Can someone please point me to some example code showing how to use the
whitelist feature of YARN? I have recently got RC1 for hadoop-2.1.0-beta and
want to use this feature.

  It would be great if you could also point me to some description of what
this whitelisting feature is; I have gone through some JIRA logs related to
it, but a more concrete explanation would be helpful.

Thanks,
Kishore


Re: Requesting containers on a specific host

2013-07-05 Thread Krishna Kishore Bonagiri
Hi Devaraj,

   Thanks for pointing me to this RC. I am trying this out, and getting
this error when the NM tries to start. My RM is running fine, but the NM is
failing, saying that it is disallowed by the RM and received a SHUTDOWN
message. Please give me a clue to resolve this.

2013-07-05 09:49:20,043 ERROR
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Unexpected
error starting NodeStatusUpdater
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN
signal from Resourcemanager ,Registration of NodeManager failed, Message
from ResourceManager: Disallowed NodeManager from  isredeng.swg.usma.ibm.com,
Sending SHUTDOWN signal to the NodeManager.
at
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:290)
at
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:156)
at
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at
org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101)
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:213)
at
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:401)
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:447)
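(Per the 2013-07-08 follow-up above, the fix turned out to be using a fully
qualified host name. A quick way to compare what the NM registers with what
the RM expects, assuming a Linux shell:)

hostname        # short name the NM may register with
hostname -f     # fully qualified name, the form that resolved this error
# if an RM include list is configured, the same name must appear in it:
grep -A1 'yarn.resourcemanager.nodes.include-path' $HADOOP_CONF_DIR/yarn-site.xml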

Thanks,
Kishore

On Fri, Jul 5, 2013 at 8:52 AM, Devaraj k devara...@huawei.com wrote:

  Hi Kishore,

 hadoop-2.1.0 beta release is in voting process now.

 You can try out from hadoop-2.1.0 beta RC
 http://people.apache.org/~acmurthy/hadoop-2.1.0-beta-rc0/ or you could
 check the same with trunk build.

 Thanks
 Devaraj k




Re: Requesting containers on a specific host

2013-07-04 Thread Krishna Kishore Bonagiri
I could get containers on specific nodes using addContainerRequest() on
AMRMClient, but there are issues with it. I have two nodes, node1 and node2,
in my cluster, and my Application Master is trying to get 3 containers on
node1 and 3 containers on node2, in that order.

While trying to request on node1, it sometimes gives me those on node2, and
vice versa. When I get a container on a different node than the one I need,
I release it and make a fresh request. I am having to do that over and over
to get a container on the node I need.

 Though the node I am requesting on has enough resources, why does it keep
giving me containers on the other node? How can I make sure I get a
container on the node I want?

Note: I am using the default scheduler, i.e. Capacity Scheduler.

Thanks,
Kishore


On Fri, Jun 21, 2013 at 7:25 PM, Arun C Murthy a...@hortonworks.com wrote:

 Check if the hostname you are setting is the same in the RM logs…

 On Jun 21, 2013, at 2:15 AM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:

 Hi,
   I am trying to get a container on a specific host, using the
 setHostName() call on ResourceRequest, but could never get anything
 allocated, whereas it works fine when I change the node name to *. I am
 working on a single node cluster, and I am giving the name of the single
 node I have in my cluster.

   Is there any specific format that I need to give for setHostName(), and
 why is it not working...

 Thanks,
 Kishore


 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/





Re: Requesting containers on a specific host

2013-07-04 Thread Krishna Kishore Bonagiri
Thanks Arun, it seems to be available with 2.1.0-beta; when will that be
released? Or, if I want it now, could I get it from the trunk?

-Kishore


On Thu, Jul 4, 2013 at 5:58 PM, Arun C Murthy a...@hortonworks.com wrote:

 To guarantee containers on a specific node you need to use the whitelist
 feature we added recently:

 https://issues.apache.org/jira/browse/YARN-398

 Arun





 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/





Re: Requesting containers on a specific host

2013-06-25 Thread Krishna Kishore Bonagiri
Hi Arun,

  I just found that setHostName() doesn't work on 2.0.0-alpha but works on
2.0.4-alpha; I didn't check the intermediate versions. I have verified this
by starting the daemons of the respective versions, modifying the
ApplicationMaster.java in the distributed shell example, and running a date
command in it. Is this something already known, or have I been doing
something wrong?

Thanks,
Kishore








Re: Requesting containers on a specific host

2013-06-24 Thread Krishna Kishore Bonagiri
Yes Arun, the one I am giving is the same as the one I see in the RM's log.

Thanks,
Kishore







Re: Container allocation on the same node

2013-06-12 Thread Krishna Kishore Bonagiri
Hi Harsh,

   What will happen when I specify local host as the required host? Doesn't
the resource manager give me all the containers on the local host?  I don't
want to constrain myself to the local host, which might be busy while other
nodes in the cluster have enough resources available for me.

Thanks,
Kishore


On Wed, Jun 12, 2013 at 6:45 PM, Harsh J ha...@cloudera.com wrote:

 You can request containers with the local host name as the required
 host, and perhaps reject and re-request if they aren't designated to
 be on that one until you have sufficient. This may take a while
 though.

 On Wed, Jun 12, 2013 at 6:25 PM, Krishna Kishore Bonagiri
 write2kish...@gmail.com wrote:
  Hi,
 
    I want to get some containers for my application on the same node; is
  there a way to make such a request?
 
    For example, I have an application which needs 10 containers, but with
  a constraint that a set of those containers needs to be running on the
  same node. Can I ask my resource manager to give me, let us say, 5
  containers on the same node?
 
    I know that there is now a way to specify the node name on which I need
  a container, but I don't care which node in the cluster they get allocated
  on; I just need them on the same node.
 
    Please suggest whether it is possible, and how I can do that.
 
  Thanks,
  Kishore



 --
 Harsh J
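
A rough sketch of that reject-and-re-request loop, assuming the 2.1.0-beta
blocking AMRMClient (the ContainerRequest constructor shape and method names
are my assumption; exception handling omitted):

// Assumes a started AMRMClient named amrmClient and a target count 'needed'.
String wanted = InetAddress.getLocalHost().getCanonicalHostName();
Resource cap = Resource.newInstance(1024, 1);
Priority pri = Priority.newInstance(0);

for (int i = 0; i < needed; i++) {
  amrmClient.addContainerRequest(new AMRMClient.ContainerRequest(
      cap, new String[] { wanted }, null, pri));
}

List<Container> onNode = new ArrayList<Container>();
while (onNode.size() < needed) {
  AllocateResponse rsp = amrmClient.allocate(0.0f);
  for (Container c : rsp.getAllocatedContainers()) {
    if (wanted.equals(c.getNodeId().getHost())) {
      onNode.add(c);                                   // right node, keep it
    } else {
      amrmClient.releaseAssignedContainer(c.getId());  // wrong node, reject
      amrmClient.addContainerRequest(new AMRMClient.ContainerRequest(
          cap, new String[] { wanted }, null, pri));   // and re-request
    }
  }
  Thread.sleep(250);
}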



Application Master getting started very late

2013-06-11 Thread Krishna Kishore Bonagiri
Hi,

  I have been using YARN for quite some time, and recently moved to release
2.0.4. I recently started running a huge number of Application Masters
(applications) one after another, and observed that sometimes in that
sequence an Application Master takes around a minute, or a little more, to
get started from the point the client launches it.

   Any idea why the AM takes so long? It happens quite randomly for me, not
always at the same point.

Thanks,
Kishore


Re: What else can be built on top of YARN.

2013-05-30 Thread Krishna Kishore Bonagiri
Hi Rahul,

  The reasons that Vinod listed are at least part of what makes it easier
for me to port my application onto YARN instead of making it work within the
MapReduce framework. My main purpose in using YARN is to exploit its
resource management capabilities.

Thanks,
Kishore


On Wed, May 29, 2013 at 11:00 PM, Rahul Bhattacharjee 
rahul.rec@gmail.com wrote:

 Thanks for the response Krishna.

 I was wondering if it was possible to use MR to solve your problem
 instead of building the whole stack on top of YARN.
 Most likely it's not possible, and that's why you are building it. I wanted
 to know why that is.

 I am just trying to find out the need, or why we might need to write the
 application on YARN.

 Rahul









Re: What else can be built on top of YARN.

2013-05-29 Thread Krishna Kishore Bonagiri
Hi Rahul,

  I am porting to YARN a distributed application that runs on a fixed set
of given resources, with the aim of being able to run it on dynamically
selected resources, whichever are available at the time the application
runs.

Thanks,
Kishore


On Wed, May 29, 2013 at 8:04 PM, Rahul Bhattacharjee 
rahul.rec@gmail.com wrote:

 Hi all,

  I was going through the motivation behind YARN. Splitting the
 responsibility of the JT is the major concern. Ultimately the base (YARN)
 was built in a generic way for building other generic distributed
 applications too.

 I am not able to think of any other parallel processing use case that
 would be useful to build on top of YARN. I thought of a lot of use cases
 that would be beneficial when run in parallel, but again, we can do those
 using map-only jobs in MR.

 Can someone tell me a scenario where an application can utilize YARN
 features or can be built on top of YARN, and at the same time cannot be
 done efficiently using MRv2 jobs.

 thanks,
 Rahul





Out of memory error by Node Manager, and shut down

2013-05-23 Thread Krishna Kishore Bonagiri
Hi,

  I got the following error in the node manager's log, and it shut down
after about 1 application had run since it was started. Any clue why this
occurs... or is this a bug?


2013-05-22 11:53:34,456 FATAL
org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[process
reaper,5,main] threw an Error.  Shutting down now...
java.lang.OutOfMemoryError: Failed to create a thread: retVal -1073741830,
errno 11
at java.lang.Thread.startImpl(Native Method)
at java.lang.Thread.start(Thread.java:887)
at java.lang.ProcessInputStream.init(UNIXProcess.java:472)
at java.lang.UNIXProcess$1$1$1.run(UNIXProcess.java:157)
at
java.security.AccessController.doPrivileged(AccessController.java:202)
at java.lang.UNIXProcess$1$1.run(UNIXProcess.java:137)
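
For what it's worth, errno 11 on Linux is EAGAIN, which for thread creation
usually means the process/thread limit for the NM user (or the system-wide
cap) was exhausted rather than heap memory. A quick check, assuming a Linux
shell, run as the user that owns the NodeManager:

ulimit -u                         # max user processes; threads count against it
cat /proc/sys/kernel/threads-max  # system-wide thread cap
ps -eLf | wc -l                   # rough count of live threads right now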


Thanks,
Kishore


Namenode going to safe mode on YARN

2013-05-06 Thread Krishna Kishore Bonagiri
Hi,

  I have been running applications on my YARN cluster for around 20 days,
about 5000 applications a day. I am getting the following error today.
Please let me know how I can avoid this; is this happening because of a bug?

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException):
Cannot create file/1066/AppMaster.jar. Name node is in safe mode.
The reported blocks 4775 needs additional 880 blocks to reach the threshold
0.9990 of total blocks 5660. Safe mode will be turned off automatically.
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1786)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1737)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1719)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:429)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:271)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40732)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)


Thanks,
Kishore


Re: Namenode going to safe mode on YARN

2013-05-06 Thread Krishna Kishore Bonagiri
Hi Nitin & Ted,

  Thanks for the replies.

 I don't know what my replication factor is; I don't seem to have set
anything in my configuration files. I run on a single node cluster. My data
node went down and came back, and I didn't delete any of the HDFS blocks.

 I know that the name node enters safe mode when HDFS is restarted and will
leave it soon. Is it safe to execute the command that forces it out of safe
mode? I mean, can something go wrong if we do that ourselves, given that it
wouldn't have collected all the needed data and could not leave safe mode by
itself?

  And does the error I gave above indicate some clue as to what I could do
better?
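
For the replication question, a couple of quick checks, assuming the stock
CLI (the unset default for dfs.replication is 3, though a single-node
cluster usually wants 1):

hdfs getconf -confKey dfs.replication   # effective replication factor
hdfs fsck /                             # reports missing/under-replicated blocks
hdfs dfsadmin -safemode get             # current safe mode state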

Thanks,
Kishore



On Mon, May 6, 2013 at 2:56 PM, Ted Xu t...@gopivotal.com wrote:

 Hi Kishore,

 It should not be a bug. After restarting HDFS, namenode will enter safe
 mode until all needed data is collected. During safe mode, all update
 operations will fail.

 In some cases, as Nitin mentioned, the namenode will never leave safe mode
 because it can't get enough data. In that case you may need to force the
 name node to leave safe mode.

 For more information, see
 http://hadoop.apache.org/docs/r2.0.4-alpha/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Safemode
 .


 On Mon, May 6, 2013 at 5:00 PM, Nitin Pawar nitinpawar...@gmail.comwrote:

 What is your replication factor on hdfs?
 Did any of your datanode go down recently and is not back in rotation?
 Did you delete any hdfs blocks directly from datanodes?
 On May 6, 2013 2:28 PM, Krishna Kishore Bonagiri 
 write2kish...@gmail.com wrote:



 Thanks,
 Kishore




 --
 Regards,
 Ted Xu






Re: POLL: Using YARN or pre-YARN?

2013-04-25 Thread Krishna Kishore Bonagiri
I have been using YARN, i.e. hadoop-2.0.0-alpha through hadoop-2.0.4-alpha;
I don't know what you meant by pre-YARN.

Thanks,
Kishore


On Wed, Apr 24, 2013 at 10:41 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 Hi,

 Quick poll, would be great to know how many people are using YARN vs.
 pre-YARN:


 http://www.linkedin.com/groups/YARN-preYARN-which-version-Hadoop-988957.S.234719475

 Thanks,
 Otis
 --
 Hadoop Performance Monitoring - http://sematext.com/spm/index.html



Re: Differences hadoop-2.0.0-alpha Vs hadoop-2.0.3-alpha

2013-03-27 Thread Krishna Kishore Bonagiri
Hi Arun,

 Thanks for the reply. I have just compiled and run my
ApplicationMaster.java and Client.java against hadoop-2.0.3-alpha, and I am
getting this exception. The same code runs fine on 2.0.0. Please suggest
what the issue could be...

2013-03-28 00:50:47,576 FATAL [main] Client (Client.java:main(151)) - Error
running CLient
java.lang.reflect.UndeclaredThrowableException
at
org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:128)
at
org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getQueueInfo(ClientRMProtocolPBClientImpl.java:215)
at Client.dumpClusterInfo(Client.java:263)
at Client.launchAndMonitorAM(Client.java:471)
at Client.main(Client.java:149)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
at java.lang.reflect.Method.invoke(Method.java:611)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by:
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException):
java.lang.NullPointerException
at
java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:779)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getQueueInfo(CapacityScheduler.java:542)
at
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:420)
at
org.apache.hadoop.yarn.api.impl.pb.service.ClientRMProtocolPBServiceImpl.getQueueInfo(ClientRMProtocolPBServiceImpl.java:191)
at
org.apache.hadoop.yarn.proto.ClientRMProtocol$ClientRMProtocolService$2.callBlockingMethod(ClientRMProtocol.java:214)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1735)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1731)
at
java.security.AccessController.doPrivileged(AccessController.java:284)
at javax.security.auth.Subject.doAs(Subject.java:573)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1441)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1729)

at org.apache.hadoop.ipc.Client.call(Client.java:1235)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy4.getQueueInfo(Unknown Source)
at
org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getQueueInfo(ClientRMProtocolPBClientImpl.java:212)
... 8 more
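
Guessing from the trace: the NPE comes out of ConcurrentHashMap.get inside
CapacityScheduler.getQueueInfo, which suggests a null queue name reached the
scheduler. If dumpClusterInfo() builds its own GetQueueInfoRequest, making
sure the queue name is set may help; a sketch, assuming the 2.0.3-alpha
records API:

// Hypothetical fix inside Client.dumpClusterInfo(): never send a null queue name.
GetQueueInfoRequest req = Records.newRecord(GetQueueInfoRequest.class);
req.setQueueName("default");   // root child queue in the stock CapacityScheduler config
req.setIncludeApplications(true);
req.setIncludeChildQueues(true);
GetQueueInfoResponse rsp = clientRMProtocol.getQueueInfo(req);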

Thanks,
Kishore


On Thu, Mar 28, 2013 at 2:39 AM, Arun C Murthy a...@hortonworks.com wrote:

 YarnClient etc. is a just a bunch of helper libs to make it easier to
 write new applications.

 OTOH, your existing application should continue to work.

 hth,
 Arun

 On Mar 26, 2013, at 3:21 AM, Krishna Kishore Bonagiri wrote:

 Hi,
   I have a YARN application written and running properly against
 hadoop-2.0.0-alpha, but when I recently downloaded and started using
 hadoop-2.0.3-alpha it doesn't work. I think the original code I wrote was
 based on the Client.java and ApplicationMaster.java in the DistributedShell
 example. That example code has also changed with the new version; the
 Client now extends YarnClientImpl, among many other changes.

   Is there any guide on how I should modify my old application to work
 against the new version?

 Thanks,
 Kishore


 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/




