Hi All,

Thanks for your reply. I believe you are right: there is a DataTorrent
application that keeps restarting. Watching the Resource Manager UI, I always
see one application running, even though no one on my team has an application
running.
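
For reference, one way to double-check what YARN reports as running from the
command line (a quick sketch, assuming the standard yarn CLI is available on
the cluster):

# list applications currently in the RUNNING state
$ yarn application -list -appStates RUNNING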

Chetan,

The yarn.resourcemanager.am.max-attempts property is currently set to 2. I
checked the log for that application and it contains
AlreadyBeingCreatedException errors. I am attaching the log to this mail. Can
someone help me with this?
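
In case anyone wants to reproduce the check, something along these lines
should work; the /etc/hadoop/conf path and application_xxx_yyy below are
placeholders for our setup, and yarn logs requires log aggregation to be
enabled:

# confirm the AM retry limit currently in effect
$ grep -A1 'yarn.resourcemanager.am.max-attempts' /etc/hadoop/conf/yarn-site.xml

# search the aggregated logs of the restarting application for the exception
$ yarn logs -applicationId application_xxx_yyy | grep -i AlreadyBeingCreatedException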

Thanks and Regards,
Shashi



On Thu, Aug 27, 2015 at 1:01 AM, Chetan Narsude <[email protected]>
wrote:

> Can you check the yarn.resourcemanager.am.max-attempts setting for YARN
> (in yarn-site.xml or yarn-default.xml, whichever you are using)?
>
> Also, can you look at the application master logs for one of the app
> instances you did not start, to see why it was shut down?
>
>
> --
> Chetan
>
>
> On Wed, Aug 26, 2015 at 9:51 AM, Tushar Gosavi <[email protected]>
> wrote:
>
>> You can also check the YARN Resource Manager UI and logs to verify which
>> applications are getting restarted continuously.
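>>
>> For example, something along these lines (a sketch; the RM host is a
>> placeholder, 8088 is the usual default port for the RM web UI/REST API,
>> and yarn logs requires log aggregation):
>>
>> # list applications via the ResourceManager REST API
>> $ curl 'http://<rm-host>:8088/ws/v1/cluster/apps?states=RUNNING'
>>
>> # pull the logs of a suspicious application
>> $ yarn logs -applicationId application_xxx_yyy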
>>
>> On Wed, Aug 26, 2015 at 9:08 AM, David Yan <[email protected]> wrote:
>>
>>> That's a lot of applications.  I suspect there is something that keeps
>>> starting the application, which causes the folder to keep increasing in
>>> size. Can you run get-app-info in dtcli on just one application and see
>>> what is being spawned?
>>>
>>> David
>>>
>>> On Tue, Aug 25, 2015 at 11:44 PM, Shashi Vishwakarma <
>>> [email protected]> wrote:
>>>
>>>> Thanks, David, for the detailed explanation. I checked the apps directory
>>>> in HDFS; there are around 12858 applications in that folder, each about
>>>> 6.2 MB in size. Finding the status of each application by running
>>>> get-app-info in dtcli would be time-consuming, so I logged in to the
>>>> DataTorrent web interface (port 9090), but there is no application running
>>>> at the moment.
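>>>>
>>>> In case it is useful, the count and total size can be reproduced with
>>>> something like the following (the path is the one from David's earlier
>>>> mail):
>>>>
>>>> # directory count, file count and total bytes under the apps folder
>>>> $ hdfs dfs -count /user/dtadmin/datatorrent/apps
>>>>
>>>> # per-application usage, human readable
>>>> $ hdfs dfs -du -h /user/dtadmin/datatorrent/apps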
>>>>
>>>> HDFS space utilization is still increasing. Any pointers on this?
>>>>
>>>> Thanks and Regards,
>>>> Shashi
>>>>
>>>> On Wed, Aug 26, 2015 at 2:16 AM, Amol Kekre <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>> Adding [email protected]
>>>>>
>>>>> Thks,
>>>>> Amol
>>>>>
>>>>>
>>>>> On Tue, Aug 25, 2015 at 10:34 AM, David Yan <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Shashi,
>>>>>>
>>>>>> That directory is where Apex stores application information, like
>>>>>> application jar files, checkpoints, container information, etc.
>>>>>> Please run this command to see which directory is taking the most
>>>>>> space.
>>>>>>
>>>>>> $ hdfs dfs -du /user/dtadmin/datatorrent/apps
>>>>>>
>>>>>> Then open dtcli and use the get-app-info command to look at the
>>>>>> information of that application.  For example:
>>>>>>
>>>>>> dt> get-app-info application_1439598948299_0557
>>>>>>
>>>>>> The field "state" will tell you whether the application is running or
>>>>>> not.
>>>>>>
>>>>>> If you don't care about the application, you can safely kill it if
>>>>>> it's running and delete the HDFS directory by doing hdfs dfs -rm -r
>>>>>> /user/dtadmin/datatorrent/apps/application_xxx_yyy (replace xxx and yyy
>>>>>> with appropriate values).  Note that doing so will wipe all stored
>>>>>> information about that application.
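>>>>>>
>>>>>> A concrete sequence would look something like this (application_xxx_yyy
>>>>>> is a placeholder; the optional -skipTrash flag bypasses the HDFS trash
>>>>>> so the space is freed immediately):
>>>>>>
>>>>>> # stop the application if it is still running
>>>>>> $ yarn application -kill application_xxx_yyy
>>>>>>
>>>>>> # then remove its directory under the apps folder
>>>>>> $ hdfs dfs -rm -r -skipTrash /user/dtadmin/datatorrent/apps/application_xxx_yyy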
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On Tue, Aug 25, 2015 at 6:32 AM, Shashi Vishwakarma <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have DataTorrent 3.x installed on my cluster. Even though no
>>>>>>> DataTorrent application is running, my HDFS space utilization keeps
>>>>>>> increasing. Below is the HDFS path that occupies most of the space.
>>>>>>>
>>>>>>> /user/dtadmin/datatorrent/apps
>>>>>>>
>>>>>>> Why is this happening? Am I missing something here?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Shashi
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>
>>
>>
>> --
>> “I'd have blown my top, because I want to beat this damn thing,
>>  as long as I've gone this far. I can't just leave it after I've found
>>  out so much about it. I have to keep going to find out ultimately
>> what is the matter with it in the end."
>>                 Richard P. Feynman
>>
>
>
Exception in thread "main" org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/user/dtadmin/datatorrent/audit/dt-20150514010221-88gt3kvr/audit/audit-201508] for [DFSClient_NONMAPREDUCE_743265090_1] for client [153.65.231.16], because this file is already being created by [DFSClient_NONMAPREDUCE_80867563_1] on [153.65.231.16]
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2543)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2372)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2607)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2570)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:543)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

        at com.datatorrent.stram.b.F.F(qi:200)
        at com.datatorrent.stram.b.F.start(qi:114)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
        at com.datatorrent.stram.b.g.serviceStart(ze:78)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at com.datatorrent.stram.LicensingAppMaster.main(lj:0)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/user/dtadmin/datatorrent/audit/dt-20150514010221-88gt3kvr/audit/audit-201508] for [DFSClient_NONMAPREDUCE_743265090_1] for client [153.65.231.16], because this file is already being created by [DFSClient_NONMAPREDUCE_80867563_1] on [153.65.231.16]
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2543)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2372)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2607)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2570)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:543)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

        at org.apache.hadoop.ipc.Client.call(Client.java:1410)
        at org.apache.hadoop.ipc.Client.call(Client.java:1363)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy14.append(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
        at com.sun.proxy.$Proxy14.append(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:276)
        at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1563)
        at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1603)
        at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1591)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:320)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:316)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:316)
        at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1161)
        at com.datatorrent.stram.b.d.j.F(em:115)
        at com.datatorrent.stram.b.j.I.F(be:114)
        at com.datatorrent.stram.b.j.I.F(be:41)
        at com.datatorrent.stram.b.F.F(qi:133)
        ... 5 more
