Can you check the yarn.resourcemanager.am.max-attempts setting for YARN (in yarn-site.xml or yarn-default.xml, whichever you are using)?
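
In case it helps, here is a minimal sketch for reading that value from the config file. It assumes the usual Hadoop layout where `<value>` follows `<name>` within a few lines, and that the file sits under `$HADOOP_CONF_DIR` (falling back to `/etc/hadoop/conf`, which is an assumption about your install). `am_max_attempts` is a hypothetical helper name, not a YARN command.

```shell
# Print the configured value of yarn.resourcemanager.am.max-attempts.
# $1: path to a yarn-site.xml style file.
am_max_attempts() {
  # Take the <name> line plus the next few lines, then extract the first <value>.
  grep -A 3 '<name>yarn.resourcemanager.am.max-attempts</name>' "$1" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p' \
    | head -n 1
}

# Assumed config location; adjust for your cluster.
conf="${HADOOP_CONF_DIR:-/etc/hadoop/conf}/yarn-site.xml"
if [ -f "$conf" ]; then
  am_max_attempts "$conf"
fi
```

If nothing is printed, the property is probably not overridden in yarn-site.xml and the value from yarn-default.xml applies.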
Also, can you look at the application master logs for one of the app instances you did not start, to see why it was shut down?

-- Chetan

On Wed, Aug 26, 2015 at 9:51 AM, Tushar Gosavi <[email protected]> wrote:

> You can also check the YARN ResourceManager UI and logs to verify which
> applications are getting restarted continuously.
>
> On Wed, Aug 26, 2015 at 9:08 AM, David Yan <[email protected]> wrote:
>
>> That's a lot of applications. I suspect there is something that keeps
>> starting the application, which causes the folder to keep increasing in
>> size. Can you run get-app-info in dtcli on just one application and
>> see what is being spawned?
>>
>> David
>>
>> On Tue, Aug 25, 2015 at 11:44 PM, Shashi Vishwakarma <
>> [email protected]> wrote:
>>
>>> Thanks David for the detailed explanation. I checked the apps directory
>>> in HDFS; there are around 12858 applications in that folder, each about
>>> 6.2 M in size. It would be a time-consuming process to find the status
>>> of each application by running get-app-info in dtcli. So I logged in to
>>> the DataTorrent web interface (port 9090), but there is no application
>>> running at the moment.
>>>
>>> HDFS space utilization is still increasing. Any pointers on this?
>>>
>>> Thanks and Regards,
>>> Shashi
>>>
>>> On Wed, Aug 26, 2015 at 2:16 AM, Amol Kekre <[email protected]>
>>> wrote:
>>>
>>>> Adding [email protected]
>>>>
>>>> Thks,
>>>> Amol
>>>>
>>>> On Tue, Aug 25, 2015 at 10:34 AM, David Yan <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Shashi,
>>>>>
>>>>> That directory is where Apex stores application information, like
>>>>> application jar files, checkpoints, container information, etc.
>>>>> Please run this command to see which directory is taking the most
>>>>> space.
>>>>>
>>>>> $ hdfs dfs -du /user/dtadmin/datatorrent/apps
>>>>>
>>>>> Then open dtcli and use the get-app-info command to look at the
>>>>> information of that application. For example:
>>>>>
>>>>> dt> get-app-info application_1439598948299_0557
>>>>>
>>>>> The field "state" will tell you whether the application is running or
>>>>> not.
>>>>>
>>>>> If you don't care about the application, you can safely kill it if
>>>>> it's running and delete the HDFS directory by doing hdfs dfs -rm -r
>>>>> /user/dtadmin/datatorrent/apps/application_xxx_yyy (replace xxx and yyy
>>>>> with appropriate values). Note that doing so will wipe all stored
>>>>> information about that application.
>>>>>
>>>>> David
>>>>>
>>>>> On Tue, Aug 25, 2015 at 6:32 AM, Shashi Vishwakarma <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have DataTorrent 3.x installed on my cluster. Even though no
>>>>>> DataTorrent application is running, my HDFS space utilization keeps
>>>>>> increasing. Below is the HDFS path that occupies most of the space.
>>>>>>
>>>>>> /user/dtadmin/datatorrent/apps
>>>>>>
>>>>>> Why is this happening? Am I missing something here?
>>>>>>
>>>>>> Thanks
>>>>>> Shashi
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "apex-dev" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To post to this group, send email to [email protected].
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/apex-dev/8754d662-4948-4920-96f3-cb58f70d5f39%40googlegroups.com
>>>>>> <https://groups.google.com/d/msgid/apex-dev/8754d662-4948-4920-96f3-cb58f70d5f39%40googlegroups.com?utm_medium=email&utm_source=footer>.
>>>>>> For more options, visit https://groups.google.com/d/optout.
>
> --
> "I'd have blown my top, because I want to beat this damn thing,
> as long as I've gone this far. I can't just leave it after I've found
> out so much about it. I have to keep going to find out ultimately
> what is the matter with it in the end."
> Richard P. Feynman
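
With around 12858 directories under the apps folder, the raw output of the `hdfs dfs -du` command quoted above is hard to scan by eye. Below is a minimal sketch for ranking it, assuming the first column of each output line is the size in bytes (worth confirming on your Hadoop version). `largest_dirs` is a hypothetical helper name, not part of HDFS or dtcli.

```shell
# Rank `hdfs dfs -du` style output by size, largest first.
# Reads lines on stdin whose first column is a size in bytes; prints the top N
# (default 10).
largest_dirs() {
  sort -k1,1nr | head -n "${1:-10}"
}

# Example invocation, only attempted when the hdfs client is on PATH.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -du /user/dtadmin/datatorrent/apps | largest_dirs 10
fi
```

The application IDs at the top of that list are reasonable candidates to pass to get-app-info, one at a time.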
