Jonathon, thanks for the yarn config idea. That helps. Billie, Gour, I plan to upgrade, but I need to get this app to stabilize before I go changing anything else ;-)
Thanks,
-david

On 3/24/17, 9:32 AM, <gs...@hortonworks.com> wrote:

Ha, my bad. I forgot they are using 0.91. I looked at the latest code and assumed that the check was there. Yup, as Billie mentioned, you need to make the change below to move forward, or upgrade to Slider 0.92.0, which we shipped yesterday.

-Gour

On 3/24/17, 9:23 AM, "Billie Rinaldi" <billie.rina...@gmail.com> wrote:

> I think this is a bug that was fixed in the latest version of Slider, 0.92.
> I didn't figure out why this happens sometimes, but it seems to be resolved
> by changing the following:
>
> diff --git a/slider-agent/src/main/python/agent/ActionQueue.py b/slider-agent/src/main/python/agent/ActionQueue.py
> index 7514337..e973337 100644
> --- a/slider-agent/src/main/python/agent/ActionQueue.py
> +++ b/slider-agent/src/main/python/agent/ActionQueue.py
> @@ -161,7 +161,7 @@ class ActionQueue(threading.Thread):
>      self.commandStatuses.put_command_status(command, in_progress_status, reportResult)
>
>      store_config = False
> -    if ActionQueue.STORE_APPLIED_CONFIG in command['commandParams']:
> +    if 'commandParams' in command and ActionQueue.STORE_APPLIED_CONFIG in command['commandParams']:
>        store_config = 'true' == command['commandParams'][ActionQueue.STORE_APPLIED_CONFIG]
>      store_command = False
>      if 'roleParams' in command and command['roleParams'] is not None and ActionQueue.AUTO_RESTART in command['roleParams']:
>
>
> On Thu, Mar 23, 2017 at 6:25 PM, David.Serafini <david.seraf...@target.com> wrote:
>
>> Can anyone tell me what this error means and whether it is significant?
>> I have a slider job that seems to randomly fail, and I don't see anything
>> interesting in the AppMaster logs except this. (That doesn't mean there
>> isn't an error elsewhere: yarn is wiping out the job directories as soon as
>> the container terminates: I haven't figured out how to fix that).
>>
>> In case it matters, my job is a shell script specified in metainfo.json
>> in application.components.commands.exec. The script does some setup
>> and then runs tomcat.
>>
>> thanks in advance,
>> david
>>
>>
>> Connecting to the server at https://brdn1088.target.com:42721/ws/v1/slider/agents/...
>> Registered with the server
>> Traceback (most recent call last):
>>   File "./infra/agent/slider-agent/agent/main.py", line 318, in <module>
>>     main()
>>   File "./infra/agent/slider-agent/agent/main.py", line 311, in main
>>     controller.join(timeout=1.0)
>>   File "/usr/lib64/python2.6/threading.py", line 655, in join
>>     self.__block.wait(delay)
>>   File "/usr/lib64/python2.6/threading.py", line 258, in wait
>>     _sleep(delay)
>>   File "./infra/agent/slider-agent/agent/main.py", line 66, in signal_handler
>>     controller.actionQueue.execute_command(controller.stopCommand)
>>   File "/grid/4/hadoop/yarn/local/usercache/Z002JSF/appcache/application_1490038663882_9176/filecache/11/slider-agent.tar.gz/slider-agent/agent/ActionQueue.py", line 164, in execute_command
>>     if ActionQueue.STORE_APPLIED_CONFIG in command['commandParams']:
>> KeyError: 'commandParams'
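
For anyone who finds this thread later, the failure and the one-line fix can be reproduced outside the agent. This is only a minimal sketch, not the Slider agent's code: the constant value, the two helper functions, and the shape of the stop command are assumptions made up for illustration. The only thing taken from the thread is the membership check itself.

```python
# Placeholder standing in for ActionQueue.STORE_APPLIED_CONFIG.
# The real value is not shown in this thread, so this one is hypothetical.
STORE_APPLIED_CONFIG = 'store_applied_config_placeholder'


def store_config_unguarded(command):
    """Mirrors the 0.91 check: indexes command['commandParams'] directly."""
    store_config = False
    # Raises KeyError if the command has no 'commandParams' key at all.
    if STORE_APPLIED_CONFIG in command['commandParams']:
        store_config = 'true' == command['commandParams'][STORE_APPLIED_CONFIG]
    return store_config


def store_config_guarded(command):
    """Mirrors the patched check: tests for the key before indexing."""
    store_config = False
    # 'and' short-circuits, so command['commandParams'] is only read
    # when the key is present.
    if 'commandParams' in command and STORE_APPLIED_CONFIG in command['commandParams']:
        store_config = 'true' == command['commandParams'][STORE_APPLIED_CONFIG]
    return store_config


# Hypothetical stop command that carries no 'commandParams' entry.
stop_command = {'roleCommand': 'STOP'}

try:
    store_config_unguarded(stop_command)
except KeyError as err:
    print('unguarded check failed with KeyError: %s' % err)

print('guarded check returned: %s' % store_config_guarded(stop_command))
```

The guarded version treats a missing 'commandParams' the same as a command that does not ask for its config to be stored, which matches what the patched line in the diff does.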