Gour,

Thanks for the prompt reply.


   1. A temporary hiccup in HDFS as a possible cause has been on my mind as
   well; I wanted to reach out to the Slider community to check whether
   other issues could cause this symptom.
   2. I remember stopping and starting the Slider app after this timestamp.
   Apparently the app stop/start did not delete this file. Can you confirm
   that behaviour? Also, would it make sense to add an enhancement that
   deletes this file on app stop/start, if that is not already done?
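
For illustration only, here is a minimal sketch of what such a cleanup could look like, written in Java. It rests on several assumptions. The class StaleLockCleaner and its method removeStaleLock are hypothetical and are not Slider APIs. It uses java.nio.file against a local directory as a stand-in for the Hadoop FileSystem API that Slider actually uses on HDFS. It also adds an age threshold, which is my own assumption and not part of the proposal above, so that a lock still held by an operation in progress is left alone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical helper (not part of Slider): deletes a leftover "readlock"
 * file in an app's cluster directory if it is older than maxAge.
 * Uses java.nio.file on a local directory as a stand-in for HDFS.
 */
public final class StaleLockCleaner {
    // Name of the lock file seen in the HDFS listing in this thread.
    static final String READ_LOCK = "readlock";

    private StaleLockCleaner() {}

    /** Returns true only if a stale lock file was found and deleted. */
    public static boolean removeStaleLock(Path clusterDir, Duration maxAge, Instant now)
            throws IOException {
        Path lock = clusterDir.resolve(READ_LOCK);
        if (!Files.exists(lock)) {
            return false; // nothing to clean up
        }
        Instant modified = Files.getLastModifiedTime(lock).toInstant();
        // Treat the lock as stale only if it is older than maxAge, so a lock
        // held by an operation that is still running is not removed.
        if (Duration.between(modified, now).compareTo(maxAge) <= 0) {
            return false;
        }
        return Files.deleteIfExists(lock);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("spas");
        Path lock = Files.createFile(dir.resolve(READ_LOCK));
        Instant now = Instant.now();
        // Backdate the lock by 11 days, similar to the 2018-01-06 file here.
        Files.setLastModifiedTime(lock, FileTime.from(now.minus(Duration.ofDays(11))));
        System.out.println(removeStaleLock(dir, Duration.ofHours(1), now)); // true
        System.out.println(Files.exists(lock)); // false
    }
}
```

The age check is a design choice: deleting the file unconditionally on stop/start would be simpler, but it could remove a lock that another client legitimately holds at that moment.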

Thanks,

Manoj

On Wed, Jan 17, 2018 at 1:50 PM, Gour Saha <gs...@hortonworks.com> wrote:

> Manoj,
> By any chance, is it possible to find out (maybe from logs or sar files)
> whether there was any HDFS unavailability (say, a NameNode connection
> issue) around 2018-01-06 00:33 (based on the readlock file timestamp)?
>
> -rw-r--r--   3 xxx xxx         23 2018-01-06 00:33
> hdfs://xxx/user/xxx/.slider/cluster/spas/readlock
>
>
> -Gour
>
> On 1/17/18, 1:05 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>
> >Hello,
> >
> >Slider version 0.80 on a CDH 5.5.1 cluster with Kerberos.
> >
> >The following command failed with the trace below:
> >
> >slider upgrade <App> --template /xxx/appConfig.json --resources /xxx/resources.json --queue tenant --force
> >
> >2018-01-17 20:31:23,030 [main] INFO  tools.SliderUtils - JVM initialized into secure mode with kerberos realm BIGDATA
> >2018-01-17 20:31:23,869 [main] INFO  client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
> >2018-01-17 20:31:24,325 [main] WARN  client.SliderClient - Failed to get a Lock on Builder working with spas at hdfs://xxx/user/xxx/.slider/cluster/spas : org.apache.slider.core.persist.LockAcquireFailedException: Failed to acquire lock hdfs://xxx/user/xxx/.slider/cluster/spas/readlock
> >org.apache.slider.core.persist.LockAcquireFailedException: Failed to acquire lock hdfs://xxx/user/xxx/.slider/cluster/spas/readlock
> >    at org.apache.slider.core.persist.ConfPersister.acquireWritelock(ConfPersister.java:141)
> >    at org.apache.slider.core.persist.ConfPersister.save(ConfPersister.java:253)
> >    at org.apache.slider.core.build.InstanceBuilder.persist(InstanceBuilder.java:270)
> >    at org.apache.slider.client.SliderClient.persistInstanceDefinition(SliderClient.java:1836)
> >    at org.apache.slider.client.SliderClient.buildInstanceDefinition(SliderClient.java:1734)
> >    at org.apache.slider.client.SliderClient.actionUpgrade(SliderClient.java:802)
> >    at org.apache.slider.client.SliderClient.exec(SliderClient.java:542)
> >    at org.apache.slider.client.SliderClient.runService(SliderClient.java:424)
> >    at org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:188)
> >    at org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:475)
> >    at org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:403)
> >    at org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:630)
> >    at org.apache.slider.Slider.main(Slider.java:49)
> >2018-01-17 20:31:24,327 [main] ERROR main.ServiceLauncher - Failed to save spas: org.apache.slider.core.persist.LockAcquireFailedException: Failed to acquire lock hdfs://xxx/user/xxx/.slider/cluster/spas/readlock
> >2018-01-17 20:31:24,328 [main] INFO  util.ExitUtil - Exiting with status 70
> >
> >An HDFS ls listing showed that a readlock file had been created a few days earlier:
> >
> >hdfs dfs -ls hdfs://xxx/user/xxx/.slider/cluster/spas
> >...
> >-rw-r--r--   3 xxx xxx         23 2018-01-06 00:33
> >hdfs://xxx/user/xxx/.slider/cluster/spas/readlock
> >...
> >
> >After deleting this file manually, the upgrade command works.
> >
> >Any idea when this file is created and why it was not removed?
> >
> >Thanks in advance,
> >
> >Manoj
>
>
