Re: Review Request 59394: Race condition: webhdfs call mkdir /tmp/druid-indexing before /tmp making tmp not writable.
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59394/#review177018
---

Ship it!

Ship It!

- Dmytro Grinenko


On May 19, 2017, 9:54 a.m., Andrew Onischuk wrote:
>
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59394/
> ---
>
> (Updated May 19, 2017, 9:54 a.m.)
>
> Review request for Ambari and Vitalyi Brodetskyi.
>
> Bugs: AMBARI-21070
>     https://issues.apache.org/jira/browse/AMBARI-21070
>
> Repository: ambari
>
> Description
> ---
>
> Race condition: a WebHDFS mkdir of /tmp/druid-indexing is called before
> /tmp is created, leaving /tmp not writable.
>
> During an HDP install through Ambari, at the step "start components on
> host< >", WebHDFS operations run in the background to create the HDFS
> directory structures required by specific components (/tmp, /tmp/hive,
> /user/druid, /tmp/druid-indexing ...).
>
> The expected order is generally getfileInfo: /tmp --> mkdir: /tmp -->
> changePermission: /tmp to 777 (hdfs:hdfs), so that /tmp is accessible to
> all and the Hive metastore can create /tmp/hive (the Hive scratch
> directory).
>
> In this case, specific to a Druid install, the mkdir of
> /tmp/druid-indexing is most of the time called before the actual /tmp
> creation, so /tmp ends up with just the default directory permission (755).
>
> -> The next getfileInfo call on /tmp then reports that /tmp already
> exists, so the directory is not created again and its permission is not
> changed.
>
> This leaves /tmp not writable, so the HiveServer process shuts down
> because it is unable to create or access /tmp/hive.
>
> hdfs-audit log:
>
> 2017-05-12 06:39:51,067 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.26.3 cmd=getfileinfo src=/tmp/druid-indexing dst=null perm=null proto=webhdfs
> 2017-05-12 06:39:51,120 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.22.81 cmd=contentSummary src=/user/druid dst=null perm=null proto=webhdfs
> 2017-05-12 06:39:51,133 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.37.200 cmd=setPermission src=/ats/active dst=null perm=hdfs:hadoop:rwxr-xr-x proto=webhdfs
> 2017-05-12 06:39:51,155 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.26.3 cmd=mkdirs src=/tmp/druid-indexing dst=null perm=hdfs:hdfs:rwxr-xr-x proto=webhdfs
> 2017-05-12 06:39:51,206 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.22.81 cmd=listStatus src=/user/druid dst=null perm=null proto=webhdfs
> 2017-05-12 06:39:51,235 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.37.200 cmd=setPermission src=/ats/ dst=null perm=yarn:hadoop:rwxr-xr-x proto=webhdfs
> 2017-05-12 06:39:51,249 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.26.3 cmd=setPermission src=/tmp/druid-indexing dst=null perm=hdfs:hdfs:rwxr-xr-x proto=webhdfs
> 2017-05-12 06:39:51,290 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.22.81 cmd=listStatus src=/user/druid/data dst=null perm=null proto=webhdfs
> 2017-05-12 06:39:51,339 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.37.200 cmd=setPermission src=/ats/active/ dst=null perm=hdfs:hadoop:rwxr-xr-x proto=webhdfs
> 2017-05-12 06:39:51,341 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.26.3 cmd=setOwner src=/tmp/druid-indexing dst=null perm=druid:hdfs:rwxr-xr-x proto=webhdfs
> 2017-05-12 06:39:51,380 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.22.81 cmd=setOwner src=/user/druid/data dst=null perm=druid:hdfs:rwxr-xr-x proto=webhdfs
> 2017-05-12 06:39:51,431 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.37.200 cmd=setOwner src=/ats/active dst=null perm=yarn:hadoop:rwxr-xr-x proto=webhdfs
> 2017-05-12 06:39:51,526 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.37.200 cmd=setOwner src=/ats/ dst=null perm=yarn:hadoop:rwxr-xr-x proto=webhdfs
> 2017-05-12 06:39:51,580 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.32.12 cmd=getfileinfo src=/apps/hbase/staging dst=null perm=null proto=webhdfs
> 2017-05-12 06:39:51,620 INFO FSNamesystem.audit: allowed=true ugi=hdfs
Review Request 59394: Race condition: webhdfs call mkdir /tmp/druid-indexing before /tmp making tmp not writable.
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59394/
---

Review request for Ambari and Vitalyi Brodetskyi.

Bugs: AMBARI-21070
    https://issues.apache.org/jira/browse/AMBARI-21070

Repository: ambari

Description
---

Race condition: a WebHDFS mkdir of /tmp/druid-indexing is called before
/tmp is created, leaving /tmp not writable.

During an HDP install through Ambari, at the step "start components on
host< >", WebHDFS operations run in the background to create the HDFS
directory structures required by specific components (/tmp, /tmp/hive,
/user/druid, /tmp/druid-indexing ...).

The expected order is generally getfileInfo: /tmp --> mkdir: /tmp -->
changePermission: /tmp to 777 (hdfs:hdfs), so that /tmp is accessible to
all and the Hive metastore can create /tmp/hive (the Hive scratch
directory).

In this case, specific to a Druid install, the mkdir of
/tmp/druid-indexing is most of the time called before the actual /tmp
creation, so /tmp ends up with just the default directory permission (755).

-> The next getfileInfo call on /tmp then reports that /tmp already
exists, so the directory is not created again and its permission is not
changed.

This leaves /tmp not writable, so the HiveServer process shuts down
because it is unable to create or access /tmp/hive.
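The ordering problem described above can be reproduced with a small in-memory model. The sketch below is illustrative only: FakeNamespace, ensure_dir_skip_if_exists, ensure_dir_always_chmod and provision are hypothetical names, not Ambari or WebHDFS APIs, and the "always chmod" variant is just one possible remedy, not necessarily the change made in this review. It assumes, as the description states, that implicitly created parent directories get the default 755 permission.

```python
# Toy in-memory model of an HDFS namespace: path -> octal permission bits.
# Hypothetical illustration only; not Ambari's code or the WebHDFS API.
DEFAULT_DIR_MODE = 0o755


class FakeNamespace:
    def __init__(self):
        self.modes = {"/": DEFAULT_DIR_MODE}

    def exists(self, path):
        return path in self.modes

    def mkdirs(self, path):
        """Create path and any missing ancestors with the default mode."""
        current = ""
        for part in (p for p in path.split("/") if p):
            current += "/" + part
            self.modes.setdefault(current, DEFAULT_DIR_MODE)

    def set_permission(self, path, mode):
        self.modes[path] = mode


def ensure_dir_skip_if_exists(ns, path, mode):
    """Problematic pattern: an already existing directory is left untouched."""
    if not ns.exists(path):
        ns.mkdirs(path)
        ns.set_permission(path, mode)


def ensure_dir_always_chmod(ns, path, mode):
    """One possible remedy: enforce the mode even if the directory exists."""
    if not ns.exists(path):
        ns.mkdirs(path)
    ns.set_permission(path, mode)


def provision(ensure_dir):
    """Replay the racy order: the Druid directory first, then /tmp."""
    ns = FakeNamespace()
    # mkdir of /tmp/druid-indexing implicitly creates /tmp with 755.
    ensure_dir(ns, "/tmp/druid-indexing", 0o755)
    # The /tmp request arrives afterwards and finds /tmp already present.
    ensure_dir(ns, "/tmp", 0o777)
    return ns.modes["/tmp"]


print(oct(provision(ensure_dir_skip_if_exists)))  # 0o755: /tmp stays unwritable
print(oct(provision(ensure_dir_always_chmod)))    # 0o777: /tmp is writable
```

With the skip-if-exists pattern, the outcome for /tmp depends on which request reaches the NameNode first; enforcing the mode unconditionally makes the result independent of the order.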
hdfs-audit log:

2017-05-12 06:39:51,067 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.26.3 cmd=getfileinfo src=/tmp/druid-indexing dst=null perm=null proto=webhdfs
2017-05-12 06:39:51,120 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.22.81 cmd=contentSummary src=/user/druid dst=null perm=null proto=webhdfs
2017-05-12 06:39:51,133 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.37.200 cmd=setPermission src=/ats/active dst=null perm=hdfs:hadoop:rwxr-xr-x proto=webhdfs
2017-05-12 06:39:51,155 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.26.3 cmd=mkdirs src=/tmp/druid-indexing dst=null perm=hdfs:hdfs:rwxr-xr-x proto=webhdfs
2017-05-12 06:39:51,206 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.22.81 cmd=listStatus src=/user/druid dst=null perm=null proto=webhdfs
2017-05-12 06:39:51,235 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.37.200 cmd=setPermission src=/ats/ dst=null perm=yarn:hadoop:rwxr-xr-x proto=webhdfs
2017-05-12 06:39:51,249 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.26.3 cmd=setPermission src=/tmp/druid-indexing dst=null perm=hdfs:hdfs:rwxr-xr-x proto=webhdfs
2017-05-12 06:39:51,290 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.22.81 cmd=listStatus src=/user/druid/data dst=null perm=null proto=webhdfs
2017-05-12 06:39:51,339 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.37.200 cmd=setPermission src=/ats/active/ dst=null perm=hdfs:hadoop:rwxr-xr-x proto=webhdfs
2017-05-12 06:39:51,341 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.26.3 cmd=setOwner src=/tmp/druid-indexing dst=null perm=druid:hdfs:rwxr-xr-x proto=webhdfs
2017-05-12 06:39:51,380 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.22.81 cmd=setOwner src=/user/druid/data dst=null perm=druid:hdfs:rwxr-xr-x proto=webhdfs
2017-05-12 06:39:51,431 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.37.200 cmd=setOwner src=/ats/active dst=null perm=yarn:hadoop:rwxr-xr-x proto=webhdfs
2017-05-12 06:39:51,526 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.37.200 cmd=setOwner src=/ats/ dst=null perm=yarn:hadoop:rwxr-xr-x proto=webhdfs
2017-05-12 06:39:51,580 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.32.12 cmd=getfileinfo src=/apps/hbase/staging dst=null perm=null proto=webhdfs
2017-05-12 06:39:51,620 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.37.200 cmd=setOwner src=/ats/active/ dst=null perm=yarn:hadoop:rwxr-xr-x proto=webhdfs
2017-05-12 06:39:53,289 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/172.27.26.202 cmd=getfileinfo src=/tmp dst=null perm=null proto=webhdfs

The log shows /tmp/druid-indexing being accessed at 06:39:51 (hence /tmp
has just 755 permission as a result of that call) and /tmp being accessed
(getfileinfo) at 06:39:53, which returns /tmp already