[ https://issues.apache.org/jira/browse/HDFS-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wei-Chiu Chuang updated HDFS-9908:
----------------------------------
    Priority: Major  (was: Critical)

> Datanode should tolerate disk failure during NN handshake
> ---------------------------------------------------------
>
>                 Key: HDFS-9908
>                 URL: https://issues.apache.org/jira/browse/HDFS-9908
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.5.0
>         Environment: CDH5.3.3
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>
> A DN may treat a disk failure exception as an NN handshake exception, and this
> can prevent the DN from joining a cluster even if most of its disks are healthy.
> During the NN handshake, the DN initializes its block pools. It creates a lock
> file per disk and then scans the volumes. However, if the scan throws an
> exception due to a disk failure, the DN concludes that the NN is inconsistent
> with its local storage (see {{DataNode#initBlockPool}}). As a result, it
> attempts to reconnect to the NN.
> However, at this point the DN has not deleted its lock files on the disks. When
> it reconnects to the NN, it finds that the same disks appear to be already in
> use, so the handshake fails again because none of the disks can be locked, and
> this repeats indefinitely (a standalone sketch of the re-lock failure follows
> the code block below). This happens even if the DN has multiple disks and only
> one of them fails: the DN cannot connect to the NN despite having just one
> failing disk. Note that it is possible to successfully create a lock file on a
> disk and then hit an error while scanning it.
> We saw this on a CDH 5.3.3 cluster (which is based on Apache Hadoop 2.5.0; we
> still see the same bug on the 3.0.0 trunk branch). The root cause is that the
> DN treats an internal error (a single disk failure) as an external one (an NN
> handshake failure), and we should fix it.
> {code:title=DataNode.java}
>   /**
>    * One of the Block Pools has successfully connected to its NN.
>    * This initializes the local storage for that block pool,
>    * checks consistency of the NN's cluster ID, etc.
>    *
>    * If this is the first block pool to register, this also initializes
>    * the datanode-scoped storage.
>    *
>    * @param bpos Block pool offer service
>    * @throws IOException if the NN is inconsistent with the local storage.
>    */
>   void initBlockPool(BPOfferService bpos) throws IOException {
>     NamespaceInfo nsInfo = bpos.getNamespaceInfo();
>     if (nsInfo == null) {
>       throw new IOException("NamespaceInfo not found: Block pool " + bpos
>           + " should have retrieved namespace info before initBlockPool.");
>     }
>
>     setClusterId(nsInfo.clusterID, nsInfo.getBlockPoolID());
>
>     // Register the new block pool with the BP manager.
>     blockPoolManager.addBlockPool(bpos);
>
>     // In the case that this is the first block pool to connect, initialize
>     // the dataset, block scanners, etc.
>     initStorage(nsInfo);
>
>     // Exclude failed disks before initializing the block pools to avoid
>     // startup failures.
>     checkDiskError();
>
>     data.addBlockPool(nsInfo.getBlockPoolID(), conf); // <----- this line throws the disk error exception
>     blockScanner.enableBlockPoolId(bpos.getBlockPoolId());
>     initDirectoryScanner(conf);
>   }
> {code}
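> For illustration, the re-lock failure described above can be reproduced with
> plain {{java.nio}} file locking, which is the mechanism behind the per-disk
> {{in_use.lock}} files. This is a minimal standalone sketch, not DataNode code;
> the class name and file name are made up for the demo. Because the handshake
> retry happens in the same DN process that still holds the first lock, the
> second attempt fails:
> {code:title=LockDemo.java}
> import java.io.File;
> import java.io.RandomAccessFile;
> import java.nio.channels.FileLock;
> import java.nio.channels.OverlappingFileLockException;
>
> public class LockDemo {
>   public static void main(String[] args) throws Exception {
>     File lockFile = new File("in_use.lock"); // demo file, not a real DN volume
>     lockFile.deleteOnExit();
>
>     // First handshake attempt: the lock is acquired and, because the failed
>     // attempt skips the cleanup path, never released.
>     RandomAccessFile first = new RandomAccessFile(lockFile, "rws");
>     FileLock held = first.getChannel().tryLock();
>     System.out.println("first attempt locked: " + (held != null));
>
>     // Second handshake attempt in the same JVM: locking the same file again
>     // is rejected, which is the "disk already in use" symptom the DN reports.
>     try (RandomAccessFile second = new RandomAccessFile(lockFile, "rws")) {
>       FileLock lock = second.getChannel().tryLock();
>       System.out.println("second attempt locked: " + (lock != null));
>     } catch (OverlappingFileLockException e) {
>       System.out.println("second attempt failed: lock already held");
>     }
>   }
> }
> {code}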
> {{FsVolumeList#addBlockPool}} is the source of the exception.
> {code:title=FsVolumeList.java}
>   void addBlockPool(final String bpid, final Configuration conf)
>       throws IOException {
>     long totalStartTime = Time.monotonicNow();
>
>     final List<IOException> exceptions = Collections.synchronizedList(
>         new ArrayList<IOException>());
>     List<Thread> blockPoolAddingThreads = new ArrayList<Thread>();
>     for (final FsVolumeImpl v : volumes) {
>       Thread t = new Thread() {
>         public void run() {
>           try (FsVolumeReference ref = v.obtainReference()) {
>             FsDatasetImpl.LOG.info("Scanning block pool " + bpid +
>                 " on volume " + v + "...");
>             long startTime = Time.monotonicNow();
>             v.addBlockPool(bpid, conf);
>             long timeTaken = Time.monotonicNow() - startTime;
>             FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid +
>                 " on " + v + ": " + timeTaken + "ms");
>           } catch (ClosedChannelException e) {
>             // ignore.
>           } catch (IOException ioe) {
>             FsDatasetImpl.LOG.info("Caught exception while scanning " + v +
>                 ". Will throw later.", ioe);
>             exceptions.add(ioe);
>           }
>         }
>       };
>       blockPoolAddingThreads.add(t);
>       t.start();
>     }
>     for (Thread t : blockPoolAddingThreads) {
>       try {
>         t.join();
>       } catch (InterruptedException ie) {
>         throw new IOException(ie);
>       }
>     }
>     if (!exceptions.isEmpty()) {
>       throw exceptions.get(0); // <----- here's the origin of the exception
>     }
>
>     long totalTimeTaken = Time.monotonicNow() - totalStartTime;
>     FsDatasetImpl.LOG.info("Total time to scan all replicas for block pool " +
>         bpid + ": " + totalTimeTaken + "ms");
>   }
> {code}
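> One possible direction for a fix, sketched below as a variant of the method
> above: record which volume each failure came from, abort the handshake only
> when every volume has failed, and route partial failures through the
> volume-failure handling so the healthy disks keep the DN alive. This is a
> sketch under assumptions, not a committed patch; the {{handleVolumeFailure}}
> hook is hypothetical:
> {code:title=FsVolumeList.java (sketch)}
>   void addBlockPool(final String bpid, final Configuration conf)
>       throws IOException {
>     // Track which volume produced which failure, instead of keeping only
>     // the first exception.
>     final Map<FsVolumeImpl, IOException> failures =
>         Collections.synchronizedMap(new HashMap<FsVolumeImpl, IOException>());
>     List<Thread> blockPoolAddingThreads = new ArrayList<Thread>();
>     for (final FsVolumeImpl v : volumes) {
>       Thread t = new Thread() {
>         public void run() {
>           try (FsVolumeReference ref = v.obtainReference()) {
>             v.addBlockPool(bpid, conf);
>           } catch (ClosedChannelException e) {
>             // ignore.
>           } catch (IOException ioe) {
>             failures.put(v, ioe);
>           }
>         }
>       };
>       blockPoolAddingThreads.add(t);
>       t.start();
>     }
>     for (Thread t : blockPoolAddingThreads) {
>       try {
>         t.join();
>       } catch (InterruptedException ie) {
>         throw new IOException(ie);
>       }
>     }
>     if (!failures.isEmpty()) {
>       if (failures.size() == volumes.size()) {
>         // Every disk failed; the handshake genuinely cannot proceed.
>         throw failures.values().iterator().next();
>       }
>       for (Map.Entry<FsVolumeImpl, IOException> e : failures.entrySet()) {
>         // Hypothetical hook: mark the volume as failed and release its lock
>         // file so a later handshake retry does not trip over it.
>         handleVolumeFailure(e.getKey(), e.getValue());
>       }
>     }
>   }
> {code}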