[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821040#comment-16821040 ] He Xiaoqiao commented on HDFS-14437: [~angerszhuuu], thanks for your correction, I will check it later.
> Exception happened when rollEditLog expects empty
> EditsDoubleBuffer.bufCurrent but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha, namenode, qjm
> Reporter: angerszhu
> Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943,
> I have sorted through the write and flush process of the EditLog and some
> important functions. I found that in the FSEditLog class, the close() function
> runs the following sequence:
>
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> Since we have acquired the object lock in close(), when waitForSyncToFinish()
> returns it means all logSync work has finished and all data in bufReady has
> been flushed out. And since the current thread holds the lock on this object,
> no other thread can acquire it while endCurrentLogSegment() runs, so no new
> edit log records can be written into bufCurrent.
> But if we don't call waitForSyncToFinish() before endCurrentLogSegment(),
> some auto-scheduled logSync() flush may still be in progress, since that
> flush deliberately runs without synchronization, as the Javadoc of logSync()
> explains:
>
> {code:java}
> /**
> * Sync all modifications done by this thread.
> *
> * The internal concurrency design of this class is as follows:
> * - Log items are written synchronized into an in-memory buffer,
> * and each assigned a transaction ID.
> * - When a thread (client) would like to sync all of its edits, logSync()
> * uses a ThreadLocal transaction ID to determine what edit number must
> * be synced to. 
> * - The isSyncRunning volatile boolean tracks whether a sync is currently > * under progress. > * > * The data is double-buffered within each edit log implementation so that > * in-memory writing can occur in parallel with the on-disk writing. > * > * Each sync occurs in three steps: > * 1. synchronized, it swaps the double buffer and sets the isSyncRunning > * flag. > * 2. unsynchronized, it flushes the data to storage > * 3. synchronized, it resets the flag and notifies anyone waiting on the > * sync. > * > * The lack of synchronization on step 2 allows other threads to continue > * to write into the memory buffer while the sync is in progress. > * Because this step is unsynchronized, actions that need to avoid > * concurrency with sync() should be synchronized and also call > * waitForSyncToFinish() before assuming they are running alone. > */ > public void logSync() { > long syncStart = 0; > // Fetch the transactionId of this thread. > long mytxid = myTransactionId.get().txid; > > boolean sync = false; > try { > EditLogOutputStream logStream = null; > synchronized (this) { > try { > printStatistics(false); > // if somebody is already syncing, then wait > while (mytxid > synctxid && isSyncRunning) { > try { > wait(1000); > } catch (InterruptedException ie) { > } > } > // > // If this transaction was already flushed, then nothing to do > // > if (mytxid <= synctxid) { > numTransactionsBatchedInSync++; > if (metrics != null) { > // Metrics is non-null only when used inside name node > metrics.incrTransactionsBatchedInSync(); > } > return; > } > > // now, this thread will do the sync > syncStart = txid; > isSyncRunning = true; > sync = true; > // swap buffers > try { > if (journalSet.isEmpty()) { > throw new IOException("No journals available to flush"); > } > editLogStream.setReadyToFlush(); > } catch (IOException e) { > final String msg = > "Could not sync enough journals to persistent storage " + > "due to " + e.getMessage() + ". 
" + > "Unsynced transactions: " + (txid - synctxid); > LOG.fatal(msg, new Exception()); > synchronized(journalSetLock) { > IOUtils.cleanup(LOG, journalSet); > } > terminate(1, msg); > } > } finally { > // Prevent RuntimeException from blocking other log edit write > doneWithAutoSyncScheduling(); > } > //editLogStream may become null, > //so store a
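The three-step sync described in the Javadoc above (synchronized swap, unsynchronized flush, synchronized reset) can be sketched with a toy double buffer. This is a simplified illustration only: the names (bufCurrent, bufReady, setReadyToFlush) mirror HDFS's EditsDoubleBuffer, but this is not the real class.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/**
 * Toy model of the double-buffer pattern from the logSync() Javadoc.
 * Writers append to bufCurrent under the lock; the flusher drains bufReady
 * outside the lock, so writing and flushing can proceed in parallel.
 */
class ToyDoubleBuffer {
    private Queue<String> bufCurrent = new ArrayDeque<>(); // writers append here
    private Queue<String> bufReady = new ArrayDeque<>();   // flusher drains this

    // Writers log edits while holding the lock.
    synchronized void writeOp(String op) { bufCurrent.add(op); }

    // Step 1 (synchronized): swap the buffers so the flush can proceed
    // while new edits keep arriving in the fresh bufCurrent.
    synchronized void setReadyToFlush() {
        Queue<String> tmp = bufReady;
        bufReady = bufCurrent;
        bufCurrent = tmp;
    }

    // Step 2 (deliberately NOT synchronized, as in the real code): flush
    // bufReady to storage; concurrent writers may fill bufCurrent meanwhile.
    int flushTo(Queue<String> storage) {
        int n = bufReady.size();
        storage.addAll(bufReady);
        bufReady.clear();
        return n;
    }

    synchronized boolean isCurrentEmpty() { return bufCurrent.isEmpty(); }
}
```

The key consequence for this issue: after setReadyToFlush(), a write that lands during the unsynchronized flush window leaves bufCurrent non-empty even though the flush completed.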
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820974#comment-16820974 ] He Xiaoqiao commented on HDFS-14437: Thanks [~starphin].
{quote}But the table you presented will hardly occur, for rollEditLog and FSEditLog#logEdit are almost always called with FSNamesystem.fsLock held, such as in mkdir and setAcl. I'm still working on it to find that racing case.{quote}
It is actually complex to reproduce, especially in a unit test. I think we could construct it the right way once we are sure the analysis is the truth, for instance with multiple threads calling logSync; just my own opinion, maybe there is a more graceful way. As mentioned above, I think it is not necessary to consider #fsLock: the problem is not related to #fsLock, and we should test FSEditLog independently.
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820953#comment-16820953 ] He Xiaoqiao commented on HDFS-14437: [~angerszhuuu], [~starphin], please help to double check. [~angerszhuuu], would you like to submit a patch to fix this issue? If you need any help, please ping me.
[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820942#comment-16820942 ] He Xiaoqiao edited comment on HDFS-14437 at 4/18/19 10:30 AM: -- Thanks [~angerszhuuu], [~starphin], it looks like the root cause is getting clearer. Some minor comments: [~starphin], IIUC, every FSEditLog#logSync call is outside FsLock and only FSEditLog#rollEditlog holds FsLock, so {{FSEditLog}} is not related to {{FsLock}} overall; thus {{FSEditLog}} uses {{synchronized}} to control concurrency. In my opinion, the following code segment in FSEditLog#logSync, which runs outside {{synchronized}}, is the core reason:
{code:java}
try {
  if (logStream != null) {
    logStream.flush();
  }
}
{code}
Consider this scenario:
||Time||Thread1 (rollEditLog)||Thread2 (logSync invoked by some NN write op)||
|t1| |*enter synchronized*|
|t2| |check syncing|
|t3| |set #syncStart and #isSyncRunning|
|t4| |swap buffers|
|t5| |do auto sync scheduling|
|t6| |*exit synchronized*|
|t7|*enter synchronized*| |
|t8|endCurrentLogSegment#logEdit| |
|t9|endCurrentLogSegment#logSyncAll (then the double buffer will be empty)| |
|t10|checkArgument about txid passes| |
|t11| |logStream.flush() (then the double buffer will fill with edit log records, maybe many records here)|
|t12|journalSet.finalizeLogSegment| |
|t13|try to close JournalAndStream but fail since the double buffer is not empty| |
|t14|exception and terminate| |
I tried to reproduce this issue with a unit test: first stop rollEditLog at {{journalSet.finalizeLogSegment}} in #endCurrentLogSegment, then #logEdit something and #logSync, and after that resume #rollEditLog; the expected exception appears. If this is the truth, I am confused about two things: 1. Why are there only a few reports about this issue? IMO the probability may be high. 2. I agree that invoking #waitForSyncToFinish at the beginning of #rollEditLog could resolve this issue. Please correct me if I missed something. Thanks again. 
was (Author: hexiaoqiao): Thanks [~angerszhuuu], [~starphin], it looks like the root cause is getting clearer. Some minor comments: [~starphin], IIUC, every FSEditLog#logSync call is outside FsLock and only FSEditLog#rollEditlog holds FsLock, so {{FSEditLog}} is not related to {{FsLock}} overall; thus {{FSEditLog}} uses {{synchronized}} to control concurrency. In my opinion, the following code segment in FSEditLog#logSync, which runs outside {{synchronized}}, is the core reason:
{code:java}
try {
  if (logStream != null) {
    logStream.flush();
  }
}
{code}
Consider this scenario:
||Time||Thread1 (rollEditLog)||Thread2 (logSync invoked by some NN write op)||
|t1| |*enter synchronized*|
|t2| |check syncing|
|t3| |set #syncStart and #isSyncRunning|
|t4| |swap buffers|
|t5| |do auto sync scheduling|
|t6| |*exit synchronized*|
|t7|*enter synchronized*| |
|t8|endCurrentLogSegment#logEdit| |
|t9|endCurrentLogSegment#logSyncAll (then the double buffer will be empty)| |
|t10|checkArgument about txid passes| |
|t11| |logStream.flush() (then the double buffer will fill with edit log records, maybe many records here)|
|t12|journalSet.finalizeLogSegment| |
|t13|try to close JournalAndStream but fail since the double buffer is not empty| |
|t14|exception and terminate| |
I tried to reproduce this issue with a unit test: first stop rollEditLog at {{journalSet.finalizeLogSegment}} in #endCurrentLogSegment, then #logEdit something and #logSync, and after that resume #rollEditLog; the expected exception appears. If this is the truth, I am confused about two things: 1. Why are there only a few reports about this issue? IMO the probability may be high. 2. How to fix it? Only invoking #waitForSyncToFinish at the beginning of #rollEditLog would not resolve it. Please correct me if I missed something. Thanks again. 
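The t1–t14 timeline in this comment can be condensed into a deterministic toy replay: if an edit lands in bufCurrent after the roll thread's final swap-and-flush (logSyncAll) but before the segment is finalized, the "bufCurrent must be empty" precondition fails. This is a schematic sketch with illustrative names only, not the real HDFS classes, and it deliberately glosses over exactly which code path refills bufCurrent during the unsynchronized flush window.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Deterministic replay of the race timeline; all names are illustrative. */
class RollRaceDemo {
    static Deque<String> bufCurrent = new ArrayDeque<>();
    static Deque<String> bufReady = new ArrayDeque<>();
    static Deque<String> journal = new ArrayDeque<>();

    // A writer appends an edit record to the in-memory buffer.
    static void logEdit(String op) { bufCurrent.add(op); }

    // logSyncAll in miniature: swap, then flush everything to the journal.
    static void swapAndFlush() {
        bufReady.addAll(bufCurrent);
        bufCurrent.clear();
        journal.addAll(bufReady);
        bufReady.clear();
    }

    // Stand-in for the precondition that throws in HDFS-14437.
    static void finalizeSegment() {
        if (!bufCurrent.isEmpty()) {
            throw new IllegalStateException(
                "expects empty EditsDoubleBuffer.bufCurrent but not");
        }
    }

    /** Returns true when the precondition violation is observed. */
    static boolean replay() {
        logEdit("OP_END_LOG_SEGMENT");
        swapAndFlush();       // t9: buffer drained, txid check passes (t10)
        logEdit("OP_MKDIR");  // an edit sneaks in before finalize (t11 window)
        try {
            finalizeSegment(); // t12-t13
            return false;
        } catch (IllegalStateException expected) {
            return true;       // t14: the reported exception path
        }
    }
}
```

Running the replay always trips the check, which is why serializing the roll against in-flight syncs (e.g. via waitForSyncToFinish) is the direction discussed in this thread.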
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820942#comment-16820942 ] He Xiaoqiao commented on HDFS-14437: Thanks [~angerszhuuu], [~starphin], it looks like the root cause is getting clearer. Some minor comments: [~starphin], IIUC, every FSEditLog#logSync call is outside FsLock and only FSEditLog#rollEditlog holds FsLock, so {{FSEditLog}} is not related to {{FsLock}} overall; thus {{FSEditLog}} uses {{synchronized}} to control concurrency. In my opinion, the following code segment in FSEditLog#logSync, which runs outside {{synchronized}}, is the core reason:
{code:java}
try {
  if (logStream != null) {
    logStream.flush();
  }
}
{code}
Consider this scenario:
||Time||Thread1 (rollEditLog)||Thread2 (logSync invoked by some NN write op)||
|t1| |*enter synchronized*|
|t2| |check syncing|
|t3| |set #syncStart and #isSyncRunning|
|t4| |swap buffers|
|t5| |do auto sync scheduling|
|t6| |*exit synchronized*|
|t7|*enter synchronized*| |
|t8|endCurrentLogSegment#logEdit| |
|t9|endCurrentLogSegment#logSyncAll (then the double buffer will be empty)| |
|t10|checkArgument about txid passes| |
|t11| |logStream.flush() (then the double buffer will fill with edit log records, maybe many records here)|
|t12|journalSet.finalizeLogSegment| |
|t13|try to close JournalAndStream but fail since the double buffer is not empty| |
|t14|exception and terminate| |
I tried to reproduce this issue with a unit test: first stop rollEditLog at {{journalSet.finalizeLogSegment}} in #endCurrentLogSegment, then #logEdit something and #logSync, and after that resume #rollEditLog; the expected exception appears. If this is the truth, I am confused about two things: 1. Why are there only a few reports about this issue? IMO the probability may be high. 2. How to fix it? Only invoking #waitForSyncToFinish at the beginning of #rollEditLog would not resolve it. Please correct me if I missed something. Thanks again. 
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820750#comment-16820750 ] He Xiaoqiao commented on HDFS-14437: [~angerszhuuu], tracking the code logic again: do you mean the following run sequence causes this issue?
||Time||Thread1 (rollEditLog)||Thread2 (some write op)||
|t1|logEdit|-|
|t2|logSyncAll|-|
|t3|-|logSync|
|t4|finalize|-|
|t5|exception and terminate| |
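The waitForSyncToFinish() guard mentioned throughout this thread follows the standard synchronized wait-loop pattern that the logSync() Javadoc describes: hold the lock and wait until no sync is in flight. The sketch below is simplified from that description and is not a verbatim copy of FSEditLog.

```java
/**
 * Sketch of a "wait until no sync is running" guard. Callers that must not
 * race with the unsynchronized flush (step 2 of logSync) hold the monitor
 * AND wait for the in-flight sync to complete before proceeding.
 */
class SyncGuard {
    private boolean isSyncRunning = false;

    // Entered by the syncing thread before the unsynchronized flush.
    synchronized void startSync() { isSyncRunning = true; }

    // Entered after the flush: clear the flag and wake all waiters.
    synchronized void finishSync() {
        isSyncRunning = false;
        notifyAll(); // wake anyone blocked in waitForSyncToFinish()
    }

    // Loop-on-condition wait: spurious wakeups and the bounded timeout are
    // both handled by re-checking isSyncRunning each iteration.
    synchronized void waitForSyncToFinish() {
        while (isSyncRunning) {
            try {
                wait(1000);
            } catch (InterruptedException ie) {
                // retry; the real code also swallows and re-checks
            }
        }
    }

    synchronized boolean syncing() { return isSyncRunning; }
}
```

A caller such as a roll operation would invoke waitForSyncToFinish() while holding the same monitor before assuming it runs alone, which is exactly the contract quoted from the Javadoc in this thread.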
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820742#comment-16820742 ] He Xiaoqiao commented on HDFS-14437: [~angerszhuuu], thanks for the ping. It is an interesting dig. I think it would be fine to attach some documents here. I am confused by one detail of #endCurrentLogSegment: when we invoke #endCurrentLogSegment, it first syncs all pending log entries, then finalizes the log segment, and all of this runs inside synchronized. IIUC, that should guarantee the edit double buffer is empty before finalizing. Looking forward to the truth. Thanks again.
[jira] [Commented] (HDFS-14430) RBF: Fix MockNamenode bug about mocking RPC getListing and mkdir
[ https://issues.apache.org/jira/browse/HDFS-14430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819724#comment-16819724 ] He Xiaoqiao commented on HDFS-14430: [~elgoiri], [~ayushtkn], thanks for pointing that out; it makes sense to me to fix it in HDFS-14117. I will watch that issue and close this one later. Thanks.
> RBF: Fix MockNamenode bug about mocking RPC getListing and mkdir
>
> Key: HDFS-14430
> URL: https://issues.apache.org/jira/browse/HDFS-14430
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: rbf
> Reporter: He Xiaoqiao
> Assignee: He Xiaoqiao
> Priority: Major
> Attachments: HDFS-14430-HDFS-13891.001.patch
>
> Some unexpected results occur when invoking the mocked #getListing and #mkdirs
> in the current MockNamenode implementation:
> * the mock mkdirs does not check whether the parent directory exists;
> * the mock getListing does not list some child dirs/files.
> This may cause unexpected results and some unit test failures.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14430) RBF: Fix MockNamenode bug about mocking RPC getListing and mkdir
[ https://issues.apache.org/jira/browse/HDFS-14430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819107#comment-16819107 ] He Xiaoqiao commented on HDFS-14430: Thanks [~ayushtkn] for your quick response. I just checked the interfaces using MockNamenode locally, but some results are not what I expected. For instance, using MockNamenode: 1. mkdir '/user', '/user/hive/warehouse', '/user/hadoop/test'; 2. getListing of '/user' then returns null. I expect the correct result to be {'/user/hive', '/user/hadoop'}. > RBF: Fix MockNamenode bug about mocking RPC getListing and mkdir > > > Key: HDFS-14430 > URL: https://issues.apache.org/jira/browse/HDFS-14430 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Attachments: HDFS-14430-HDFS-13891.001.patch > > > Some unexpected result when invoke mocking #getListing and #mkdirs in current > MockNamenode implement. > * for mock mkdirs, we do not check if parent directory exists. > * for mock getListing, some child dirs/files are not listing. > It may be cause some unexpected result and cause some unit test fail.
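The listing semantics expected in the comment above (parents created implicitly by mkdir, getListing returning only the immediate children of a path) can be sketched with a plain sorted set. This is a hypothetical self-contained illustration of those semantics, not the actual MockNamenode code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of the expected mkdir/getListing behaviour:
// mkdirs creates all missing ancestors, and getListing returns only the
// immediate children of a path ('/user' -> {'/user/hadoop', '/user/hive'}).
class MockDirTree {
    private final TreeSet<String> dirs = new TreeSet<>();

    void mkdirs(String path) {
        // Create every ancestor so that listing a parent works.
        StringBuilder cur = new StringBuilder();
        for (String p : path.split("/")) {
            if (p.isEmpty()) continue;
            cur.append('/').append(p);
            dirs.add(cur.toString());
        }
    }

    List<String> getListing(String path) {
        String prefix = path.equals("/") ? "/" : path + "/";
        List<String> children = new ArrayList<>();
        // tailSet gives the sorted run of paths at or after the prefix.
        for (String d : dirs.tailSet(prefix)) {
            if (!d.startsWith(prefix)) break;
            // Keep only immediate children, not grandchildren.
            if (d.indexOf('/', prefix.length()) < 0) {
                children.add(d);
            }
        }
        return children;
    }
}
```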
[jira] [Updated] (HDFS-14430) RBF: Fix MockNamenode bug about mocking RPC getListing and mkdir
[ https://issues.apache.org/jira/browse/HDFS-14430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-14430: --- Attachment: HDFS-14430-HDFS-13891.001.patch Status: Patch Available (was: Open) Upload v001, which fixes #mkdirs and #getListing. cc [~elgoiri], please take a look. * I am not sure whether there were other considerations behind not checking the parent path. * This change will cause some unit tests to fail. If I am missing something, please correct me. Thanks. > RBF: Fix MockNamenode bug about mocking RPC getListing and mkdir > > > Key: HDFS-14430 > URL: https://issues.apache.org/jira/browse/HDFS-14430 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Attachments: HDFS-14430-HDFS-13891.001.patch > > > Some unexpected result when invoke mocking #getListing and #mkdirs in current > MockNamenode implement. > * for mock mkdirs, we do not check if parent directory exists. > * for mock getListing, some child dirs/files are not listing. > It may be cause some unexpected result and cause some unit test fail.
[jira] [Created] (HDFS-14430) RBF: Fix MockNamenode bug about mocking RPC getListing and mkdir
He Xiaoqiao created HDFS-14430: -- Summary: RBF: Fix MockNamenode bug about mocking RPC getListing and mkdir Key: HDFS-14430 URL: https://issues.apache.org/jira/browse/HDFS-14430 Project: Hadoop HDFS Issue Type: Sub-task Components: rbf Reporter: He Xiaoqiao Assignee: He Xiaoqiao Some unexpected results occur when invoking the mocked #getListing and #mkdirs in the current MockNamenode implementation. * for the mocked mkdirs, we do not check whether the parent directory exists. * for the mocked getListing, some child dirs/files are not listed. This may cause unexpected results and make some unit tests fail.
[jira] [Commented] (HDFS-14421) HDFS block two replicas exist in one DataNode
[ https://issues.apache.org/jira/browse/HDFS-14421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818000#comment-16818000 ] He Xiaoqiao commented on HDFS-14421: [~yuanbo], thanks for your comments. I am still confused about why the same block ends up with more than one replica on one DataNode instance, even if the VERSION files of different disks belonging to that DataNode instance were changed. Looking forward to more information from your digging. > HDFS block two replicas exist in one DataNode > - > > Key: HDFS-14421 > URL: https://issues.apache.org/jira/browse/HDFS-14421 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yuanbo Liu >Priority: Major > Attachments: 326942161.log > > > We're using Hadoop-2.7.0. > There is a file in the cluster and it's replication factor is 2. Those two > replicas exist in one Datande. the fsck info is here: > {color:#707070}BP-499819267-xx.xxx.131.201-1452072365222:blk_1400651575_326942161 > len=484045 repl=2 > [DatanodeInfoWithStorage[xx.xxx.80.205:50010,DS-d321be27-cbd4-4edd-81ad-29b3d021ee82,DISK], > > DatanodeInfoWithStorage[xx.xx.80.205:50010,DS-d321be27-cbd4-4edd-81ad-29b3d021ee82,DISK]].{color} > and this is the exception from xx.xx.80.205 > {color:#707070}org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: > Replica not found for > BP-499819267-xx.xxx.131.201-1452072365222:blk_1400651575_326942161{color} > It's confusing that why NameNode doesn't update block map after exception. > What's the reason of two replicas existing in one Datande? > Hope to get your comments. Thanks in advance. > > >
[jira] [Commented] (HDFS-14421) HDFS block two replicas exist in one DataNode
[ https://issues.apache.org/jira/browse/HDFS-14421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815307#comment-16815307 ] He Xiaoqiao commented on HDFS-14421: [~yuanbo] I don't think we can find the root cause relying only on the DataNode's log. Is there any NameNode log about blk_1400651575? > HDFS block two replicas exist in one DataNode > - > > Key: HDFS-14421 > URL: https://issues.apache.org/jira/browse/HDFS-14421 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yuanbo Liu >Priority: Major > Attachments: 326942161.log > > > We're using Hadoop-2.7.0. > There is a file in the cluster and it's replication factor is 2. Those two > replicas exist in one Datande. the fsck info is here: > {color:#707070}BP-499819267-xx.xxx.131.201-1452072365222:blk_1400651575_326942161 > len=484045 repl=2 > [DatanodeInfoWithStorage[xx.xxx.80.205:50010,DS-d321be27-cbd4-4edd-81ad-29b3d021ee82,DISK], > > DatanodeInfoWithStorage[xx.xx.80.205:50010,DS-d321be27-cbd4-4edd-81ad-29b3d021ee82,DISK]].{color} > and this is the exception from xx.xx.80.205 > {color:#707070}org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: > Replica not found for > BP-499819267-xx.xxx.131.201-1452072365222:blk_1400651575_326942161{color} > It's confusing that why NameNode doesn't update block map after exception. > What's the reason of two replicas existing in one Datande? > Hope to get your comments. Thanks in advance. > > >
[jira] [Commented] (HDFS-14421) HDFS block two replicas exist in one DataNode
[ https://issues.apache.org/jira/browse/HDFS-14421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815123#comment-16815123 ] He Xiaoqiao commented on HDFS-14421: Thanks [~yuanbo] for your report; it's a very interesting issue. Could you share more information about block blk_1400651575? It would be very helpful to collect all logs about blk_1400651575 from the NameNode and DataNode. > HDFS block two replicas exist in one DataNode > - > > Key: HDFS-14421 > URL: https://issues.apache.org/jira/browse/HDFS-14421 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yuanbo Liu >Priority: Major > > We're using Hadoop-2.7.0. > There is a file which replication factor is 2. Those two replicas exist in > one Datande. the fsck info is here: > {color:#707070}BP-499819267-xx.xxx.131.201-1452072365222:blk_1400651575_326942161 > len=484045 repl=2 > [DatanodeInfoWithStorage[xx.xxx.80.205:50010,DS-d321be27-cbd4-4edd-81ad-29b3d021ee82,DISK], > > DatanodeInfoWithStorage[xx.xx.80.205:50010,DS-d321be27-cbd4-4edd-81ad-29b3d021ee82,DISK]].{color} > and this is the exception from xx.xx.80.205 > {color:#707070}org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: > Replica not found for > BP-499819267-xx.xxx.131.201-1452072365222:blk_1400651575_326942161{color} > It's confusing that why NameNode doesn't update block map after exception. > What's the reason of two replicas exist in one Datande? > Hope to get anyone's comments. Thanks in advance. > > >
[jira] [Commented] (HDFS-13248) RBF: Namenode need to choose block location for the client
[ https://issues.apache.org/jira/browse/HDFS-13248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815088#comment-16815088 ] He Xiaoqiao commented on HDFS-13248: Thanks [~elgoiri], I just sent mail to hdfs-dev and common-dev to invite more folks to get involved and give more suggestions and votes. Thanks again. > RBF: Namenode need to choose block location for the client > -- > > Key: HDFS-13248 > URL: https://issues.apache.org/jira/browse/HDFS-13248 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Weiwei Wu >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13248.000.patch, HDFS-13248.001.patch, > HDFS-13248.002.patch, HDFS-13248.003.patch, HDFS-13248.004.patch, > HDFS-13248.005.patch, HDFS-Router-Data-Locality.odt, RBF Data Locality > Design.pdf, clientMachine-call-path.jpeg, debug-info-1.jpeg, debug-info-2.jpeg > > > When execute a put operation via router, the NameNode will choose block > location for the router, not for the real client. This will affect the file's > locality. > I think on both NameNode and Router, we should add a new addBlock method, or > add a parameter for the current addBlock method, to pass the real client > information.
[jira] [Commented] (HDFS-13248) RBF: Namenode need to choose block location for the client
[ https://issues.apache.org/jira/browse/HDFS-13248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813539#comment-16813539 ] He Xiaoqiao commented on HDFS-13248: Thanks [~elgoiri] and [~ayushtkn]. {quote}Fake client hostname: I don't quite get this one.{quote} It is possible to construct a fake client hostname no matter which solution we choose between changing {{ClientProtocol}} and the {{RPC Framework}}, since the ClientProtocol interface and the Protobuf definitions are completely open to the client, and a client could pass an arbitrary hostname as its own. However, I believe it is under control, and generally there is no additional security risk even if the hostname is fake; I just want to point out the leak. Overall, I have no clear preference between changing {{ClientProtocol}} and the {{RPC}} layer now. I believe there may be less work if we adopt the solution of changing the {{RPC/IPC}} layer. My suggestion is that we vote on one solution and get this feature into native HDFS as soon as possible. We still have chances to optimize it later. This is just my personal opinion. :) > RBF: Namenode need to choose block location for the client > -- > > Key: HDFS-13248 > URL: https://issues.apache.org/jira/browse/HDFS-13248 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Weiwei Wu >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13248.000.patch, HDFS-13248.001.patch, > HDFS-13248.002.patch, HDFS-13248.003.patch, HDFS-13248.004.patch, > HDFS-13248.005.patch, HDFS-Router-Data-Locality.odt, RBF Data Locality > Design.pdf, clientMachine-call-path.jpeg, debug-info-1.jpeg, debug-info-2.jpeg > > > When execute a put operation via router, the NameNode will choose block > location for the router, not for the real client. This will affect the file's > locality. > I think on both NameNode and Router, we should add a new addBlock method, or > add a parameter for the current addBlock method, to pass the real client > information. 
[jira] [Commented] (HDFS-13248) RBF: Namenode need to choose block location for the client
[ https://issues.apache.org/jira/browse/HDFS-13248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812173#comment-16812173 ] He Xiaoqiao commented on HDFS-13248: Thanks [~elgoiri], {quote}My main concern with modifying ClientProtocol is that it requires the client itself to change. The change is backwards compatible but for it to work you need the client to be up to date. From our experience, this is pretty challenging.{quote} The document does not depict the scope of the change in detail. Actually, we only need to `modify the Namenode and the Router`, rather than requiring client changes, if we push the approach of modifying {{ClientProtocol}}: (1) all clients keep using the current {{ClientProtocol}} interface; (2) when the Router receives an RPC request, it gets the client hostname first, then switches to invoking the additional method, which includes the {{clientMachine}} parameter, against the Namenode; (3) when the RPC request reaches the Namenode, it uses {{clientMachine}} if it is not null, or gets the client hostname via {{Server.getRemoteAddress}} if {{clientMachine}} is null. In one word, we need to modify the Namenode and the Router, but not the Client. {quote}BTW, we could do right away the one that RouterRpcServer#getBlockLocations() reorders the destinations.{quote} I agree to reordering at the Router as well, but I think it's not necessary if we can handle this case under this ticket, since the reorder operation may reduce the QPS of the Router. Please correct me if anything is wrong. FYI. 
> RBF: Namenode need to choose block location for the client > -- > > Key: HDFS-13248 > URL: https://issues.apache.org/jira/browse/HDFS-13248 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Weiwei Wu >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13248.000.patch, HDFS-13248.001.patch, > HDFS-13248.002.patch, HDFS-13248.003.patch, HDFS-13248.004.patch, > HDFS-13248.005.patch, HDFS-Router-Data-Locality.odt, RBF Data Locality > Design.pdf, clientMachine-call-path.jpeg, debug-info-1.jpeg, debug-info-2.jpeg > > > When execute a put operation via router, the NameNode will choose block > location for the router, not for the real client. This will affect the file's > locality. > I think on both NameNode and Router, we should add a new addBlock method, or > add a parameter for the current addBlock method, to pass the real client > information. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
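The fallback described in step (3) of the comment above boils down to a small helper on the Namenode side. A hypothetical sketch of that resolution logic (in the real code the fallback value would come from Hadoop's Server.getRemoteAddress()):

```java
// Hypothetical helper for step (3): trust an explicit clientMachine
// forwarded by the Router, and fall back to the RPC remote address for
// clients that talk to the Namenode directly (keeping current behaviour).
class ClientMachineResolver {
    static String resolve(String clientMachine, String rpcRemoteAddress) {
        if (clientMachine != null && !clientMachine.isEmpty()) {
            return clientMachine; // request came through the Router
        }
        return rpcRemoteAddress;  // direct client, unchanged semantics
    }
}
```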
[jira] [Commented] (HDFS-13248) RBF: Namenode need to choose block location for the client
[ https://issues.apache.org/jira/browse/HDFS-13248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811521#comment-16811521 ] He Xiaoqiao commented on HDFS-13248: [~elgoiri],[~ayushtkn] Thanks for the document. I drafted another summary of approaches to solve RBF data locality; further comments and suggestions are welcome. [^RBF Data Locality Design.pdf] [~ayushtkn] If you have time, let's cooperate on the design document. > RBF: Namenode need to choose block location for the client > -- > > Key: HDFS-13248 > URL: https://issues.apache.org/jira/browse/HDFS-13248 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Weiwei Wu >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13248.000.patch, HDFS-13248.001.patch, > HDFS-13248.002.patch, HDFS-13248.003.patch, HDFS-13248.004.patch, > HDFS-13248.005.patch, HDFS-Router-Data-Locality.odt, RBF Data Locality > Design.pdf, clientMachine-call-path.jpeg, debug-info-1.jpeg, debug-info-2.jpeg > > > When execute a put operation via router, the NameNode will choose block > location for the router, not for the real client. This will affect the file's > locality. > I think on both NameNode and Router, we should add a new addBlock method, or > add a parameter for the current addBlock method, to pass the real client > information.
[jira] [Updated] (HDFS-13248) RBF: Namenode need to choose block location for the client
[ https://issues.apache.org/jira/browse/HDFS-13248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-13248: --- Attachment: RBF Data Locality Design.pdf > RBF: Namenode need to choose block location for the client > -- > > Key: HDFS-13248 > URL: https://issues.apache.org/jira/browse/HDFS-13248 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Weiwei Wu >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13248.000.patch, HDFS-13248.001.patch, > HDFS-13248.002.patch, HDFS-13248.003.patch, HDFS-13248.004.patch, > HDFS-13248.005.patch, HDFS-Router-Data-Locality.odt, RBF Data Locality > Design.pdf, clientMachine-call-path.jpeg, debug-info-1.jpeg, debug-info-2.jpeg > > > When execute a put operation via router, the NameNode will choose block > location for the router, not for the real client. This will affect the file's > locality. > I think on both NameNode and Router, we should add a new addBlock method, or > add a parameter for the current addBlock method, to pass the real client > information. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13248) RBF: Namenode need to choose block location for the client
[ https://issues.apache.org/jira/browse/HDFS-13248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809478#comment-16809478 ] He Xiaoqiao commented on HDFS-13248: Thanks [~elgoiri], [~ayushtkn]. To [~elgoiri], {quote}we need a full design docs.{quote} Correct, a design doc will be attached later. I am not very familiar with the `modifying the RPC protocol` solution; it would be very helpful if anyone would like to explain it. To [~ayushtkn], {quote}Saw this getAdditionalDatanode() having client name parameter in(need to dig in more) Wouldn't that work for us?{quote} The #clientName parameter of {{getAdditionalDatanode}} is just a name tag and does not include any hostname/IP unless we change it:
{code:java}
this.clientName = "DFSClient_" + dfsClientConf.getTaskId() + "_" +
    ThreadLocalRandom.current().nextInt() + "_" +
    Thread.currentThread().getId();
{code}
{quote}Do we need to do with scenarios like HBASE-22103?{quote} If we extend the protocol, we need to stay compatible with all current interfaces. Please correct me if anything is wrong. > RBF: Namenode need to choose block location for the client > -- > > Key: HDFS-13248 > URL: https://issues.apache.org/jira/browse/HDFS-13248 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Weiwei Wu >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13248.000.patch, HDFS-13248.001.patch, > HDFS-13248.002.patch, HDFS-13248.003.patch, HDFS-13248.004.patch, > HDFS-13248.005.patch, clientMachine-call-path.jpeg, debug-info-1.jpeg, > debug-info-2.jpeg > > > When execute a put operation via router, the NameNode will choose block > location for the router, not for the real client. This will affect the file's > locality. > I think on both NameNode and Router, we should add a new addBlock method, or > add a parameter for the current addBlock method, to pass the real client > information. 
[jira] [Commented] (HDFS-13248) RBF: Namenode need to choose block location for the client
[ https://issues.apache.org/jira/browse/HDFS-13248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808863#comment-16808863 ] He Xiaoqiao commented on HDFS-13248: Overall, we need to extend three methods in #ClientProtocol: addBlock/getAdditionalDatanode/getBlockLocations. 1. To avoid compatibility issues, we could just add new methods as above with only one additional parameter, #clientHostname, and keep all current interfaces. 2. The new interfaces are generally just for the Router; of course they can be invoked by clients directly, but I think the risk is under control: (1) the final target of RBF is to disable direct access from DFSClient to the Namenode, routing everything through the Router. (2) Even if not disabled, I think a DFSClient using a fake #clientHostname will not weaken data security. Any more suggestions are welcome. Based on the above, I have implemented a quick-and-dirty prototype and run it in my test env for weeks. Anyway, it is necessary to vote and get suggestions through the mailing list. I would like to push this forward if there are no more suggestions here after this week. > RBF: Namenode need to choose block location for the client > -- > > Key: HDFS-13248 > URL: https://issues.apache.org/jira/browse/HDFS-13248 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Weiwei Wu >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13248.000.patch, HDFS-13248.001.patch, > HDFS-13248.002.patch, HDFS-13248.003.patch, HDFS-13248.004.patch, > HDFS-13248.005.patch, clientMachine-call-path.jpeg, debug-info-1.jpeg, > debug-info-2.jpeg > > > When execute a put operation via router, the NameNode will choose block > location for the router, not for the real client. This will affect the file's > locality. > I think on both NameNode and Router, we should add a new addBlock method, or > add a parameter for the current addBlock method, to pass the real client > information. 
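The compatibility idea in point 1 of the comment above — add an overload carrying the client hostname while keeping the old signature working — can be sketched with a hypothetical interface. The names here are illustrative only, not the real ClientProtocol:

```java
// Hypothetical sketch of extending an RPC-style interface compatibly:
// the new overload carries clientHostname, and the old signature
// delegates to it with null so existing callers are untouched.
interface BlockLocator {
    // Existing method: old clients keep calling this.
    default String addBlock(String src) {
        return addBlock(src, null);
    }

    // New method: the Router passes the real client hostname.
    String addBlock(String src, String clientHostname);
}

// Toy implementation showing how the server side would see both paths.
class LocalBlockLocator implements BlockLocator {
    @Override
    public String addBlock(String src, String clientHostname) {
        // Choose locality based on the effective client machine.
        return src + "@" + (clientHostname != null ? clientHostname : "direct");
    }
}
```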
[jira] [Commented] (HDFS-13248) RBF: Namenode need to choose block location for the client
[ https://issues.apache.org/jira/browse/HDFS-13248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806803#comment-16806803 ] He Xiaoqiao commented on HDFS-13248: {quote}In any case, the proper approach as discussed earlier would be to be able to let know the Namenode which is the actual client. This may require some change/addition to the protocol.{quote} +1 for changing the protocol; it may have a high cost, but I think it is a safer and more graceful solution than using {{favoredNodesList}}, since that would change the original meaning and we would have to solve some corner cases, just as [~ayushtkn] mentioned above. > RBF: Namenode need to choose block location for the client > -- > > Key: HDFS-13248 > URL: https://issues.apache.org/jira/browse/HDFS-13248 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Weiwei Wu >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13248.000.patch, HDFS-13248.001.patch, > HDFS-13248.002.patch, HDFS-13248.003.patch, HDFS-13248.004.patch, > HDFS-13248.005.patch, clientMachine-call-path.jpeg, debug-info-1.jpeg, > debug-info-2.jpeg > > > When execute a put operation via router, the NameNode will choose block > location for the router, not for the real client. This will affect the file's > locality. > I think on both NameNode and Router, we should add a new addBlock method, or > add a parameter for the current addBlock method, to pass the real client > information.
[jira] [Updated] (HDFS-14385) RBF: Optimize MiniRouterDFSCluster with optional light weight MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-14385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-14385: --- Attachment: HDFS-14385-HDFS-13891.001.patch Status: Patch Available (was: Open) Sorry for the late update. Uploaded a quick-and-dirty patch [^HDFS-14385-HDFS-13891.001.patch]; I added the parameter #isMockNamenode as a condition in {{MiniRouterDFSCluster}} to determine whether to start {{MiniDFSCluster}} or {{MockNamenode}}. Will update the related unit tests after the first reviews. > RBF: Optimize MiniRouterDFSCluster with optional light weight MiniDFSCluster > > > Key: HDFS-14385 > URL: https://issues.apache.org/jira/browse/HDFS-14385 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Attachments: HDFS-14385-HDFS-13891.001.patch > > > MiniRouterDFSCluster mimic federated HDFS cluster with routers to support RBF > test, In MiniRouterDFSCluster, it starts MiniDFSCluster with complete roles > of HDFS which have significant time cost. As HDFS-14351 discussed, it is > better to provide mock MiniDFSCluster/Namenodes as one option to support some > test case and reduce time cost.
[jira] [Commented] (HDFS-14316) RBF: Support unavailable subclusters for mount points with multiple destinations
[ https://issues.apache.org/jira/browse/HDFS-14316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800723#comment-16800723 ] He Xiaoqiao commented on HDFS-14316: {quote}In HDFS-14316-HDFS-13891.010.patch I added a separate unit test (TestRouterFaultTolerant) which uses the MockNamenode (He Xiaoqiao you may want to take a look). {quote} Sorry for the late response; I will take time out these days to work on updating MiniRouterDFSCluster to use MockNamenode. Thanks [~elgoiri] for calling me in here. > RBF: Support unavailable subclusters for mount points with multiple > destinations > > > Key: HDFS-14316 > URL: https://issues.apache.org/jira/browse/HDFS-14316 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-14316-HDFS-13891.000.patch, > HDFS-14316-HDFS-13891.001.patch, HDFS-14316-HDFS-13891.002.patch, > HDFS-14316-HDFS-13891.003.patch, HDFS-14316-HDFS-13891.004.patch, > HDFS-14316-HDFS-13891.005.patch, HDFS-14316-HDFS-13891.006.patch, > HDFS-14316-HDFS-13891.007.patch, HDFS-14316-HDFS-13891.008.patch, > HDFS-14316-HDFS-13891.009.patch, HDFS-14316-HDFS-13891.010.patch, > HDFS-14316-HDFS-13891.011.patch, HDFS-14316-HDFS-13891.012.patch, > HDFS-14316-HDFS-13891.013.patch > > > Currently mount points with multiple destinations (e.g., HASH_ALL) fail > writes when the destination subcluster is down. We need an option to allow > writing in other subclusters when one is down.
[jira] [Commented] (HDFS-14385) RBF: Optimize MiniRouterDFSCluster with optional light weight MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-14385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800718#comment-16800718 ] He Xiaoqiao commented on HDFS-14385: Thanks [~elgoiri] for the additional comments; I will work on this JIRA in the next few days. > RBF: Optimize MiniRouterDFSCluster with optional light weight MiniDFSCluster > > > Key: HDFS-14385 > URL: https://issues.apache.org/jira/browse/HDFS-14385 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > > MiniRouterDFSCluster mimic federated HDFS cluster with routers to support RBF > test, In MiniRouterDFSCluster, it starts MiniDFSCluster with complete roles > of HDFS which have significant time cost. As HDFS-14351 discussed, it is > better to provide mock MiniDFSCluster/Namenodes as one option to support some > test case and reduce time cost.
[jira] [Created] (HDFS-14385) RBF: Optimize MiniRouterDFSCluster with optional light weight MiniDFSCluster
He Xiaoqiao created HDFS-14385: -- Summary: RBF: Optimize MiniRouterDFSCluster with optional light weight MiniDFSCluster Key: HDFS-14385 URL: https://issues.apache.org/jira/browse/HDFS-14385 Project: Hadoop HDFS Issue Type: Sub-task Components: rbf Reporter: He Xiaoqiao Assignee: He Xiaoqiao MiniRouterDFSCluster mimics a federated HDFS cluster with routers to support RBF tests. In MiniRouterDFSCluster, a MiniDFSCluster is started with the complete set of HDFS roles, which has a significant time cost. As discussed in HDFS-14351, it is better to provide mock MiniDFSCluster/Namenodes as an option, to support some test cases while reducing the time cost.
[jira] [Commented] (HDFS-14351) RBF: Optimize configuration item resolving for monitor namenode
[ https://issues.apache.org/jira/browse/HDFS-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797013#comment-16797013 ] He Xiaoqiao commented on HDFS-14351: Thanks [~elgoiri] and [~ayushtkn] for the reviews and commit. {quote}As you have the mini cluster with the mock nodes, do you mind opening the JIRA? {quote} NP, I will create a new JIRA to track making a lightweight MiniDFSCluster an option in MiniRouterDFSCluster for different test cases. > RBF: Optimize configuration item resolving for monitor namenode > --- > > Key: HDFS-14351 > URL: https://issues.apache.org/jira/browse/HDFS-14351 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: HDFS-13891 > > Attachments: HDFS-14351-HDFS-13891.001.patch, > HDFS-14351-HDFS-13891.002.patch, HDFS-14351-HDFS-13891.003.patch, > HDFS-14351-HDFS-13891.004.patch, HDFS-14351-HDFS-13891.005.patch, > HDFS-14351-HDFS-13891.006.patch, HDFS-14351.001.patch, HDFS-14351.002.patch > > > We invoke {{configuration.get}} to resolve configuration item > `dfs.federation.router.monitor.namenode` at `Router.java`, then split the > value by comma to get nsid and nnid, it may confused users since this is not > compatible with blank space but other common parameters could do. The > following segment show example that resolve fails. > {code:java} > > dfs.federation.router.monitor.namenode > nameservice1.nn1, nameservice1.nn2 > > The identifier of the namenodes to monitor and heartbeat. > > > {code}
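The whitespace problem quoted in the description above comes from splitting the raw value on commas without trimming, so " nameservice1.nn2" fails to resolve. Hadoop's Configuration.getTrimmedStrings exists for exactly this; the same logic can be sketched in plain Java (a hypothetical parser, not the actual Router code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of tolerant parsing for a value like
// "nameservice1.nn1, nameservice1.nn2": split on commas, trim each
// token, and drop empties, mirroring Configuration.getTrimmedStrings.
class MonitorNamenodeParser {
    /** Returns a list of {nsId, nnId} pairs. */
    static List<String[]> parse(String value) {
        List<String[]> result = new ArrayList<>();
        for (String token : value.split(",")) {
            String id = token.trim();          // tolerate blank space
            if (id.isEmpty()) continue;        // tolerate trailing commas
            int dot = id.indexOf('.');
            if (dot < 0) continue;             // expected "<nsId>.<nnId>"
            result.add(new String[] { id.substring(0, dot), id.substring(dot + 1) });
        }
        return result;
    }
}
```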
[jira] [Commented] (HDFS-14351) RBF: Optimize configuration item resolving for monitor namenode
[ https://issues.apache.org/jira/browse/HDFS-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796792#comment-16796792 ] He Xiaoqiao commented on HDFS-14351: Pulled [^HDFS-14351-HDFS-13891.006.patch] and the tests pass locally; the time cost is the same as [~elgoiri]'s. +1 for [^HDFS-14351-HDFS-13891.006.patch]. I built a quick-and-dirty MiniRouterDFSCluster with MockNamenode replacing MiniDFSCluster and compared the setup time cost: MiniRouterDFSCluster with MockNamenode takes 2.8s vs 10.2s for the native MiniRouterDFSCluster. {quote}I think we should create a new JIRA to have a light weight MiniDFSCluster equivalent with these MockNamenodes. {quote} +1, it is not necessary for some tests to set up all roles of the cluster. As mentioned above, maybe we should build a MockMiniDFSCluster based on these MockNamenodes. > RBF: Optimize configuration item resolving for monitor namenode > --- > > Key: HDFS-14351 > URL: https://issues.apache.org/jira/browse/HDFS-14351 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Attachments: HDFS-14351-HDFS-13891.001.patch, > HDFS-14351-HDFS-13891.002.patch, HDFS-14351-HDFS-13891.003.patch, > HDFS-14351-HDFS-13891.004.patch, HDFS-14351-HDFS-13891.005.patch, > HDFS-14351-HDFS-13891.006.patch, HDFS-14351.001.patch, HDFS-14351.002.patch > > > We invoke {{configuration.get}} to resolve configuration item > `dfs.federation.router.monitor.namenode` at `Router.java`, then split the > value by comma to get nsid and nnid, it may confused users since this is not > compatible with blank space but other common parameters could do. The > following segment show example that resolve fails. > {code:java} > > dfs.federation.router.monitor.namenode > nameservice1.nn1, nameservice1.nn2 > > The identifier of the namenodes to monitor and heartbeat. 
> > > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14374) Expose total number of delegation tokens in AbstractDelegationTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794154#comment-16794154 ] He Xiaoqiao commented on HDFS-14374: [~crh] Thanks for working on this. Just one suggestion: we could expose the number as a metric; that would make it more convenient to monitor. FYI.
> Expose total number of delegation tokens in AbstractDelegationTokenSecretManager
>
> Key: HDFS-14374
> URL: https://issues.apache.org/jira/browse/HDFS-14374
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: CR Hota
> Assignee: CR Hota
> Priority: Major
> Attachments: HDFS-14374.001.patch, HDFS-14374.002.patch
>
> AbstractDelegationTokenSecretManager should expose the total number of active delegation tokens, so that specific implementations can track it for observability.
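In Hadoop's metrics2 framework such a count is typically published by annotating a getter with {{@Metric}} so the metrics system can poll it as a gauge. A self-contained toy sketch of the idea (class and method names are illustrative, not the actual AbstractDelegationTokenSecretManager API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy stand-in for a token store; the real secret manager keeps active
// tokens in an internal map keyed by token identifier.
public class TokenCountGauge {
    private final Map<String, Long> currentTokens = new ConcurrentHashMap<>();

    public void storeToken(String id, long expiryMillis) {
        currentTokens.put(id, expiryMillis);
    }

    public void removeToken(String id) {
        currentTokens.remove(id);
    }

    // The value a metrics system would poll; in the real implementation this
    // getter would carry a metrics2 @Metric annotation.
    public int getCurrentTokensCount() {
        return currentTokens.size();
    }
}
```

Exposing the map size as a polled gauge (rather than incrementing a separate counter) keeps the metric trivially consistent with the actual token store.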
[jira] [Commented] (HDFS-14351) RBF: Optimize configuration item resolving for monitor namenode
[ https://issues.apache.org/jira/browse/HDFS-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793280#comment-16793280 ] He Xiaoqiao commented on HDFS-14351: +1 for [^HDFS-14351-HDFS-13891.005.patch]. Thanks, good work on the unit test. [~elgoiri], you could assign the task to yourself at any time if necessary.
[jira] [Commented] (HDFS-14351) RBF: Optimize configuration item resolving for monitor namenode
[ https://issues.apache.org/jira/browse/HDFS-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792641#comment-16792641 ] He Xiaoqiao commented on HDFS-14351: Fixed the checkstyle warning in [^HDFS-14351-HDFS-13891.004.patch].
[jira] [Updated] (HDFS-14351) RBF: Optimize configuration item resolving for monitor namenode
[ https://issues.apache.org/jira/browse/HDFS-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-14351: --- Attachment: HDFS-14351-HDFS-13891.004.patch
[jira] [Commented] (HDFS-14351) RBF: Optimize configuration item resolving for monitor namenode
[ https://issues.apache.org/jira/browse/HDFS-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792563#comment-16792563 ] He Xiaoqiao commented on HDFS-14351: [~elgoiri] Thanks. Uploaded [^HDFS-14351-HDFS-13891.003.patch] to test the monitor-namenode configuration only.
[jira] [Updated] (HDFS-14351) RBF: Optimize configuration item resolving for monitor namenode
[ https://issues.apache.org/jira/browse/HDFS-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-14351: --- Attachment: HDFS-14351-HDFS-13891.003.patch
[jira] [Commented] (HDFS-14351) RBF: Optimize configuration item resolving for monitor namenode
[ https://issues.apache.org/jira/browse/HDFS-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791494#comment-16791494 ] He Xiaoqiao commented on HDFS-14351: Attached [^HDFS-14351-HDFS-13891.002.patch] following the comments. [~elgoiri], I am not very familiar with #MiniRouterDFSCluster; please help review, and correct me if something is wrong. Thanks again.
[jira] [Updated] (HDFS-14351) RBF: Optimize configuration item resolving for monitor namenode
[ https://issues.apache.org/jira/browse/HDFS-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-14351: --- Attachment: HDFS-14351-HDFS-13891.002.patch
[jira] [Commented] (HDFS-14351) RBF: Optimize configuration item resolving for monitor namenode
[ https://issues.apache.org/jira/browse/HDFS-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791344#comment-16791344 ] He Xiaoqiao commented on HDFS-14351: I will add a new test for the RouterNamenodeMonitoring configuration later. Thanks [~elgoiri].
[jira] [Commented] (HDFS-14351) RBF: Optimize configuration item resolving for monitor namenode
[ https://issues.apache.org/jira/browse/HDFS-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790332#comment-16790332 ] He Xiaoqiao commented on HDFS-14351: Thanks [~elgoiri] for reviewing. [^HDFS-14351-HDFS-13891.001.patch] rebases onto branch HDFS-13891 and only makes a tiny change to the configuration settings in #TestRouterNamenodeMonitoring. Do we need another unit test for this configuration item, given that the nsid and nnid are already verified in #TestRouterNamenodeMonitoring?
[jira] [Updated] (HDFS-14351) RBF: Optimize configuration item resolving for monitor namenode
[ https://issues.apache.org/jira/browse/HDFS-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-14351: --- Attachment: HDFS-14351-HDFS-13891.001.patch
[jira] [Updated] (HDFS-14351) RBF: Optimize configuration item resolving for monitor namenode
[ https://issues.apache.org/jira/browse/HDFS-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-14351: --- Attachment: HDFS-14351.002.patch
[jira] [Updated] (HDFS-14351) RBF: Optimize configuration item resolving for monitor namenode
[ https://issues.apache.org/jira/browse/HDFS-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-14351: --- Attachment: (was: HDFS-14351.002.patch)
[jira] [Commented] (HDFS-14351) RBF: Optimize configuration item resolving for monitor namenode
[ https://issues.apache.org/jira/browse/HDFS-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789470#comment-16789470 ] He Xiaoqiao commented on HDFS-14351: Attached another patch, [^HDFS-14351.002.patch], based on branch trunk.
[jira] [Updated] (HDFS-14351) RBF: Optimize configuration item resolving for monitor namenode
[ https://issues.apache.org/jira/browse/HDFS-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-14351: --- Attachment: HDFS-14351.002.patch
[jira] [Updated] (HDFS-14351) RBF: Optimize configuration item resolving for monitor namenode
[ https://issues.apache.org/jira/browse/HDFS-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-14351: --- Summary: RBF: Optimize configuration item resolving for monitor namenode (was: Optimize configuration item resolving for monitor namenode)
[jira] [Created] (HDFS-14351) Optimize configuration item resolving for monitor namenode
He Xiaoqiao created HDFS-14351: -- Summary: Optimize configuration item resolving for monitor namenode
Key: HDFS-14351
URL: https://issues.apache.org/jira/browse/HDFS-14351
Project: Hadoop HDFS
Issue Type: Sub-task
Components: rbf
Reporter: He Xiaoqiao
Assignee: He Xiaoqiao
[jira] [Updated] (HDFS-14351) RBF: Optimize configuration item resolving for monitor namenode
[ https://issues.apache.org/jira/browse/HDFS-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-14351: --- Attachment: HDFS-14351.001.patch Status: Patch Available (was: Open) I attached a quick-and-dirty demonstration patch without a unit test. Please correct me if something is wrong.
[jira] [Commented] (HDFS-13532) RBF: Adding security
[ https://issues.apache.org/jira/browse/HDFS-13532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787509#comment-16787509 ] He Xiaoqiao commented on HDFS-13532: [~elgoiri], [~crh]: thanks for the very helpful suggestions. The migration steps are very clear and helpful! I have to estimate the full cost of migrating to RBF completely, since our default filesystem everywhere is viewfs://nameservice/ (including hivemeta and user applications) at very large scale, so switching to hdfs://nameservice will be costly. I will share information in time; thanks for your help again.
> RBF: Adding security
>
> Key: HDFS-13532
> URL: https://issues.apache.org/jira/browse/HDFS-13532
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Íñigo Goiri
> Assignee: CR Hota
> Priority: Major
> Attachments: RBF _ Security delegation token thoughts.pdf, RBF _ Security delegation token thoughts_updated.pdf, RBF _ Security delegation token thoughts_updated_2.pdf, RBF-DelegationToken-Approach1b.pdf, RBF_ Security delegation token thoughts_updated_3.pdf, Security_for_Router-based Federation_design_doc.pdf
>
> HDFS Router based federation should support security. This includes authentication and delegation tokens.
[jira] [Commented] (HDFS-13248) RBF: Namenode need to choose block location for the client
[ https://issues.apache.org/jira/browse/HDFS-13248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786417#comment-16786417 ] He Xiaoqiao commented on HDFS-13248: Thanks [~elgoiri]. [^HDFS-13248.005.patch] seems a tricky but valid solution. However, I don't think this approach can be applied to #getBlockLocations, whose interface definition, shown below, looks very different from #addBlock.
{code:java}
LocatedBlocks getBlockLocations(String src, long offset, long length) throws IOException;
{code}
Looking forward to a more graceful approach; I would like to join in and push this issue forward.
> RBF: Namenode need to choose block location for the client
>
> Key: HDFS-13248
> URL: https://issues.apache.org/jira/browse/HDFS-13248
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Weiwei Wu
> Assignee: Íñigo Goiri
> Priority: Major
> Attachments: HDFS-13248.000.patch, HDFS-13248.001.patch, HDFS-13248.002.patch, HDFS-13248.003.patch, HDFS-13248.004.patch, HDFS-13248.005.patch, clientMachine-call-path.jpeg, debug-info-1.jpeg, debug-info-2.jpeg
>
> When executing a put operation via the Router, the NameNode chooses block locations for the Router, not for the real client. This affects the file's locality.
> I think on both the NameNode and the Router, we should add a new addBlock method, or add a parameter to the current addBlock method, to pass the real client information.
[jira] [Commented] (HDFS-13532) RBF: Adding security
[ https://issues.apache.org/jira/browse/HDFS-13532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786382#comment-16786382 ] He Xiaoqiao commented on HDFS-13532: [~crh], [~elgoiri], [~brahmareddy], I really appreciate your feedback. In a word, I am concerned about two points: (1) how to gradually (canary) upgrade HDFS to support RBF with the security feature, and (2) the performance cost of using ZKDelegationTokenSecretManagerImpl. With my colleague's help, (2) is now clear: more than 5K QPS is OK for me. The gradual upgrade I do not completely understand yet. First of all, my ideal plan for upgrading to RBF smoothly: (1) HDFS is built on Federation + ViewFS now; (2) it is better for me to roll the client upgrade gradually rather than switch to RBF all at once. [~elgoiri] and [~crh] both mentioned a solution with a 'Router nameservice' along the following steps: (1) update the YARN (RM/NM) configuration to include the new router nameservice; (2) roll clients over to support RBF; (3) update the YARN (RM/NM) configuration to include only the router nameservice config. IIUC, this solution will not solve the delegation token issue: a client obtains a DT from the router only after step (2) and submits the job normally, but the executor will then fail when requesting the NameNode, because the DT check fails — for some compute engines (for instance MR) the client and NM configurations are merged, so the executor still requests the NameNode directly without a proper DT. To [~crh]: {quote}jobs try to access something like hdfs://router-nameservice/mydata, rm will use the same filesystem i.e. hdfs://router-nameservice to renew tokens {quote} I think this requires enhancing the compute engine, which may be more costly. {quote}Routers not having security feature was a big hindrance in adopting it for any secure use case irrespective of scale. {quote} The security feature is also very important to me; I am trying my best to find a solution that can transition to RBF smoothly. Thanks [~crh] and [~elgoiri] again.
[jira] [Commented] (HDFS-13248) RBF: Namenode need to choose block location for the client
[ https://issues.apache.org/jira/browse/HDFS-13248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785674#comment-16785674 ] He Xiaoqiao commented on HDFS-13248: [~elgoiri] I think this is also an issue for read operations: since the NameNode gets the Router's hostname/IP rather than the client information, it cannot sort block locations correctly as expected, right?
[jira] [Commented] (HDFS-13532) RBF: Adding security
[ https://issues.apache.org/jira/browse/HDFS-13532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784726#comment-16784726 ] He Xiaoqiao commented on HDFS-13532: Thanks [~brahmareddy] and [~elgoiri] for your detailed comments. To [~elgoiri], {quote}If the job is submitted against the Router, then the job can only access data through RBF. However, I think this is OK; as I mentioned before you could still have jobs that query the NameNodes directly.{quote} IIUC, the client/job submitter and the executors have to switch to RBF at the same time; otherwise the delegation token check will not pass, since tokens issued by the NameNode and by the Router do not match. On the other side, most compute engines running on YARN rely on the RM to renew tokens. So in one word, it looks like there is no graceful solution to support a rolling upgrade, for instance upgrading clients to RBF first, then YARN (RM/NM)? {quote}For the RM itself, you can transition it from using RBF or not whenever you want. {quote} As mentioned above, I am confused about whether the RM uses RBF or not; more explanation would be greatly appreciated. To [~brahmareddy] {quote}Did you try it..? do you've failed logs..?{quote} I am sorry that I have no time to test this case now; I will offer more information once I cover this scenario. Thanks again. > RBF: Adding security > > > Key: HDFS-13532 > URL: https://issues.apache.org/jira/browse/HDFS-13532 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Íñigo Goiri >Assignee: CR Hota >Priority: Major > Attachments: RBF _ Security delegation token thoughts.pdf, RBF _ > Security delegation token thoughts_updated.pdf, RBF _ Security delegation > token thoughts_updated_2.pdf, RBF-DelegationToken-Approach1b.pdf, RBF_ > Security delegation token thoughts_updated_3.pdf, Security_for_Router-based > Federation_design_doc.pdf > > > HDFS Router based federation should support security. This includes > authentication and delegation tokens. 
[jira] [Commented] (HDFS-13532) RBF: Adding security
[ https://issues.apache.org/jira/browse/HDFS-13532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784327#comment-16784327 ] He Xiaoqiao commented on HDFS-13532: Thanks for the great work here; I have been following RBF recently, and sorry for the belated questions. I found that branch HDFS-13891 implements Approach #1, and it works well in my test env. Some questions about Approach #1: 1. Any suggestions or guidance for upgrading gracefully? Approach #1 is based on two points: (1) the router takes over delegation token management from the namenodes entirely, and (2) the namenode only handles delegation token requests from the router. Right? IIUC, there may be no graceful, gradual way to upgrade clients. Consider a job submitted to YARN from a client that has been upgraded to support RBF, so all its delegation tokens are issued by the router; if YARN is not yet upgraded, all executors will fail to authenticate to the namenode since the delegation tokens do not match. Of course the same issue arises if YARN is upgraded before the clients. 2. Are there any performance test results for ZooKeeper managing massive numbers of delegation tokens? I am not very familiar with ZooKeeper, and I wonder whether there is an obvious performance difference between ZooKeeper and the in-memory store in the namenode before RBF. If there is no evaluation yet, I would like to test it later. 3. Does the znode count impact delegation token request performance in ZooKeeper? Delegation token request ops are very high for a large cluster: for instance, with 1000K jobs every day and the default maximum delegation token lifetime of 7 days, the worst case backlogs 7000K znodes in total. Is that a risk for even larger clusters? 4. Any plan to support different approaches and let the user choose? Please correct me if anything is wrong. Thanks again. 
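The znode estimate in point 3 can be made explicit. A back-of-envelope sketch; the one-token-per-job rate and the daily job count are the comment's assumptions, not measured values:

```java
// Worst-case znode backlog for delegation tokens stored in ZooKeeper.
class ZnodeEstimate {
    static long worstCaseZnodes(long tokensPerDay, int tokenLifetimeDays) {
        // Worst case: every token lives out its full lifetime before
        // its znode is purged, so lifetimeDays days of tokens accumulate.
        return tokensPerDay * tokenLifetimeDays;
    }

    public static void main(String[] args) {
        // ~1000K jobs/day, default max token lifetime of 7 days.
        System.out.println(worstCaseZnodes(1_000_000L, 7)); // 7000000
    }
}
```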
[jira] [Created] (HDFS-14332) NetworkTopology#getWeightUsingNetworkLocation return unexpected result
He Xiaoqiao created HDFS-14332: -- Summary: NetworkTopology#getWeightUsingNetworkLocation return unexpected result Key: HDFS-14332 URL: https://issues.apache.org/jira/browse/HDFS-14332 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: He Xiaoqiao Assignee: He Xiaoqiao Consider the following scenario: 1. There are 4 slaves with a topology like: Rack: /IDC/RACK1 hostname1 hostname2 Rack: /IDC/RACK2 hostname3 hostname4 2. A reader on hostname1 calculates the weight between the reader and [hostname1, hostname3, hostname4] via #getWeight, and the corresponding values are [0,4,4]. 3. A reader on a client that is not in the topology, in the same IDC but in none of the topology's racks, calculates the weight between the reader and [hostname1, hostname3, hostname4] via #getWeightUsingNetworkLocation, and the corresponding values are [2,2,2]. 4. Other readers get similar results. The weight result for case #3 is obviously not the expected value; the truth is [4,4,4]. This issue may cause the reader not to follow the intended order: local -> local rack -> remote rack. After digging into the implementation, the root cause is that #getWeightUsingNetworkLocation only calculates the distance between racks rather than hosts. I think we should add the constant 2 to correct the weight of #getWeightUsingNetworkLocation.
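The mismatch above can be reproduced with a small sketch. This is illustrative, not the Hadoop source: the two methods below only model the weight scales (0 = local node, 2 = same rack, 4 = off rack for #getWeight; rack-only comparison for #getWeightUsingNetworkLocation), and the reader rack /IDC/RACK9 is an assumed placeholder for a client outside the topology.

```java
class WeightSketch {
    // Host-level weight, as NetworkTopology#getWeight computes it.
    static int getWeight(String readerHost, String readerRack,
                         String nodeHost, String nodeRack) {
        if (readerHost.equals(nodeHost)) return 0; // local node
        return readerRack.equals(nodeRack) ? 2 : 4; // local rack : off rack
    }

    // Rack-only comparison, as in getWeightUsingNetworkLocation before the fix.
    static int getWeightUsingNetworkLocation(String readerRack, String nodeRack) {
        return readerRack.equals(nodeRack) ? 0 : 2;
    }

    public static void main(String[] args) {
        // Reader on hostname1 (/IDC/RACK1) vs hostname3 (/IDC/RACK2): host scale says 4.
        System.out.println(getWeight("hostname1", "/IDC/RACK1", "hostname3", "/IDC/RACK2"));
        // Reader outside the topology, same IDC, unknown rack: rack scale says 2,
        // but [4,4,4] is the truth on the host scale.
        System.out.println(getWeightUsingNetworkLocation("/IDC/RACK9", "/IDC/RACK2"));
        // Proposed correction: add the constant 2 so rack distance maps onto
        // the host scale (same rack -> 2, off rack -> 4).
        System.out.println(getWeightUsingNetworkLocation("/IDC/RACK9", "/IDC/RACK2") + 2);
    }
}
```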
[jira] [Updated] (HDFS-14332) NetworkTopology#getWeightUsingNetworkLocation return unexpected result
[ https://issues.apache.org/jira/browse/HDFS-14332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-14332: --- Attachment: HDFS-14332.001.patch Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781343#comment-16781343 ] He Xiaoqiao commented on HDFS-14305: [~xkrogen],[~csun] After checking all dev branches containing HDFS-6440 (branch-3.0, branch-3.1, branch-3.2), the fix can be cherry-picked directly, so no new patches are needed in my opinion. FYI. > Serial number in BlockTokenSecretManager could overlap between different > namenodes > -- > > Key: HDFS-14305 > URL: https://issues.apache.org/jira/browse/HDFS-14305 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode, security >Reporter: Chao Sun >Assignee: He Xiaoqiao >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14305.001.patch, HDFS-14305.002.patch, > HDFS-14305.003.patch, HDFS-14305.004.patch, HDFS-14305.005.patch, > HDFS-14305.006.patch > > > Currently, a {{BlockTokenSecretManager}} starts with a random integer as the > initial serial number, and then uses this formula to rotate it: > {code:java} > this.intRange = Integer.MAX_VALUE / numNNs; > this.nnRangeStart = intRange * nnIndex; > this.serialNo = (this.serialNo % intRange) + (nnRangeStart); > {code} > where {{numNNs}} is the total number of NameNodes in the cluster, and > {{nnIndex}} is the index of the current NameNode specified in the > configuration {{dfs.ha.namenodes.}}. > However, with this approach, different NameNodes could have overlapping ranges > of serial numbers. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, > and we have 2 NameNodes {{nn1}} and {{nn2}} in the configuration. Then the ranges > for these two are: > {code} > nn1 -> [-49, 49] > nn2 -> [1, 99] > {code} > This is because the initial serial number could be any negative integer. > Moreover, when the keys are updated, the serial number will again be updated > with the formula: > {code} > this.serialNo = (this.serialNo % intRange) + (nnRangeStart); > {code} > which means the new serial number could be moved to a range that belongs to > a different NameNode, thus increasing the chance of collision again. > When a collision happens, DataNodes could overwrite an existing key, which > will cause clients to fail with an {{InvalidToken}} error.
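The overlapping ranges in the issue description can be verified directly. This worked example uses the JIRA's own simplification (MAX = 100, 2 NameNodes) and the rotation formula quoted above; it relies on the fact that Java's `%` keeps the sign of the dividend, so a negative initial serialNo shifts nn1's range below its intended start.

```java
class SerialRangeDemo {
    // The rotation formula from BlockTokenSecretManager, parameterized for the demo.
    static int rotate(int serialNo, int nnIndex, int numNNs, int max) {
        int intRange = max / numNNs;           // 50 in the MAX=100, 2-NN example
        int nnRangeStart = intRange * nnIndex; // nn1 -> 0, nn2 -> 50
        return (serialNo % intRange) + nnRangeStart;
    }

    public static void main(String[] args) {
        int max = 100;
        // nn1 (index 0): the initial serialNo may be any int, including negative,
        // so nn1's range is [-49, 49].
        System.out.println(rotate(-99, 0, 2, max)); // -49
        System.out.println(rotate(99, 0, 2, max));  // 49
        // nn2 (index 1): [-49, 49] shifted by 50 -> [1, 99].
        System.out.println(rotate(-99, 1, 2, max)); // 1
        System.out.println(rotate(99, 1, 2, max));  // 99
        // The interval [1, 49] is claimed by both NameNodes: key collisions
        // become possible, and DataNodes may overwrite a live key.
    }
}
```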
[jira] [Commented] (HDFS-14314) fullBlockReportLeaseId should be reset after registering to NN
[ https://issues.apache.org/jira/browse/HDFS-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781326#comment-16781326 ] He Xiaoqiao commented on HDFS-14314: Thanks [~starphin], +1, [^HDFS-14314-trunk.005.patch] LGTM. I ran the failed unit tests (excluding TestWebHdfsTimeouts and TestJournalNodeSync) locally and all passed. I also agree that the test failures (TestWebHdfsTimeouts, TestJournalNodeSync) are not related. Pending one more review. > fullBlockReportLeaseId should be reset after registering to NN > -- > > Key: HDFS-14314 > URL: https://issues.apache.org/jira/browse/HDFS-14314 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.8.4 > Environment: > > >Reporter: star >Priority: Critical > Fix For: 2.8.4 > > Attachments: HDFS-14314-trunk.001.patch, HDFS-14314-trunk.001.patch, > HDFS-14314-trunk.002.patch, HDFS-14314-trunk.003.patch, > HDFS-14314-trunk.004.patch, HDFS-14314-trunk.005.patch, HDFS-14314.0.patch, > HDFS-14314.2.patch, HDFS-14314.patch > > > Since HDFS-7923, to rate-limit DN block reports, the DN asks the active NN for a full > block report lease id before sending a full block report. The DN then sends the > full block report together with the lease id. If the lease id is invalid, the NN > will reject the full block report and log "not in the pending set". > Consider the case where the DN is doing a full block report while the NN is restarted. > It can happen that the DN later sends a full block report with a lease id, > acquired from the previous NN instance, which is invalid to the new NN instance. > Though the DN recognizes the new NN instance via heartbeat and reregisters itself, > it does not reset the lease id from the previous instance. > The issue may cause DNs to temporarily go dead, making it unsafe to > restart the NN, especially in Hadoop clusters with a large number of DNs. 
We take it from method > offerService of class BPServiceActor. We eliminate some code to focus on > current issue. fullBlockReportLeaseId is a local variable to hold lease id > from NN. Exceptions will occur at blockReport call when NN restarting, which > will be caught by catch block in while loop. Thus fullBlockReportLeaseId will > not be set to 0. After NN restarted, DN will send full block report which > will be rejected by the new NN instance. DN will never send full block report > until the next full block report schedule, about an hour later. > Solution is simple, just reset fullBlockReportLeaseId to 0 after any > exception or after registering to NN. Thus it will ask for a valid > fullBlockReportLeaseId from new NN instance. > {code:java} > private void offerService() throws Exception { > long fullBlockReportLeaseId = 0; > // > // Now loop for a long time > // > while (shouldRun()) { > try { > final long startTime = scheduler.monotonicNow(); > // > // Every so often, send heartbeat or block-report > // > final boolean sendHeartbeat = scheduler.isHeartbeatDue(startTime); > HeartbeatResponse resp = null; > if (sendHeartbeat) { > > boolean requestBlockReportLease = (fullBlockReportLeaseId == 0) && > scheduler.isBlockReportDue(startTime); > scheduler.scheduleNextHeartbeat(); > if (!dn.areHeartbeatsDisabledForTests()) { > resp = sendHeartBeat(requestBlockReportLease); > assert resp != null; > if (resp.getFullBlockReportLeaseId() != 0) { > if (fullBlockReportLeaseId != 0) { > LOG.warn(nnAddr + " sent back a full block report lease " + > "ID of 0x" + > Long.toHexString(resp.getFullBlockReportLeaseId()) + > ", but we already have a lease ID of 0x" + > Long.toHexString(fullBlockReportLeaseId) + ". 
" + > "Overwriting old lease ID."); > } > fullBlockReportLeaseId = resp.getFullBlockReportLeaseId(); > } > > } > } > > > if ((fullBlockReportLeaseId != 0) || forceFullBr) { > //Exception occurred here when NN restarting > cmds = blockReport(fullBlockReportLeaseId); > fullBlockReportLeaseId = 0; > } > > } catch(RemoteException re) { > > } // while (shouldRun()) > } // offerService{code} >
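The fix the reporter proposes can be condensed into a runnable sketch. This is not the Hadoop source: the simulated NameNode and the lease values are invented for illustration; only the idea (clear `fullBlockReportLeaseId` to 0 whenever `blockReport()` fails or succeeds, so the next heartbeat requests a fresh lease instead of retrying a stale one until the next FBR schedule) follows the quoted code.

```java
class LeaseResetSketch {
    // Simulated NameNode: rejects any lease it did not itself issue,
    // as a restarted NN does ("not in the pending set").
    static void blockReport(long leaseId, long nnValidLease) {
        if (leaseId != nnValidLease) {
            throw new IllegalStateException("not in the pending set");
        }
    }

    // The fix in miniature: the lease is cleared on success (it is single-use)
    // AND on failure, so a stale lease from the previous NN instance never lingers.
    static long reportAndReset(long fullBlockReportLeaseId, long nnValidLease) {
        try {
            blockReport(fullBlockReportLeaseId, nnValidLease);
        } catch (IllegalStateException e) {
            // Without this reset, the DN keeps the stale id and never asks
            // for a new lease until the next scheduled full block report.
        }
        return 0;
    }

    public static void main(String[] args) {
        long staleLease = 0xABCL; // acquired before the NN restart (invented value)
        long newNnLease = 0xDEFL; // lease the new NN instance would issue (invented)
        long after = reportAndReset(staleLease, newNnLease);
        // leaseId == 0 makes the next heartbeat set requestBlockReportLease = true.
        System.out.println(after == 0); // true
    }
}
```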
[jira] [Commented] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781274#comment-16781274 ] He Xiaoqiao commented on HDFS-14305: Thanks [~arpitagarwal], {quote}Is this a compatible change and can it be applied safely during rolling upgrade without breaking anything?{quote} I believe this fix will not introduce any incompatibility, per [~xkrogen]'s and [~csun]'s descriptions. {quote}This does make me wonder if we should push this back to all branches containing HDFS-6440. {quote} +1 for backporting this fix to other branches. I will prepare patches soon.
[jira] [Commented] (HDFS-14201) Ability to disallow safemode NN to become active
[ https://issues.apache.org/jira/browse/HDFS-14201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780169#comment-16780169 ] He Xiaoqiao commented on HDFS-14201: +1 LGTM, Thanks [~surmountian]. > Ability to disallow safemode NN to become active > > > Key: HDFS-14201 > URL: https://issues.apache.org/jira/browse/HDFS-14201 > Project: Hadoop HDFS > Issue Type: Improvement > Components: auto-failover >Affects Versions: 3.1.1, 2.9.2 >Reporter: Xiao Liang >Assignee: Xiao Liang >Priority: Major > Attachments: HDFS-14201.001.patch, HDFS-14201.002.patch, > HDFS-14201.003.patch, HDFS-14201.004.patch > > > Currently with HA, a Namenode in safemode can possibly be selected as active; > for availability of both reads and writes, though, Namenodes not in safemode are better > choices to become active. > It can take tens of minutes for a cold-started Namenode to get out of > safemode, especially when there are large numbers of files and blocks in HDFS. > That means if a Namenode in safemode becomes active, the cluster will not be > fully functioning for quite a while, even when there is some > Namenode not in safemode. > The proposal here is to add an option to allow a Namenode to report itself as > UNHEALTHY to ZKFC if it's in safemode, so as to only allow fully functioning > Namenodes to become active, improving the general availability of the cluster.
[jira] [Commented] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780050#comment-16780050 ] He Xiaoqiao commented on HDFS-14305: Thanks [~xkrogen]. +1, LGTM. Is it necessary to note in the release note that the 64-namenode limit applies only to a single namespace? I think this information may be useful for Federation. FYI.
[jira] [Commented] (HDFS-14201) Ability to disallow safemode NN to become active
[ https://issues.apache.org/jira/browse/HDFS-14201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779063#comment-16779063 ] He Xiaoqiao commented on HDFS-14201: Thanks [~surmountian], [^HDFS-14201.004.patch] looks fine to me. Just a small worry about TestHASafeMode#testTransitionToActiveWhenSafeMode: creating a new MiniDFSCluster may cause a local path conflict and an editlog write failure. This problem already appeared on Jenkins when applying [^HDFS-14201.002.patch], refer to https://builds.apache.org/job/PreCommit-HDFS-Build/26315/testReport/, and I am sorry to point it out so late. FYI.
[jira] [Commented] (HDFS-14314) fullBlockReportLeaseId should be reset after registering to NN
[ https://issues.apache.org/jira/browse/HDFS-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779013#comment-16779013 ] He Xiaoqiao commented on HDFS-14314: Thanks [~starphin] for your contribution; it seems to be getting close to the truth. Please fix the code style following https://builds.apache.org/job/PreCommit-HDFS-Build/26332/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt and check whether the failed unit tests (https://builds.apache.org/job/PreCommit-HDFS-Build/26332/testReport/) are related to your patch; running them locally may be a good choice. FYI. BTW, all the links I just mentioned are from the Jenkins result shown in the last comment.
[jira] [Commented] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779006#comment-16779006 ] He Xiaoqiao commented on HDFS-14305: [^HDFS-14305.006.patch] fixes the code style. I ran the failed tests TestBPOfferService#testTrySendErrorReportWhenNNThrowsIOException and TestEditLogTailer#testRollEditLogIOExceptionForRemoteNN locally and they passed; please help to double check. The other failed unit test, TestJournalNodeSync, I believe is not related to this patch. FYI.
[jira] [Updated] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-14305: --- Attachment: HDFS-14305.006.patch
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
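The overlap described in HDFS-14305 above is easy to reproduce outside Hadoop. The following is a minimal standalone sketch (the class and method names are mine, not the actual BlockTokenSecretManager code) of the quoted rotation formula with MAX = 100 and two NameNodes. Because Java's `%` operator keeps the sign of the dividend, nn1 (index 0) can land anywhere in [-49, 49] while nn2 (index 1) lands in [1, 99], so the two ranges share [1, 49].

```java
import java.util.stream.IntStream;

public class SerialRangeSketch {

    /** The rotation formula quoted in the issue description. */
    public static int rotate(int serialNo, int max, int numNNs, int nnIndex) {
        int intRange = max / numNNs;           // per-NN range width
        int nnRangeStart = intRange * nnIndex; // start offset for this NN
        return (serialNo % intRange) + nnRangeStart;
    }

    public static void main(String[] args) {
        // Sweep seed serial numbers (including negatives, as a random int
        // seed may be) and report the range each NameNode can land in.
        for (int nn = 0; nn < 2; nn++) {
            final int idx = nn;
            int lo = IntStream.rangeClosed(-1000, 1000)
                    .map(s -> rotate(s, 100, 2, idx)).min().getAsInt();
            int hi = IntStream.rangeClosed(-1000, 1000)
                    .map(s -> rotate(s, 100, 2, idx)).max().getAsInt();
            System.out.println("nn" + (nn + 1) + " -> [" + lo + ", " + hi + "]");
        }
    }
}
```

Running this prints `nn1 -> [-49, 49]` and `nn2 -> [1, 99]`, matching the ranges in the description.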
[jira] [Commented] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16778835#comment-16778835 ]

He Xiaoqiao commented on HDFS-14305:

Thanks [~vagarychen], [~csun], [~xkrogen] for your comments. Updated and uploaded new patch [^HDFS-14305.005.patch]; pending Jenkins.
[jira] [Updated] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Xiaoqiao updated HDFS-14305:
-------------------------------
    Attachment: HDFS-14305.005.patch
[jira] [Commented] (HDFS-14314) fullBlockReportLeaseId should be reset after registering to NN
[ https://issues.apache.org/jira/browse/HDFS-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16778046#comment-16778046 ]

He Xiaoqiao commented on HDFS-14314:

Hi [~starphin],
1. The build failed because of missing imports (org.apache.hadoop.hdfs.server.protocol.StorageBlockReport and org.apache.hadoop.hdfs.server.protocol.BlockReportContext); please correct it.
2. Please delete the useless blank lines, as in the following and in other places like it:
{quote}
@@ -188,6 +203,24 @@ public HeartbeatResponse answer(InvocationOnMock invocation) throws Throwable { } + private class HeartbeatRegisterAnswer implements Answer {
{quote}
3. I suggest adding a timeout parameter to {{testRefreshLeaseId}}, such as {{@Test(timeout = 3)}}. FYI.

> fullBlockReportLeaseId should be reset after registering to NN
> --------------------------------------------------------------
>
>              Key: HDFS-14314
>              URL: https://issues.apache.org/jira/browse/HDFS-14314
>          Project: Hadoop HDFS
>       Issue Type: Bug
>       Components: datanode
> Affects Versions: 2.8.4
>         Reporter: star
>         Priority: Critical
>          Fix For: 2.8.4
>      Attachments: HDFS-14314-trunk.001.patch, HDFS-14314-trunk.001.patch, HDFS-14314.0.patch, HDFS-14314.2.patch, HDFS-14314.patch
>
> Since HDFS-7923, to rate-limit DN block reports, a DN asks the active NN for a full block report lease id before sending a full block report, and then sends the report together with the lease id. If the lease id is invalid, the NN rejects the full block report and logs "not in the pending set".
> Consider the case where a DN is doing full block reporting while the NN is restarted. The DN will later send a full block report with a lease id acquired from the previous NN instance, which is invalid to the new NN instance. Although the DN recognizes the new NN instance by heartbeat and re-registers itself, it does not reset the lease id from the previous instance.
> The issue may cause DNs to temporarily go dead, making it unsafe to restart the NN, especially in a Hadoop cluster with a large number of DNs. HDFS-12914 reported the issue without any clues as to why it occurred, and it remained unsolved.
> To make it clear, look at the code below, taken from the method offerService of class BPServiceActor. Some code is elided to focus on the current issue. fullBlockReportLeaseId is a local variable that holds the lease id from the NN. An exception occurs at the blockReport call when the NN is restarting, which is caught by the catch block in the while loop; thus fullBlockReportLeaseId is not set to 0. After the NN has restarted, the DN sends a full block report which is rejected by the new NN instance. The DN will not send another full block report until the next scheduled one, about an hour later.
> The solution is simple: just reset fullBlockReportLeaseId to 0 after any exception or after registering to the NN. It will then ask for a valid fullBlockReportLeaseId from the new NN instance.
> {code:java}
> private void offerService() throws Exception {
>   long fullBlockReportLeaseId = 0;
>   //
>   // Now loop for a long time
>   //
>   while (shouldRun()) {
>     try {
>       final long startTime = scheduler.monotonicNow();
>       //
>       // Every so often, send heartbeat or block-report
>       //
>       final boolean sendHeartbeat = scheduler.isHeartbeatDue(startTime);
>       HeartbeatResponse resp = null;
>       if (sendHeartbeat) {
>         boolean requestBlockReportLease = (fullBlockReportLeaseId == 0) &&
>             scheduler.isBlockReportDue(startTime);
>         scheduler.scheduleNextHeartbeat();
>         if (!dn.areHeartbeatsDisabledForTests()) {
>           resp = sendHeartBeat(requestBlockReportLease);
>           assert resp != null;
>           if (resp.getFullBlockReportLeaseId() != 0) {
>             if (fullBlockReportLeaseId != 0) {
>               LOG.warn(nnAddr + " sent back a full block report lease " +
>                   "ID of 0x" +
>                   Long.toHexString(resp.getFullBlockReportLeaseId()) +
>                   ", but we already have a lease ID of 0x" +
>                   Long.toHexString(fullBlockReportLeaseId) + ". " +
>                   "Overwriting old lease ID.");
>             }
>             fullBlockReportLeaseId = resp.getFullBlockReportLeaseId();
>           }
>         }
>       }
>       if ((fullBlockReportLeaseId != 0) || forceFullBr) {
>         // Exception occurred here when NN restarting
>         cmds = blockReport(fullBlockReportLeaseId);
>         fullBlockReportLeaseId = 0;
>       }
>     } catch (RemoteException re) {
>       ...
>     }
>   } // while (shouldRun())
> } // offerService
> {code}
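The fix described in HDFS-14314 above (reset the cached lease id on re-registration or after an exception) can be sketched as a tiny state machine. Names here are hypothetical; the real logic lives in BPServiceActor.offerService:

```java
/**
 * Simplified sketch of the proposed lease-id handling. Not the actual
 * BPServiceActor code: method names are invented for illustration.
 */
public class LeaseResetSketch {
    private long fullBlockReportLeaseId = 0;

    /** The NN granted us a lease in a heartbeat response. */
    public void onLeaseGranted(long leaseId) {
        fullBlockReportLeaseId = leaseId;
    }

    /**
     * The fix: forget any lease from the previous NN instance when the
     * DN re-registers (the same reset applies after a blockReport RPC
     * throws), so the next heartbeat requests a fresh lease.
     */
    public void onReRegister() {
        fullBlockReportLeaseId = 0;
    }

    /**
     * Mirrors the heartbeat logic quoted above: only ask for a new lease
     * when we do not believe we already hold one and a report is due.
     */
    public boolean shouldRequestLease(boolean blockReportDue) {
        return fullBlockReportLeaseId == 0 && blockReportDue;
    }
}
```

Without the reset, a DN holding a stale lease id never sets requestBlockReportLease in its heartbeats, so it keeps sending reports that the new NN instance rejects until the next scheduled full block report, about an hour later.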
[jira] [Commented] (HDFS-14314) fullBlockReportLeaseId should be reset after registering to NN
[ https://issues.apache.org/jira/browse/HDFS-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1672#comment-1672 ]

He Xiaoqiao commented on HDFS-14314:

To [~jojochuang]: please help to add [~starphin] as a contributor. To [~starphin]: after that, please click `Submit Patch` and re-upload the patch; it will then auto-trigger Jenkins to run the unit tests.
[jira] [Commented] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16777652#comment-16777652 ]

He Xiaoqiao commented on HDFS-14305:

Hi [~csun], [~xkrogen], [~jojochuang], I have updated [^HDFS-14305.004.patch] following the review comments. Please take another look when you have time. Thanks.
[jira] [Updated] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Xiaoqiao updated HDFS-14305:
-------------------------------
    Attachment: HDFS-14305.004.patch
[jira] [Commented] (HDFS-14314) fullBlockReportLeaseId should be reset after registering to NN
[ https://issues.apache.org/jira/browse/HDFS-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16777640#comment-16777640 ]

He Xiaoqiao commented on HDFS-14314:

[~starphin], thanks for your contribution. Some minor comments:
a. Please follow the community code style: indent by 4 spaces, delete the extra blank lines, and add the requisite comments for new methods. I think the formatting problems are only in the unit test.
b. Please name your patch following -..patch; refer to: https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute#HowToContribute-Namingyourpatch.
+1 after the update.
[jira] [Commented] (HDFS-14201) Ability to disallow safemode NN to become active
[ https://issues.apache.org/jira/browse/HDFS-14201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16777624#comment-16777624 ]

He Xiaoqiao commented on HDFS-14201:

Thanks [~surmountian] for pushing this issue forward.
{quote}
I think combining the logic in HDFS-14201.002.patch and HDFS-14201.003.patch could be an option. The same configuration item would be controlling these logic to be on/off.
{quote}
+1, it makes sense to me. Thanks again.

> Ability to disallow safemode NN to become active
> ------------------------------------------------
>
>              Key: HDFS-14201
>              URL: https://issues.apache.org/jira/browse/HDFS-14201
>          Project: Hadoop HDFS
>       Issue Type: Improvement
>       Components: auto-failover
> Affects Versions: 3.1.1, 2.9.2
>         Reporter: Xiao Liang
>         Assignee: Xiao Liang
>         Priority: Major
>      Attachments: HDFS-14201.001.patch, HDFS-14201.002.patch, HDFS-14201.003.patch
>
> Currently with HA, a Namenode in safemode can possibly be selected as active, although for availability of both reads and writes, Namenodes not in safemode are better choices.
> It can take tens of minutes for a cold-started Namenode to get out of safemode, especially when there are a large number of files and blocks in HDFS. That means if a Namenode in safemode becomes active, the cluster will not be fully functioning for quite a while, even though it could be if a Namenode not in safemode were chosen instead.
> The proposal here is to add an option to allow a Namenode to report itself as UNHEALTHY to ZKFC if it is in safemode, so as to only allow fully functioning Namenodes to become active, improving the general availability of the cluster.
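The HDFS-14201 proposal above amounts to one extra condition in the health status that ZKFC polls. A minimal sketch, with a hypothetical class and option name (the real change would hook into the NameNode's handling of the HAServiceProtocol monitorHealth() call):

```java
/**
 * Sketch of the proposed safemode health check. Hypothetical names, not
 * the actual NameNode code: treatSafemodeAsUnhealthy stands in for the
 * new configuration option the issue proposes.
 */
public class SafemodeHealthSketch {
    private final boolean treatSafemodeAsUnhealthy; // proposed config option
    private final boolean inSafemode;               // current NN state

    public SafemodeHealthSketch(boolean treatSafemodeAsUnhealthy,
                                boolean inSafemode) {
        this.treatSafemodeAsUnhealthy = treatSafemodeAsUnhealthy;
        this.inSafemode = inSafemode;
    }

    /**
     * What the health check would report to ZKFC: unhealthy while in
     * safemode (when the option is on), so ZKFC will not elect this NN
     * active until it has left safemode.
     */
    public boolean isHealthy() {
        return !(treatSafemodeAsUnhealthy && inSafemode);
    }
}
```

With the option off, behavior is unchanged, which keeps the change backward compatible for clusters that prefer the current election behavior.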
[jira] [Commented] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16777091#comment-16777091 ]

He Xiaoqiao commented on HDFS-14305:

Thanks [~xkrogen]. I will update the code style and add some comments later.
{quote}I think 10 bits for the mask seems a little high to me; I agree with Chao that I can't think of a situation where you would need more than 32 or 64, and fewer bits for the per-NN key space mean a higher chance of collision on a NameNode restart.{quote}
Considering that an Integer has 32 bits in total, 22 bits are enough for rolling the serial number; on the other hand, the more bits used for the mask, the more NameNodes it can cover while avoiding collisions. So I chose 10 bits. Of course, I am fine with any number of mask bits between 3 and 10. Thanks again.
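The bit split being debated above (10 mask bits vs. 22 rolling bits) can be illustrated with a standalone sketch. The layout below is an assumption for illustration, not the actual HDFS-14305 patch: the high bits of the serial number carry the NameNode index and only the low 22 bits roll, so serial numbers from different NameNodes can never collide.

```java
public class MaskedSerialNo {
    // Assumed bit layout for illustration: 10 high bits identify the NN,
    // the remaining 22 low bits roll with the key-update counter.
    public static final int NUM_MASK_BITS = 10;
    public static final int LOW_BITS = 32 - NUM_MASK_BITS; // 22
    public static final int LOW_MASK = (1 << LOW_BITS) - 1;

    /**
     * Serial number for the given NN: the high bits are fixed per NN and
     * the low bits roll, so ranges of distinct NN indices are disjoint.
     * (For nnIndex < 512 the result also stays non-negative.)
     */
    public static int next(int nnIndex, int counter) {
        return (nnIndex << LOW_BITS) | (counter & LOW_MASK);
    }
}
```

Unlike the division-based formula in the description, recovering the owning NameNode is a shift (`serialNo >>> LOW_BITS`), and no choice of initial counter, negative or not, can push one NameNode's serial numbers into another's range.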
[jira] [Commented] (HDFS-14201) Ability to disallow safemode NN to become active
[ https://issues.apache.org/jira/browse/HDFS-14201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776513#comment-16776513 ]

He Xiaoqiao commented on HDFS-14201:

Thanks [~surmountian] for the quick response. [^HDFS-14201.003.patch] LGTM for auto-failover using ZKFC. I am also wondering whether it can cover the case of a manual transition without ZKFC while the namenode is still in safemode. FYI.
[jira] [Commented] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776511#comment-16776511 ]

He Xiaoqiao commented on HDFS-14305:

Fixed a bug with the bit shift and added a new unit test in [^HDFS-14305.003.patch]. Triggering Jenkins again.
[jira] [Updated] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Xiaoqiao updated HDFS-14305:
-------------------------------
    Attachment: HDFS-14305.003.patch
[jira] [Commented] (HDFS-14314) fullBlockReportLeaseId should be reset after registering to NN
[ https://issues.apache.org/jira/browse/HDFS-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776238#comment-16776238 ] He Xiaoqiao commented on HDFS-14314: [~starphin], I do not think the failed tests (TestJournalNodeSync) are related to this patch, since they have been failing for some time. It may be better to add a new unit test. FYI.
> fullBlockReportLeaseId should be reset after registering to NN
> --
>
> Key: HDFS-14314
> URL: https://issues.apache.org/jira/browse/HDFS-14314
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.8.4
> Environment:
> Reporter: star
> Priority: Critical
> Fix For: 2.8.4
>
> Attachments: HDFS-14314.0.patch, HDFS-14314.patch
>
> Since HDFS-7923, to rate-limit DN block reports, a DN will ask the active NN for a full block report lease id before sending a full block report. The DN then sends the full block report together with the lease id. If the lease id is invalid, the NN will reject the full block report and log "not in the pending set".
> Consider the case where a DN is doing full block reporting while the NN is restarted. It can happen that the DN later sends a full block report with a lease id, acquired from the previous NN instance, which is invalid to the new NN instance. Though the DN recognizes the new NN instance via heartbeat and reregisters itself, it does not reset the lease id from the previous instance.
> The issue may cause DNs to temporarily go dead, making it unsafe to restart the NN, especially in Hadoop clusters with a large number of DNs. HDFS-12914 reported the issue without any clues as to why it occurred, and it remains unsolved.
> To make it clear, look at the code below, taken from the method offerService of class BPServiceActor (some code is eliminated to focus on the current issue). fullBlockReportLeaseId is a local variable holding the lease id from the NN. Exceptions will occur at the blockReport call when the NN is restarting, and will be caught by the catch block in the while loop. Thus fullBlockReportLeaseId will not be set to 0.
> After the NN restarts, the DN will send a full block report, which will be rejected by the new NN instance. The DN will then not send another full block report until the next scheduled one, about an hour later.
> The solution is simple: just reset fullBlockReportLeaseId to 0 after any exception or after registering to the NN. The DN will then ask the new NN instance for a valid fullBlockReportLeaseId.
> {code:java}
> private void offerService() throws Exception {
>   long fullBlockReportLeaseId = 0;
>   //
>   // Now loop for a long time
>   //
>   while (shouldRun()) {
>     try {
>       final long startTime = scheduler.monotonicNow();
>       //
>       // Every so often, send heartbeat or block-report
>       //
>       final boolean sendHeartbeat = scheduler.isHeartbeatDue(startTime);
>       HeartbeatResponse resp = null;
>       if (sendHeartbeat) {
>         boolean requestBlockReportLease = (fullBlockReportLeaseId == 0) &&
>             scheduler.isBlockReportDue(startTime);
>         scheduler.scheduleNextHeartbeat();
>         if (!dn.areHeartbeatsDisabledForTests()) {
>           resp = sendHeartBeat(requestBlockReportLease);
>           assert resp != null;
>           if (resp.getFullBlockReportLeaseId() != 0) {
>             if (fullBlockReportLeaseId != 0) {
>               LOG.warn(nnAddr + " sent back a full block report lease " +
>                   "ID of 0x" +
>                   Long.toHexString(resp.getFullBlockReportLeaseId()) +
>                   ", but we already have a lease ID of 0x" +
>                   Long.toHexString(fullBlockReportLeaseId) + ". " +
>                   "Overwriting old lease ID.");
>             }
>             fullBlockReportLeaseId = resp.getFullBlockReportLeaseId();
>           }
>         }
>       }
>       if ((fullBlockReportLeaseId != 0) || forceFullBr) {
>         // Exception occurred here when NN restarting
>         cmds = blockReport(fullBlockReportLeaseId);
>         fullBlockReportLeaseId = 0;
>       }
>     } catch (RemoteException re) {
>       ...
>     }
>   } // while (shouldRun())
> } // offerService{code}
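The one-line fix proposed above (drop the cached lease id whenever the block report RPC fails) can be illustrated with a small self-contained simulation. This is a toy, not BPServiceActor code; the class and method names are invented for illustration:

```java
public class LeaseResetDemo {
    // Cached lease id, as if acquired from the previous NN instance.
    static long fullBlockReportLeaseId = 0xABCDL;

    // Stand-in for the blockReport RPC that fails while the NN restarts.
    static void blockReport() throws Exception {
        throw new Exception("NN restarted");
    }

    public static void main(String[] args) {
        try {
            blockReport();
        } catch (Exception e) {
            // The proposed fix: any failure invalidates the cached lease id,
            // so the next heartbeat requests a fresh one from the new NN.
            fullBlockReportLeaseId = 0;
        }
        System.out.println(fullBlockReportLeaseId); // 0
    }
}
```

With the stale id cleared, the next heartbeat has `requestBlockReportLease == true` and obtains a lease valid for the new NN instance, instead of waiting for the next block report schedule an hour later.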
[jira] [Commented] (HDFS-14201) Ability to disallow safemode NN to become active
[ https://issues.apache.org/jira/browse/HDFS-14201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776235#comment-16776235 ] He Xiaoqiao commented on HDFS-14201: [^HDFS-14201.002.patch] adds a new configuration element for the safemode check when transitioning to active.
> Ability to disallow safemode NN to become active
>
> Key: HDFS-14201
> URL: https://issues.apache.org/jira/browse/HDFS-14201
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: auto-failover
> Affects Versions: 3.1.1, 2.9.2
> Reporter: Xiao Liang
> Assignee: Xiao Liang
> Priority: Major
> Attachments: HDFS-14201.001.patch, HDFS-14201.002.patch
>
> Currently with HA, a Namenode in safemode can possibly be selected as active, although for availability of both reads and writes, Namenodes not in safemode are better choices.
> It can take tens of minutes for a cold-started Namenode to get out of safemode, especially when there are a large number of files and blocks in HDFS. That means if a Namenode in safemode becomes active, the cluster will not be fully functioning for quite a while, even though it could be if a Namenode not in safemode became active instead.
> The proposal here is to add an option to allow a Namenode to report itself as UNHEALTHY to ZKFC if it is in safemode, so as to only allow a fully functioning Namenode to become active, improving the general availability of the cluster.
[jira] [Updated] (HDFS-14201) Ability to disallow safemode NN to become active
[ https://issues.apache.org/jira/browse/HDFS-14201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-14201: --- Attachment: HDFS-14201.002.patch
[jira] [Updated] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-14305: --- Attachment: HDFS-14305.002.patch Status: Patch Available (was: Open) [~csun], [~xkrogen], [^HDFS-14305.002.patch] uses 10 bits to identify the index of a NameNode within the namespace and the remaining 22 bits as an auto-incremented counter. This can cover up to 1024 NameNodes in one namespace and fixes the serial number overlap in {{BlockTokenSecretManager}} that the previous implementation (without HDFS-6440) did not have. Please help to review at your convenience.
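The bit-partitioning scheme described in patch 002 (10 bits for the NameNode index, 22 auto-incremented counter bits) can be sketched roughly as follows. This is an illustrative toy with invented names, not the patch code; the real patch must additionally keep the result non-negative for the highest indices:

```java
public class SerialNumberPartition {
    // Assumed layout from the comment above: the low 22 bits hold an
    // auto-incremented counter, the higher bits hold the NameNode index.
    static final int COUNTER_BITS = 22;
    static final int COUNTER_MASK = (1 << COUNTER_BITS) - 1; // 0x3FFFFF

    static int nextSerialNo(int nnIndex, int current) {
        int counter = (current + 1) & COUNTER_MASK;   // wraps within 22 bits
        return (nnIndex << COUNTER_BITS) | counter;   // disjoint range per index
    }

    public static void main(String[] args) {
        // Two NameNodes can never produce the same serial number,
        // because the index bits never collide:
        System.out.println(Integer.toHexString(nextSerialNo(1, 0))); // 400001
        System.out.println(Integer.toHexString(nextSerialNo(2, 0))); // 800001
    }
}
```

Unlike the modulo-based partition, the counter wrapping stays inside the NameNode's own 22-bit slice, so rotation can never drift into another NameNode's range.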
[jira] [Commented] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776199#comment-16776199 ] He Xiaoqiao commented on HDFS-14305: Thanks [~csun], [~xkrogen] for your quick response. {quote}One potential issue with the patch 001 is that when keys are updated (which will call setSerialNo), it could go to a range that belongs to a different NameNode{quote} To [~csun]: with patch 001, I think serial numbers will not overlap between different NameNodes as long as the number of NameNodes in the namespace is fixed. But overlap can appear when adding or removing NameNodes (e.g. observers), which requires re-configuring and restarting all NameNodes in the namespace. I think that is also what you mean, right? {quote}Instead of 1 bit, we can either pre-allocate a fixed number of bits (e.g., 5), or calculate the number of bits needed from the total number of configured namenodes.{quote} I agree with pre-allocating a fixed number of bits for different NameNodes. [~xkrogen], [~csun], any more suggestions?
[jira] [Commented] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774944#comment-16774944 ] He Xiaoqiao commented on HDFS-14305: [~csun], [~jojochuang], I attached a quick-and-dirty demonstration patch without a unit test: [^HDFS-14305.001.patch]. Please correct me if there is something wrong.
[jira] [Updated] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-14305: --- Attachment: HDFS-14305.001.patch
[jira] [Commented] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes
[ https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774934#comment-16774934 ] He Xiaoqiao commented on HDFS-14305: Hi [~csun], I think this issue is triggered only after HDFS-6440. Before that, it worked well in an HA cluster with 2 NameNodes (based on branch-2.7). Checking the {{serialNo}} scope shows the following, with no overlap between the 2 NameNodes: {quote}nnIndex=0: [0, 2147483647] nnIndex=1: [-2147483648, -1]{quote} HDFS-6440 replaced {{nnIndex}} with {{intRange}} + {{nnRangeStart}} and only distributed positive integers to the different NameNodes; but when serialNo is initialized it can be a negative integer, since it invokes {{new SecureRandom().nextInt()}}, which causes serialNo overlap between different NameNodes in the same namespace. In one word, the root cause is {{SecureRandom().nextInt()}}. I propose to use only positive integers as the serialNo of BlockTokenSecretManager to avoid this issue. FYI.
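The overlap described in this thread follows directly from the rotation formula when the random seed is negative. A toy reproduction, pretending {{Integer.MAX_VALUE}} is 100 with 2 NameNodes as in the example (invented class name, not HDFS code):

```java
public class SerialOverlapDemo {
    // Pretend Integer.MAX_VALUE is 100 and there are 2 NameNodes
    // (nnIndex 0 and 1), as in the issue description.
    static final int INT_RANGE = 100 / 2; // 50

    // The rotation formula from the description; serialNo may be negative
    // because it is seeded with new SecureRandom().nextInt().
    static int rotate(int serialNo, int nnIndex) {
        return (serialNo % INT_RANGE) + INT_RANGE * nnIndex;
    }

    public static void main(String[] args) {
        // serialNo % 50 lies in (-50, 50), so:
        //   nn1 (index 0) -> [-49, 49]
        //   nn2 (index 1) -> [1, 99]
        System.out.println("nn1 low: " + rotate(-99, 0)); // -49
        System.out.println("nn2 low: " + rotate(-99, 1)); // 1, inside nn1's range
    }
}
```

Restricting serialNo to non-negative values, as proposed above, makes `serialNo % INT_RANGE` land in [0, 50), so the per-index ranges become disjoint.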
[jira] [Commented] (HDFS-14309) name node fail over failed with ssh fence failed because of jsch login failed with key check
[ https://issues.apache.org/jira/browse/HDFS-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774825#comment-16774825 ] He Xiaoqiao commented on HDFS-14309: Hi [~iamgd67], thanks for reporting this issue. HADOOP-14100 has fixed it and was also merged into branch-2.7; I think you could backport HADOOP-14100 to your own branch. FYI.
> name node fail over failed with ssh fence failed because of jsch login failed with key check
>
> Key: HDFS-14309
> URL: https://issues.apache.org/jira/browse/HDFS-14309
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: auto-failover
> Affects Versions: 2.7.3
> Environment: linux CentOS release 6.8 (Final) kernel 2.6.32-642.6.1.el6.x86_64
> Reporter: qiang Liu
> Priority: Major
> Attachments: HDFS-14309-branch-2.7.3.001.patch
> Original Estimate: 1m
> Remaining Estimate: 1m
>
> NameNode failover failed because the SSH fence failed: the jsch login failed the key check. The logged error is "Algorithm negotiation fail". Updating jsch to 0.1.54 works OK.
[jira] [Commented] (HDFS-14201) Ability to disallow safemode NN to become active
[ https://issues.apache.org/jira/browse/HDFS-14201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774799#comment-16774799 ] He Xiaoqiao commented on HDFS-14201: Hi [~surmountian], [^HDFS-14201.001.patch] is my improvement for our online production environment; with it I have turned off the ability to #transitionToActive while the NameNode is still in safemode. It has worked well for over half a year. FYI.
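The core idea in this thread (a NameNode in safemode reports itself as UNHEALTHY to ZKFC so it is never elected active) can be sketched as a minimal health-check stub. All names here are invented for illustration; the real change would live in the NameNode's HAServiceProtocol health handling:

```java
public class SafemodeHealthCheck {
    // Hypothetical new config switch guarding the behavior, so the
    // default failover semantics are unchanged unless it is enabled.
    static boolean rejectSafemodeActive = true;

    static String monitorHealth(boolean inSafeMode) {
        if (rejectSafemodeActive && inSafeMode) {
            return "UNHEALTHY"; // ZKFC will not elect this NN as active
        }
        return "HEALTHY";
    }

    public static void main(String[] args) {
        System.out.println(monitorHealth(true));  // UNHEALTHY
        System.out.println(monitorHealth(false)); // HEALTHY
    }
}
```

Making the behavior opt-in matters: in a cluster where every NameNode cold-starts in safemode at once, unconditionally reporting UNHEALTHY would leave no electable candidate.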
[jira] [Updated] (HDFS-14201) Ability to disallow safemode NN to become active
[ https://issues.apache.org/jira/browse/HDFS-14201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-14201: --- Attachment: HDFS-14201.001.patch
[jira] [Resolved] (HDFS-14186) blockreport storm slow down namenode restart seriously in large cluster
[ https://issues.apache.org/jira/browse/HDFS-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao resolved HDFS-14186. Resolution: Won't Fix Thanks all for your help; this issue has been resolved with HADOOP-12173 + HDFS-9198, and it works very well, as expected, in our production environment. Thanks again.
> blockreport storm slow down namenode restart seriously in large cluster
> ---
>
> Key: HDFS-14186
> URL: https://issues.apache.org/jira/browse/HDFS-14186
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 2.7.1
> Reporter: He Xiaoqiao
> Assignee: He Xiaoqiao
> Priority: Major
> Attachments: HDFS-14186.001.patch
>
> In the current implementation, the datanode sends a block report immediately after registering to the namenode on restart, and the resulting blockreport storm puts the namenode under high load. One result is that some received RPCs have to be skipped because their queue time has timed out. If a datanode's heartbeat RPCs are continually skipped for a long time (default heartbeatExpireInterval=630s), it will be marked DEAD; the datanode then has to re-register and send its block report again, aggravating the blockreport storm, trapping the system in a vicious circle, and slowing down namenode startup seriously (more than one hour, or even more), especially in a large (several thousands of datanodes) and busy cluster. Although there has been much work to optimize namenode startup, the issue still exists.
> I propose to postpone the dead-datanode check until the namenode has finished startup.
> Any comments and suggestions are welcome.
[jira] [Commented] (HDFS-14186) blockreport storm slow down namenode restart seriously in large cluster
[ https://issues.apache.org/jira/browse/HDFS-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752055#comment-16752055 ] He Xiaoqiao commented on HDFS-14186: The issue of seriously time-consuming NameNode restarts has been solved by the combination of HADOOP-12173 + HDFS-9198. The root cause is as follows:
A. NetworkTopology#toString is a hot spot (only for hadoop-2.7.1).
B. Serial BR processing affects performance during restart.
Point A causes processing of the registerDatanode RPC to take a long time; the worst case looks like:
{quote}2019-01-21 18:08:06,303 DEBUG org.apache.hadoop.ipc.Server: Served: registerDatanode queueTime= 66079 procesingTime= 3266{quote}
And the CallQueue is always full, so some DataNodes have to retry until they register successfully. The stack trace looks like:
{quote}"IPC Server handler 40 on 8040" #149 daemon prio=5 os_prio=0 tid=0x7f7ff571c800 nid=0x2a9dd runnable [0x7f19b10ce000]
java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.net.NetworkTopology$InnerNode.getLeaf(NetworkTopology.java:340)
at org.apache.hadoop.net.NetworkTopology$InnerNode.getLeaf(NetworkTopology.java:340)
at org.apache.hadoop.net.NetworkTopology.toString(NetworkTopology.java:831)
at org.apache.hadoop.net.NetworkTopology.add(NetworkTopology.java:403)
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:1029)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4741)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1487)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:97)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:33709)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:847)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:790)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2458){quote}
Point B is easy to understand and is quickly fixed by HDFS-9198, after which BR RPCs no longer occupy the CallQueue for a long time.
On a test environment built with Dynamometer, with 40K nodes and 1.5B inodes+blocks, a NameNode restart can finish in under 1.5 hours. However, regarding the other issue mentioned above, there are still about 10 minutes during which the service RPC CallQueue load does not decrease after the NameNode leaves safemode, since the block reports have not yet been processed completely.
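The NetworkTopology#toString hot spot in point A comes from building an expensive topology dump on every registerDatanode call. The usual remedy, and as far as I can tell the spirit of the fix, is to guard the expensive string construction behind a log-level check. A self-contained sketch with invented names (this is not the Hadoop code):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedDebugLog {
    static final Logger LOG = Logger.getLogger("NetworkTopology");
    static int expensiveCalls = 0; // counts how often the dump is built

    // Stand-in for NetworkTopology#toString walking every leaf of the tree.
    static String expensiveToString() {
        expensiveCalls++;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) sb.append("node-").append(i).append('\n');
        return sb.toString();
    }

    // The fix pattern: build the topology dump only when debug logging
    // is actually enabled, instead of on every registration.
    static void add(String node) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Topology after adding " + node + ":\n" + expensiveToString());
        }
    }

    public static void main(String[] args) {
        add("dn-1");
        // With the default (INFO) log level the dump is never built:
        System.out.println(expensiveCalls); // 0
    }
}
```

Under the default INFO level, registration no longer pays for the full-tree traversal, which is exactly the cost that showed up under the registerDatanode frames in the stack trace above.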
[jira] [Commented] (HDFS-14186) blockreport storm slow down namenode restart seriously in large cluster
[ https://issues.apache.org/jira/browse/HDFS-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750718#comment-16750718 ] He Xiaoqiao commented on HDFS-14186: Thanks [~daryn], [~kihwal] for your help. This issue is based on 2.7.1, which has no async BR processing. I am currently testing the HDFS-9198 patch and will report the results when testing finishes.
[jira] [Updated] (HDFS-14186) blockreport storm slow down namenode restart seriously in large cluster
[ https://issues.apache.org/jira/browse/HDFS-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-14186: --- Affects Version/s: 2.7.1
[jira] [Commented] (HDFS-14211) [Consistent Observer Reads] Allow for configurable "always msync" mode
[ https://issues.apache.org/jira/browse/HDFS-14211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745811#comment-16745811 ] He Xiaoqiao commented on HDFS-14211: [~xkrogen] Thanks for pushing this important phase forward. As the description mentions, a client will call #msync before every single read operation. Does that mean *every* read operation will lead to a request to the Observer forever, or only those involving 'Third-party Communication'? I think there should be a graceful strategy to decide how many requests go to the ANN and how many to the Observer. An RPC to the Observer brings extra latency that depends on how long it takes to catch up to the state id, and IIUC it may reduce throughput when there are many write operations that have not yet reached the ANN threshold (sorry, no benchmark numbers; this is based only on the design docs and descriptions), FYI. > [Consistent Observer Reads] Allow for configurable "always msync" mode > -- > > Key: HDFS-14211 > URL: https://issues.apache.org/jira/browse/HDFS-14211 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Erik Krogen >Priority: Major > > To allow for reads to be serviced from an ObserverNode (see HDFS-12943) in a consistent way, an {{msync}} API was introduced (HDFS-13688) to allow a client to fetch the latest transaction ID from the Active NN, thereby ensuring that subsequent reads from the ObserverNode will be up-to-date with the current state of the Active. > Using this properly, however, requires application-side changes: for example, a NodeManager should call {{msync}} before localizing the resources for a client, since it received notification of the existence of those resources via communication that is out-of-band to HDFS and thus could potentially attempt to localize them prior to the availability of those resources on the ObserverNode.
> Until such application-side changes can be made, which will be a longer-term > effort, we need to provide a mechanism for unchanged clients to utilize the > ObserverNode without exposing such a client to inconsistencies. This is > essentially phase 3 of the roadmap outlined in the [design > document|https://issues.apache.org/jira/secure/attachment/12915990/ConsistentReadsFromStandbyNode.pdf] > for HDFS-12943. > The design document proposes some heuristics based on understanding of how > common applications (e.g. MR) use HDFS for resources. As an initial pass, we > can simply have a flag which tells a client to call {{msync}} before _every > single_ read operation. This may seem counterintuitive, as it turns every > read operation into two RPCs: {{msync}} to the Active followed by an actual > read operation to the Observer. However, the {{msync}} operation is extremely > lightweight, as it does not acquire the {{FSNamesystemLock}}, and in > experiments we have found that this approach can easily scale to well over > 100,000 {{msync}} operations per second on the Active (while still servicing > approx. 10,000 write op/s). Combined with the fast-path edit log tailing for > standby/observer nodes (HDFS-13150), this "always msync" approach should > introduce only a few ms of extra latency to each read call. > Below are some experimental results collected from experiments that convert > a normal RPC workload into one in which all read operations are turned into > an {{msync}}. The baseline is a workload of 1.5k write op/s and 25k read op/s. > ||Rate Multiplier|2|4|6|8|| > ||RPC Queue Avg Time (ms)|14|53|110|125|| > ||RPC Queue NumOps Avg (k)|51|102|147|177|| > ||RPC Queue NumOps Max (k)|148|269|306|312|| > _(numbers are approximate and should be viewed primarily for their trends)_ > Results are promising up to between 4x and 6x of the baseline workload, which > is approx. 100-150k read op/s.
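The "always msync" flow described in this email (one cheap msync to the Active, then the actual read to the Observer, gated on the Observer having caught up) can be sketched as a toy model. All class and method names below are stand-ins for illustration; they are not the real Hadoop client API.

```python
# Toy model of "always msync": every read becomes two RPCs.
class ActiveNN:
    """Stand-in for the Active NameNode."""
    def __init__(self):
        self.txid = 0
    def write(self):
        self.txid += 1            # each write bumps the transaction ID
    def msync(self):
        return self.txid          # lightweight: returns the latest txid

class ObserverNN:
    """Stand-in for an Observer NameNode fed by edit-log tailing."""
    def __init__(self):
        self.applied_txid = 0
    def read(self, min_txid):
        # Serve the read only if at least as fresh as the client requires.
        assert self.applied_txid >= min_txid, "observer lagging"
        return "data"

class AlwaysMsyncClient:
    """Client with the proposed flag enabled: msync before every read."""
    def __init__(self, active, observer):
        self.active, self.observer = active, observer
    def read(self):
        txid = self.active.msync()       # RPC 1: no FSNamesystemLock
        return self.observer.read(txid)  # RPC 2: the actual read

active, observer = ActiveNN(), ObserverNN()
client = AlwaysMsyncClient(active, observer)
active.write()
observer.applied_txid = active.txid  # edit tailing catches the observer up
result = client.read()
```

The model shows why the approach is safe for unchanged applications: the read can never observe state older than the Active's state at msync time, at the cost of one extra (cheap) RPC per read.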
[jira] [Commented] (HDFS-14186) blockreport storm slow down namenode restart seriously in large cluster
[ https://issues.apache.org/jira/browse/HDFS-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743589#comment-16743589 ] He Xiaoqiao commented on HDFS-14186: [~elgoiri], thanks for your comments. {quote}In any case, I think that if the DN is not registered, it cannot be marked as DEAD.{quote} What I wanted to say was that the lifeline can cover the case where a datanode has already registered at startup, but there is another problem: a datanode may fail to register at all because the namenode is overrun during startup, especially in a large cluster. {quote}Ideally a stack trace of the thread that is holding the other requests.{quote} I will post a stack trace soon. On another note, I think the namenode's massive volume of block report processing logs should be telling enough. Thanks again.
[jira] [Commented] (HDFS-13688) Introduce msync API call
[ https://issues.apache.org/jira/browse/HDFS-13688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742910#comment-16742910 ] He Xiaoqiao commented on HDFS-13688: Thanks all for the great work here. I am confused about the new #msync API: from the design docs, it introduces an RPC call, msync, to ensure consistent reads. IIUC, applications on top of HDFS have to adopt the new API when the 'Consistent Read' feature is enabled. This involves complex work, since more and more engines run on HDFS; I believe it would be a gigantic project for all compute engines to adapt to this change. So my question is: is there any plan to confine the data-consistency check to the DFSClient only? If I missed something or understood incorrectly, please correct me. Thanks again. > Introduce msync API call > > > Key: HDFS-13688 > URL: https://issues.apache.org/jira/browse/HDFS-13688 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Fix For: HDFS-12943, 3.3.0 > > Attachments: HDFS-13688-HDFS-12943.001.patch, > HDFS-13688-HDFS-12943.002.patch, HDFS-13688-HDFS-12943.002.patch, > HDFS-13688-HDFS-12943.003.patch, HDFS-13688-HDFS-12943.004.patch, > HDFS-13688-HDFS-12943.005.patch, HDFS-13688-HDFS-12943.WIP.002.patch, > HDFS-13688-HDFS-12943.WIP.patch > > > As mentioned in the design doc in HDFS-12943, to ensure consistent reads, we > need to introduce an RPC call {{msync}}. Specifically, a client can issue an > msync call to an Observer node along with a transactionID. The msync will only > return when the Observer's transactionID has caught up to the given ID. This > JIRA is to add this API.
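The semantics described in the issue above (msync carries a transaction ID and only returns once the Observer has caught up to it) can be sketched as a toy model. The class and method names are illustrative, not Hadoop's real API; a real implementation would block the RPC rather than return a boolean.

```python
# Toy model of the msync contract from HDFS-13688: the call "completes"
# only once the observer's applied txid has caught up to the client's.
class Observer:
    def __init__(self):
        self.applied_txid = 0

    def apply_edits(self, up_to_txid):
        # Edit-log tailing advances the observer's applied state.
        self.applied_txid = max(self.applied_txid, up_to_txid)

    def msync_ready(self, client_txid):
        # True once a read at client_txid would be consistent; a real
        # msync RPC would park until this condition holds.
        return self.applied_txid >= client_txid

obs = Observer()
obs.apply_edits(90)
assert not obs.msync_ready(100)  # observer still behind the client
obs.apply_edits(120)
assert obs.msync_ready(100)      # caught up: safe to serve the read
```

This also illustrates why confining the check to the DFSClient is attractive: the wait-until-caught-up logic is entirely between client and Observer, invisible to applications above the client.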
[jira] [Commented] (HDFS-14186) blockreport storm slow down namenode restart seriously in large cluster
[ https://issues.apache.org/jira/browse/HDFS-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742816#comment-16742816 ] He Xiaoqiao commented on HDFS-14186: [~elgoiri] Thanks for your comments, and sorry for missing something. I just took a quick look at trunk again: IIUC, #sendLifeline is lock-free, and a different port can be set for the lifeline server, so a DataNode may avoid being marked DEAD during NameNode startup. However, the lifeline feature only takes effect after the DataNode has registered successfully. As described above, if the service port is overrun and some DataNodes cannot register successfully, the lifeline cannot fix that, FYI. Please correct me if anything here is wrong.
[jira] [Commented] (HDFS-14186) blockreport storm slow down namenode restart seriously in large cluster
[ https://issues.apache.org/jira/browse/HDFS-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742740#comment-16742740 ] He Xiaoqiao commented on HDFS-14186: Some more information: when I set {{dfs.namenode.safemode.min.datanodes}} to the number of slaves, it works as expected, and no datanode is marked DEAD or forced to re-register and resend its block report.
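The workaround in the comment above can be sketched as an hdfs-site.xml fragment. The value 15000 is a placeholder taken from the cluster size mentioned elsewhere in this thread; it must be replaced with the actual datanode count, and setting it too high would keep the namenode in safe mode indefinitely.

```xml
<!-- Keep the NameNode in safe mode until this many DataNodes have
     registered, so none are marked DEAD during the startup storm.
     15000 is a placeholder: use the cluster's real DataNode count. -->
<property>
  <name>dfs.namenode.safemode.min.datanodes</name>
  <value>15000</value>
</property>
```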
[jira] [Commented] (HDFS-14186) blockreport storm slow down namenode restart seriously in large cluster
[ https://issues.apache.org/jira/browse/HDFS-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742738#comment-16742738 ] He Xiaoqiao commented on HDFS-14186: Thanks for the further discussion. I would like to answer some of the doubts raised above. To [~kihwal]: {quote}one thing to note is that the rpc processing time can be misleading in this case.{quote} From a namenode sample log: {quote}2019-01-14 22:32:35,383 INFO BlockStateChange: BLOCK* processReport: from storage DS-dd5c0397-3fcd-43fb-a71b-1eef6a2307f1 node DatanodeRegistration(datanodeip:50010, datanodeUuid=$datanodeuuid, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-57;cid=$clusterud;nsid=$nsid;c=0), blocks: 11847, hasStaleStorage: true, processing time: 15 msecs{quote} The processing time comes from the namenode log rather than from the RPC processing-time metrics, and I believe it is accurate. {quote}In 2.7 days, we ended up configuring datanodes breaking up block reports unconditionally and that helped NN startup performance.{quote} There are 30K~40K blocks per datanode, taking less than 60ms on average to process each block report, across more than 15K slaves overall. I do not split block reports per storage, since the number of blocks per datanode is not large enough; with an average of 30K~40K per datanode, splitting is unnecessary. The configuration item 'dfs.blockreport.split.threshold' appears to work well on 2.7.1 based on tracing the code; please correct me if I am missing something. {quote}we can have NN check whether all storage reports are received from all registered nodes.{quote} That is a good suggestion. However, it is hard to collect all storages of the whole cluster at namenode startup, and if only registered nodes are used, the issue may not be resolved completely, since some unregistered datanodes may keep trying to report and the load on the namenode is never relieved. To [~elgoiri]: {quote}This is caused by namenode getting overwhelmed. Besides, the lifeline rpc will use the same service rpc port whose queue is constantly overrun in this case. For the lifeline server, one can set a different port so it should have a different RPC queue altogether, right?{quote} Thanks for [~kihwal]'s detailed explanation. On another note, the lifeline has little visible effect during namenode startup because of the namenode's global lock: processing a block report holds the write lock, so all register/report RPCs have to queue and be processed one by one. In a word, the NameNode has no remaining time or resources to process the block report storm even if the RPCs can be enqueued.
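A quick back-of-envelope check on the figures quoted in the comment above (~15,000 datanodes, ~60 ms average per block report, processed serially under the global write lock) shows why heartbeats starve long enough to hit the 630s expiry:

```python
# Serial processing of one full wave of block reports under the
# namenode's global write lock, using the numbers from this thread.
datanodes = 15_000
ms_per_report = 60
one_wave_s = datanodes * ms_per_report / 1000
print(one_wave_s, one_wave_s / 60)  # → 900.0 seconds, 15.0 minutes
```

A single wave already exceeds the 630s heartbeat-expiry window, so nodes get marked DEAD, re-register, and resend reports, multiplying the number of waves and stretching startup well past an hour.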
[jira] [Updated] (HDFS-13473) DataNode update BlockKeys using mode PULL rather than PUSH from NameNode
[ https://issues.apache.org/jira/browse/HDFS-13473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-13473: --- Attachment: HDFS-13473-trunk.007.patch > DataNode update BlockKeys using mode PULL rather than PUSH from NameNode > > > Key: HDFS-13473 > URL: https://issues.apache.org/jira/browse/HDFS-13473 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Attachments: HDFS-13473-trunk.001.patch, HDFS-13473-trunk.002.patch, > HDFS-13473-trunk.003.patch, HDFS-13473-trunk.004.patch, > HDFS-13473-trunk.005.patch, HDFS-13473-trunk.006.patch, > HDFS-13473-trunk.007.patch > > > Currently, updating block keys on the DataNode is passive behavior: it > depends on whether the NameNode returns a #KeyUpdateCommand in the heartbeat > response. There are several problems with this block key synchronization mode: > a. The NameNode cannot tell whether the block keys reached the DataNode successfully; > b. A DataNode that hits an exception while receiving or processing a heartbeat > response containing a BlockKeyCommand is likewise left unaware, as mentioned > in HDFS-13441 and HDFS-12749. > So I propose changing block keys pushed from the NameNode into block keys > pulled by the DataNode.
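The push-vs-pull distinction in the issue description can be sketched as a toy model: in pull mode, the DataNode drives the refresh, so a lost or mishandled heartbeat response cannot leave it with stale keys forever. Class and method names below are illustrative stand-ins, not Hadoop's real API.

```python
# Toy model of pull-based block key synchronization (HDFS-13473 proposal).
class NameNode:
    def __init__(self):
        self.block_keys = {"current": "key-v1"}
    def get_block_keys(self):
        # PULL endpoint: the DataNode asks for the current keys.
        return dict(self.block_keys)

class DataNode:
    def __init__(self, nn):
        self.nn = nn
        self.block_keys = {}
    def refresh_keys(self):
        # The DN drives the update; if one refresh fails, it simply
        # retries later, instead of silently missing a pushed command.
        self.block_keys = self.nn.get_block_keys()

nn = NameNode()
dn = DataNode(nn)
dn.refresh_keys()
assert dn.block_keys["current"] == "key-v1"
nn.block_keys["current"] = "key-v2"  # key roll on the NameNode
dn.refresh_keys()                    # DN pulls the new keys itself
```

In the push model, by contrast, the NameNode piggybacks a KeyUpdateCommand on a heartbeat response and never learns whether it was applied, which is exactly problems (a) and (b) in the description.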
[jira] [Commented] (HDFS-13473) DataNode update BlockKeys using mode PULL rather than PUSH from NameNode
[ https://issues.apache.org/jira/browse/HDFS-13473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741743#comment-16741743 ] He Xiaoqiao commented on HDFS-13473: v007 fixes checkstyle. I checked the failed unit test and it passes on my local machine, so I think it is not related to this patch.
[jira] [Commented] (HDFS-13473) DataNode update BlockKeys using mode PULL rather than PUSH from NameNode
[ https://issues.apache.org/jira/browse/HDFS-13473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741584#comment-16741584 ] He Xiaoqiao commented on HDFS-13473: [^HDFS-13473-trunk.006.patch] rebases onto branch trunk and triggers jenkins again.
[jira] [Updated] (HDFS-13473) DataNode update BlockKeys using mode PULL rather than PUSH from NameNode
[ https://issues.apache.org/jira/browse/HDFS-13473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao updated HDFS-13473: --- Attachment: HDFS-13473-trunk.006.patch