[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823750#comment-16823750 ]

star commented on HDFS-14437:
------------------------------

OK, I'll take a look later.

> Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
> --------------------------------------------------------------------------------------
>
>                 Key: HDFS-14437
>                 URL: https://issues.apache.org/jira/browse/HDFS-14437
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, namenode, qjm
>            Reporter: angerszhu
>            Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943,
> I have sorted through the process of writing and flushing the EditLog and some
> important functions. I found that in the FSEditLog class, the close() method
> runs the following sequence:
>
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> Since close() holds the object lock, when waitForSyncToFinish() returns it
> means all logSync work has finished and all data in bufReady has been flushed
> out. And because the current thread still holds the lock when it calls
> endCurrentLogSegment(), no other thread can acquire it, so no thread can write
> new edit log entries into bufCurrent.
> But if waitForSyncToFinish() is not called before endCurrentLogSegment(),
> some auto-scheduled logSync()'s flush may still be in progress, because the
> flush step does not need synchronization, as the comment of the logSync()
> method explains:
>
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  *     and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  *     uses a ThreadLocal transaction ID to determine what edit number must
>  *     be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  *     under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *      flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *      sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread.
>   long mytxid = myTransactionId.get().txid;
>
>   boolean sync = false;
>   try {
>     EditLogOutputStream logStream = null;
>     synchronized (this) {
>       try {
>         printStatistics(false);
>         // if somebody is already syncing, then wait
>         while (mytxid > synctxid && isSyncRunning) {
>           try {
>             wait(1000);
>           } catch (InterruptedException ie) {
>           }
>         }
>
>         //
>         // If this transaction was already flushed, then nothing to do
>         //
>         if (mytxid <= synctxid) {
>           numTransactionsBatchedInSync++;
>           if (metrics != null) {
>             // Metrics is non-null only when used inside name node
>             metrics.incrTransactionsBatchedInSync();
>           }
>           return;
>         }
>
>         // now, this thread will do the sync
>         syncStart = txid;
>         isSyncRunning = true;
>         sync = true;
>         // swap buffers
>         try {
>           if (journalSet.isEmpty()) {
>             throw new IOException("No journals available to flush");
>           }
>           editLogStream.setReadyToFlush();
>         } catch (IOException e) {
>           final String msg =
>               "Could not sync enough journals to persistent storage " +
>               "due to " + e.getMessage() + ". " +
>               "Unsynced transactions: " + (txid - synctxid);
>           LOG.fatal(msg, new Exception());
>           synchronized(journalSetLock) {
>             IOUtils.cleanup(LOG, journalSet);
>           }
>           terminate(1, msg);
>         }
>       } finally {
>         // Prevent RuntimeException from blocking other log edit write
>         doneWithAutoSyncScheduling();
>       }
>       //editLogStream may become null,
>       //so store a local variable for flush.
[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823714#comment-16823714 ]

star edited comment on HDFS-14437 at 4/23/19 6:55 AM:
------------------------------------------------------

[~angerszhuuu], I think I've got that.

||Thread0||Thread1||Thread2||locked||isSyncRunning||
| |flush| |false|true|
|endCurrentSegment#logSyncAll| | |true|true|
|wait| | |false|true|
| |flush done| |false|false|
| | |*logEdit*|true|false|
|{color:#d04437}finalize error{color}| | |true|false|

was (Author: starphin):
[~angerszhuuu], I think I've got that.

||Thread0||Thread1||Thread2||locked||isSyncRunning||
| |flush| |false|true|
|endCurrentSegment#logSyncAll| | |true|true|
|wait| | |false|true|
| |flush done| |false|false|
| | |*logAppend*|true|false|
|{color:#d04437}finalize error{color}| | |true|false|
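The interleaving in star's table above can be replayed single-threaded to show why Thread0's finalize fails. This is a hypothetical illustration (the buffers are modeled as plain counters, not the real FSEditLog/EditsDoubleBuffer); the key step is that Object.wait() releases the FSEditLog lock, letting Thread2's logEdit slip in after the flush completes but before Thread0 resumes.

```java
// Hypothetical replay of the race table above; thread names match the table.
public class RollRaceReplay {
    public static void main(String[] args) {
        int bufCurrent = 0;          // edits waiting to be swapped
        int bufReady = 0;            // edits being flushed
        boolean isSyncRunning;

        bufCurrent++;                // Thread2: logEdit appends an edit
        bufReady = bufCurrent;       // Thread1: setReadyToFlush() swaps buffers
        bufCurrent = 0;
        isSyncRunning = true;        // Thread1: begins the UNsynchronized flush

        // Thread0: endCurrentSegment#logSyncAll sees isSyncRunning and calls
        // wait(), which RELEASES the FSEditLog object lock.

        bufReady = 0;                // Thread1: flush done
        isSyncRunning = false;

        bufCurrent++;                // Thread2: logEdit grabs the freed lock

        // Thread0: wakes up and finalizes the segment, expecting an empty
        // bufCurrent -- but Thread2's edit is already sitting there.
        System.out.println("bufCurrent at finalize: " + bufCurrent);
        // A non-zero value here corresponds to the reported
        // "expects empty EditsDoubleBuffer.bufCurrent" exception.
    }
}
```

Replayed this way, the fix discussed in the issue is visible: if Thread0 held the lock continuously from waitForSyncToFinish() through endCurrentLogSegment(), Thread2's logEdit could not interleave.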
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823746#comment-16823746 ]

angerszhu commented on HDFS-14437:
-----------------------------------

[~starphin] You can see my pull request; both ways can solve this problem, but I recommend the approach in the pull request.
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823742#comment-16823742 ]

star commented on HDFS-14437:
------------------------------

[~angerszhuuu] Right, lock state fixed.
{quote}But logAppend also needs the lock.
{quote}
[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823714#comment-16823714 ]

star edited comment on HDFS-14437 at 4/23/19 6:38 AM:
------------------------------------------------------

[~angerszhuuu], I think I've got that.

||Thread0||Thread1||Thread2||locked||isSyncRunning||
| |flush| |false|true|
|endCurrentSegment#logSyncAll| | |true|true|
|wait| | |false|true|
| |flush done| |false|false|
| | |*logAppend*|true|false|
|{color:#d04437}finalize error{color}| | |true|false|

was (Author: starphin):
[~angerszhuuu], I think I've got that.

||Thread0||Thread1||Thread2||locked||isSyncRunning||
| |flush| |false|true|
|endCurrentSegment#logSyncAll| | |true|true|
|wait| | |false|true|
| |flush done| |false|false|
| | |*logAppend*|false|false|
|{color:#d04437}finalize error{color}| | |true|false|
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823732#comment-16823732 ]

angerszhu commented on HDFS-14437:
-----------------------------------

[~starphin] But logAppend also needs the lock. What you show is exactly the situation I mean. lol
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823726#comment-16823726 ]

star commented on HDFS-14437:
------------------------------

locked is the object lock of FSEditLog.
[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823726#comment-16823726 ] star edited comment on HDFS-14437 at 4/23/19 6:24 AM: -- "locked" is the state of the object lock on FSEditLog.

was (Author: starphin): locked is object lock of FSEditLog.
[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823717#comment-16823717 ] angerszhu edited comment on HDFS-14437 at 4/23/19 6:17 AM: --- [~starphin] Yeah, that is what I want to show, but in your table "#locked" reads as thread0 holding the lock, which may confuse others. You can see my pull request. I should improve how I present technical logic.

was (Author: angerszhuuu): [~starphin] Yeah, that is what I want to show. You can see my pull request. I should improve how I present technical logic.
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823717#comment-16823717 ] angerszhu commented on HDFS-14437: -- [~starphin] Yeah, that is what I want to show. You can see my pull request. I should improve how I present technical logic.
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823714#comment-16823714 ] star commented on HDFS-14437: - [~angerszhuuu], I think I've got it:

||Thread0||Thread1||Thread2||locked||isSyncRunning||
| |flush| |false|true|
|endCurrentSegment#logSyncAll| | |true|true|
|wait| | |false|true|
| |flush done| |false|false|
| | |*logAppend*|false|false|
|{color:#d04437}finalize error{color}| | |true|false|
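The failure mode in star's table can be reproduced with a minimal double-buffer model. The class and method names below are simplified stand-ins for FSEditLog/EditsDoubleBuffer, not the actual HDFS API:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the EditsDoubleBuffer race discussed in this thread.
// All names are illustrative; the real HDFS classes differ.
class ToyDoubleBuffer {
    private List<String> bufCurrent = new ArrayList<>();
    private List<String> bufReady = new ArrayList<>();

    synchronized void logAppend(String edit) {
        bufCurrent.add(edit);          // writers append under the object lock
    }

    synchronized void setReadyToFlush() {
        List<String> tmp = bufReady;   // swap buffers under the lock (step 1)
        bufReady = bufCurrent;
        bufCurrent = tmp;
    }

    synchronized int pendingEdits() {
        return bufCurrent.size();
    }

    // Finalizing a segment expects bufCurrent to be empty; if a writer
    // sneaks in a logAppend between the swap and the finalize, this check
    // fails -- the "finalize error" row in the table above.
    synchronized void finalizeSegment() {
        if (!bufCurrent.isEmpty()) {
            throw new IllegalStateException(
                "expected empty bufCurrent, found " + bufCurrent.size());
        }
    }
}
```

Running the sequence from the table (flush, then a late logAppend, then finalize) makes the toy model throw exactly where the table marks the error.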
[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823669#comment-16823669 ] angerszhu edited comment on HDFS-14437 at 4/23/19 5:55 AM: --- [~hexiaoqiao] Since #logSync's step 2 can run without the lock, another thread may be inside #logSync because bufCurrent is full and an automatic sync was scheduled. Suppose that thread is in step 2 [flush()] at the moment the current thread calls #rollEditLog. Because the other thread is still running logSync, isSyncRunning == true, so the current thread's call to #endCurrentLogSegment enters the while loop:
{code:java}
while (mytxid > synctxid && isSyncRunning) {
  try {
    wait(1000);
  } catch (InterruptedException ie) {
  }
}
{code}
If the other thread cannot get the lock, isSyncRunning will not become false and synctxid will not change, so the current thread will block in the while loop forever, which is not correct. My English is not very good and may contain mistakes; if needed, send me a mail and I will explain in Chinese.

was (Author: angerszhuuu): [~hexiaoqiao] You can see that when the other thread is also running logSync, isSyncRunning == true, so the current thread's call to #endCurrentLogSegment enters the while loop:
{code:java}
while (mytxid > synctxid && isSyncRunning) {
  try {
    wait(1000);
  } catch (InterruptedException ie) {
  }
}
{code}
If the other thread cannot get the lock, isSyncRunning will not become false and synctxid will not change, so the current thread will block in the while loop forever, which is not correct. My English is not very good and may contain mistakes; if needed, send me a mail and I will explain in Chinese.
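The waitForSyncToFinish()/isSyncRunning handshake discussed in this thread can be sketched as follows. This is a simplified model with illustrative names, not the real FSEditLog code; note that Object.wait() releases the monitor, which is what allows the flushing thread to re-acquire it and clear the flag:

```java
// Toy model of the isSyncRunning handshake between a flushing thread
// and a thread that must wait for the sync to finish (e.g. close()).
class ToySyncer {
    private boolean isSyncRunning = false;

    synchronized void startSync() {
        isSyncRunning = true;          // step 1: flag set under the lock
    }

    // Called by the flushing thread after the unsynchronized flush (step 2).
    synchronized void finishSync() {
        isSyncRunning = false;         // step 3: clear flag and wake waiters
        notifyAll();
    }

    synchronized boolean isRunning() {
        return isSyncRunning;
    }

    // wait() releases the monitor while parked, so the flusher can get in
    // and call finishSync(); the loop re-checks the flag on every wakeup.
    synchronized void waitForSyncToFinish() {
        while (isSyncRunning) {
            try {
                wait(1000);
            } catch (InterruptedException ie) {
            }
        }
    }
}
```

The catch-and-ignore of InterruptedException mirrors the loop quoted from logSync() above.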
[jira] [Commented] (HDDS-1448) RatisPipelineProvider should only consider open pipeline while excluding dn for pipeline allocation
[ https://issues.apache.org/jira/browse/HDDS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823698#comment-16823698 ] Lokesh Jain commented on HDDS-1448: --- The changes required for this Jira would enable multiple three-node pipelines on a datanode. It was implemented this way to make sure that a datanode is not part of more than one factor-three pipeline.

> RatisPipelineProvider should only consider open pipeline while excluding dn
> for pipeline allocation
> ---
>
> Key: HDDS-1448
> URL: https://issues.apache.org/jira/browse/HDDS-1448
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: SCM
> Affects Versions: 0.3.0
> Reporter: Mukul Kumar Singh
> Assignee: Aravindan Vijayan
> Priority: Major
> Labels: MiniOzoneChaosCluster
>
> While allocating pipelines, the Ratis pipeline provider considers all
> pipelines irrespective of their state. This can lead to a case where all the
> datanodes are up but the pipelines are in closing state in SCM.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
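The behavior proposed in this Jira, excluding a datanode only when it belongs to an open pipeline, can be sketched as a filter over pipeline state. The enum and class names below are illustrative stand-ins, not the actual SCM types:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Toy model of pipeline state as tracked by SCM; names are illustrative.
enum PipelineState { OPEN, CLOSING, CLOSED }

class ToyPipeline {
    final PipelineState state;
    final Set<String> datanodes;

    ToyPipeline(PipelineState state, Set<String> datanodes) {
        this.state = state;
        this.datanodes = datanodes;
    }
}

class ToyPipelineProvider {
    // Only datanodes that belong to an OPEN pipeline are excluded from a
    // new allocation; members of CLOSING/CLOSED pipelines stay eligible,
    // avoiding the starvation case described in this issue.
    static Set<String> excludedDatanodes(List<ToyPipeline> pipelines) {
        return pipelines.stream()
            .filter(p -> p.state == PipelineState.OPEN)
            .flatMap(p -> p.datanodes.stream())
            .collect(Collectors.toSet());
    }
}
```

With this filter, a cluster whose pipelines are all in CLOSING state excludes no datanodes, so new pipelines can still be allocated.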
[jira] [Work logged] (HDDS-1453) Fix unit test TestConfigurationFields broken on trunk
[ https://issues.apache.org/jira/browse/HDDS-1453?focusedWorklogId=231065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231065 ] ASF GitHub Bot logged work on HDDS-1453: Author: ASF GitHub Bot Created on: 23/Apr/19 05:37 Start Date: 23/Apr/19 05:37 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #759: HDDS-1453. Fix unit test TestConfigurationFields broken on trunk. (swagle) URL: https://github.com/apache/hadoop/pull/759#issuecomment-485650589

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Comment |
|::|--:|:|:|
| 0 | reexec | 501 | Docker mode activated. |
||| _ Prechecks _ |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| -1 | test4tests | 0 | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
||| _ trunk Compile Tests _ |
| +1 | mvninstall | 1043 | trunk passed |
| +1 | compile | 48 | trunk passed |
| +1 | checkstyle | 25 | trunk passed |
| +1 | mvnsite | 42 | trunk passed |
| +1 | shadedclient | 734 | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 70 | trunk passed |
| +1 | javadoc | 42 | trunk passed |
||| _ Patch Compile Tests _ |
| +1 | mvninstall | 42 | the patch passed |
| +1 | compile | 33 | the patch passed |
| +1 | javac | 33 | the patch passed |
| +1 | checkstyle | 17 | the patch passed |
| +1 | mvnsite | 35 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | xml | 1 | The patch has no ill-formed XML file. |
| +1 | shadedclient | 748 | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 75 | the patch passed |
| +1 | javadoc | 39 | the patch passed |
||| _ Other Tests _ |
| -1 | unit | 80 | common in the patch failed. |
| +1 | asflicense | 30 | The patch does not generate ASF License warnings. |
| | | | 3685 | |

| Reason | Tests |
|---:|:--|
| Failed junit tests | hadoop.hdds.scm.net.TestNodeSchemaManager |
| | hadoop.hdds.scm.net.TestNetworkTopologyImpl |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-759/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/759 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 77059421f0f1 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / f4ab937 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-759/1/artifact/out/patch-unit-hadoop-hdds_common.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-759/1/testReport/ |
| Max. process+thread count | 445 (vs. ulimit of 5500) |
| modules | C: hadoop-hdds/common U: hadoop-hdds/common |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-759/1/console |
| Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |

This message was automatically generated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 231065) Time Spent: 20m (was: 10m) > Fix unit test TestConfigurationFields broken on trunk > - > > Key: HDDS-1453 > URL: https://issues.apache.org/jira/browse/HDDS-1453 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.5.0 >Reporter: Siddharth Wagle >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Unit test failure:: > {code} > [INFO] Running org.apache.hadoop.ozone.TestOzoneConfigurationFields > [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.772 > s <<< FAILURE! - in org.apache.ha
[jira] [Comment Edited] (HDFS-14426) RBF: Add delegation token total count as one of the federation metrics
[ https://issues.apache.org/jira/browse/HDFS-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823676#comment-16823676 ] Fengnan Li edited comment on HDFS-14426 at 4/23/19 5:26 AM: [Deleted some confusing comments] Once the change in https://issues.apache.org/jira/browse/HDFS-14374 is rebased, I will rebase my patch again. I will also include the changes in Namenode and KMS in a separate ticket since those can go to trunk.

was (Author: fengnanli): [~crh] I think the commit https://issues.apache.org/jira/browse/HDFS-14374 is going to trunk. [~elgoiri] For the current ticket, since I am going to build on CR's change in https://issues.apache.org/jira/browse/HDFS-14374 and merge it back to https://issues.apache.org/jira/browse/HDFS-13891, you probably need to cherry-pick that commit to HDFS-13891? Then I can rebase my patch again. I will also include the changes in Namenode and KMS in a separate ticket since those can go to trunk.

> RBF: Add delegation token total count as one of the federation metrics
> --
>
> Key: HDFS-14426
> URL: https://issues.apache.org/jira/browse/HDFS-14426
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Fengnan Li
> Assignee: Fengnan Li
> Priority: Major
> Attachments: HDFS-14426-HDFS-13891.001.patch, HDFS-14426.001.patch
>
> Currently the router doesn't report the total number of currently valid delegation
> tokens it has, but this piece of information is useful for monitoring and
> understanding the real-time situation of tokens.
[jira] [Commented] (HDFS-14374) Expose total number of delegation tokens in AbstractDelegationTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823677#comment-16823677 ] Fengnan Li commented on HDFS-14374: --- [~hexiaoqiao] I will actually take care of those jmx metrics. For the router it is currently tracked in https://issues.apache.org/jira/browse/HDFS-14426, and Namenode/KMS will go to https://issues.apache.org/jira/browse/HDFS-14449. The first ticket will go to the router branch and the latter will go to trunk.

> Expose total number of delegation tokens in
> AbstractDelegationTokenSecretManager
>
> Key: HDFS-14374
> URL: https://issues.apache.org/jira/browse/HDFS-14374
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: CR Hota
> Assignee: CR Hota
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14374.001.patch, HDFS-14374.002.patch
>
> AbstractDelegationTokenSecretManager should expose the total number of active
> delegation tokens for specific implementations to track for observability.
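Exposing this count essentially amounts to a getter over the secret manager's in-memory token map. A minimal sketch follows; the class, field, and method names here are hypothetical stand-ins, not the real Hadoop AbstractDelegationTokenSecretManager API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy stand-in for a delegation token secret manager's token store.
// All names are illustrative.
class ToyTokenSecretManager {
    // token id -> expiry timestamp (millis); concurrent because tokens are
    // issued and cancelled from multiple handler threads.
    private final Map<String, Long> currentTokens = new ConcurrentHashMap<>();

    void storeToken(String id, long expiry) {
        currentTokens.put(id, expiry);
    }

    void cancelToken(String id) {
        currentTokens.remove(id);
    }

    // The metric proposed in this Jira: number of currently valid tokens,
    // suitable for exposure via JMX.
    int getCurrentTokensCount() {
        return currentTokens.size();
    }
}
```

A JMX bean on the router or namenode would then simply delegate to such a getter.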
[jira] [Created] (HDFS-14449) Expose total number of dt in jmx for KMS/Namenode
Fengnan Li created HDFS-14449: - Summary: Expose total number of dt in jmx for KMS/Namenode Key: HDFS-14449 URL: https://issues.apache.org/jira/browse/HDFS-14449 Project: Hadoop HDFS Issue Type: Improvement Reporter: Fengnan Li Assignee: Fengnan Li
[jira] [Commented] (HDFS-14426) RBF: Add delegation token total count as one of the federation metrics
[ https://issues.apache.org/jira/browse/HDFS-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823676#comment-16823676 ] Fengnan Li commented on HDFS-14426: --- [~crh] I think the commit https://issues.apache.org/jira/browse/HDFS-14374 is going to trunk. [~elgoiri] For the current ticket, since I am going to build on CR's change in https://issues.apache.org/jira/browse/HDFS-14374 and merge it back to https://issues.apache.org/jira/browse/HDFS-13891, you probably need to cherry-pick that commit to HDFS-13891? Then I can rebase my patch again. I will also include the changes in Namenode and KMS in a separate ticket since those can go to trunk.
[jira] [Commented] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.
[ https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823673#comment-16823673 ] Hadoop QA commented on HDFS-14353: --
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 47s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 37s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 38s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}104m 50s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}166m 49s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.tools.TestDFSZKFailoverController | | | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | HDFS-14353 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12966674/HDFS-14353.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 9a838dcaec33 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f4ab937 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/26684/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/26684/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/26684
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823669#comment-16823669 ] angerszhu commented on HDFS-14437: -- [~hexiaoqiao] You can see that when another thread is also running logSync(), isSyncRunning == true. The current thread calling #endCurrentLogSegment will then enter the while loop: {code:java} while (mytxid > synctxid && isSyncRunning) { try { wait(1000); } catch (InterruptedException ie) { } } {code} If the other thread can't get the lock, isSyncRunning won't become false and synctxid won't change, so the current thread will block in the while loop forever; that situation is not correct. (My English is not very good and may contain mistakes; if needed, send me a mail and I will explain in Chinese.) > Exception happened when rollEditLog expects empty > EditsDoubleBuffer.bufCurrent but not > - > > Key: HDFS-14437 > URL: https://issues.apache.org/jira/browse/HDFS-14437 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode, qjm >Reporter: angerszhu >Priority: Major > > For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 > , I have sorted out the process of writing and flushing the EditLog and some important > functions. I found that in the FSEditLog class, the close() function > runs the following sequence: > > {code:java} > waitForSyncToFinish(); > endCurrentLogSegment(true);{code} > Since we have acquired the object lock in close(), when > waitForSyncToFinish() returns it means all logSync work has finished and all > data in bufReady has been flushed out; and since the current thread holds the lock > on this object, when it calls endCurrentLogSegment() no other thread can acquire > the lock, so no thread can write new edit log entries into bufCurrent. 
> But if we don't call waitForSyncToFinish() before endCurrentLogSegment(), > an auto-scheduled logSync()'s flush may still be in progress; that flush step does not > need synchronization, as mentioned in the comment of the logSync() method: > > {code:java} > /** > * Sync all modifications done by this thread. > * > * The internal concurrency design of this class is as follows: > * - Log items are written synchronized into an in-memory buffer, > * and each assigned a transaction ID. > * - When a thread (client) would like to sync all of its edits, logSync() > * uses a ThreadLocal transaction ID to determine what edit number must > * be synced to. > * - The isSyncRunning volatile boolean tracks whether a sync is currently > * under progress. > * > * The data is double-buffered within each edit log implementation so that > * in-memory writing can occur in parallel with the on-disk writing. > * > * Each sync occurs in three steps: > * 1. synchronized, it swaps the double buffer and sets the isSyncRunning > * flag. > * 2. unsynchronized, it flushes the data to storage > * 3. synchronized, it resets the flag and notifies anyone waiting on the > * sync. > * > * The lack of synchronization on step 2 allows other threads to continue > * to write into the memory buffer while the sync is in progress. > * Because this step is unsynchronized, actions that need to avoid > * concurrency with sync() should be synchronized and also call > * waitForSyncToFinish() before assuming they are running alone. > */ > public void logSync() { > long syncStart = 0; > // Fetch the transactionId of this thread. 
> long mytxid = myTransactionId.get().txid; > > boolean sync = false; > try { > EditLogOutputStream logStream = null; > synchronized (this) { > try { > printStatistics(false); > // if somebody is already syncing, then wait > while (mytxid > synctxid && isSyncRunning) { > try { > wait(1000); > } catch (InterruptedException ie) { > } > } > // > // If this transaction was already flushed, then nothing to do > // > if (mytxid <= synctxid) { > numTransactionsBatchedInSync++; > if (metrics != null) { > // Metrics is non-null only when used inside name node > metrics.incrTransactionsBatchedInSync(); > } > return; > } > > // now, this thread will do the sync > syncStart = txid; > isSyncRunning = true; > sync = true; > // swap buffers > try { > if (journalSet.isEmpty()) { > throw new IOException("No journals available to flush"); > } > editLogStream.setReadyToFlush(); > } catch (IOException e) { > final String msg = > "Could not sync enough jo
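The three-step pattern quoted above can be condensed into a small runnable sketch. The class below is a simplified, hypothetical stand-in for FSEditLog's internals (names like DoubleBufferSketch and the StringBuilder "disk" are illustrative, not Hadoop code); it shows why a thread parked in the wait loop depends entirely on some other thread eventually reaching step 3 to reset isSyncRunning and call notifyAll().

```java
// Minimal sketch of the double-buffered sync pattern (NOT the real FSEditLog).
public class DoubleBufferSketch {
  private final StringBuilder bufCurrent = new StringBuilder(); // receives writes
  private final StringBuilder disk = new StringBuilder();       // stand-in for storage
  private long txid = 0;      // last assigned transaction id
  private long synctxid = 0;  // highest durably synced transaction id
  private volatile boolean isSyncRunning = false;

  // Writes are synchronized and each gets a transaction id.
  public synchronized long logEdit(String op) {
    bufCurrent.append(op).append('\n');
    return ++txid;
  }

  public void logSync(long mytxid) {
    String toFlush;
    long syncStart;
    synchronized (this) {
      // If somebody is already syncing, wait. If the syncing thread never
      // reaches step 3 below, this loop spins forever -- the hang discussed
      // in this issue.
      while (mytxid > synctxid && isSyncRunning) {
        try { wait(1000); } catch (InterruptedException ie) { }
      }
      if (mytxid <= synctxid) {
        return; // already flushed by a batched sync
      }
      // Step 1 (synchronized): claim the sync and swap the buffers.
      syncStart = txid;
      isSyncRunning = true;
      toFlush = bufCurrent.toString();
      bufCurrent.setLength(0);
    }
    // Step 2 (unsynchronized): flush; writers may keep appending to bufCurrent.
    disk.append(toFlush);
    synchronized (this) {
      // Step 3 (synchronized): publish progress and wake any waiters.
      synctxid = syncStart;
      isSyncRunning = false;
      notifyAll();
    }
  }

  public synchronized long getSyncTxid() { return synctxid; }
}
```

Under this model, close() calling waitForSyncToFinish() first simply waits until isSyncRunning is false while holding no stale state, which is exactly the precondition the javadoc demands of "actions that need to avoid concurrency with sync()".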
[jira] [Updated] (HDDS-1453) Fix unit test TestConfigurationFields broken on trunk
[ https://issues.apache.org/jira/browse/HDDS-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Wagle updated HDDS-1453: -- Status: Patch Available (was: Open) > Fix unit test TestConfigurationFields broken on trunk > - > > Key: HDDS-1453 > URL: https://issues.apache.org/jira/browse/HDDS-1453 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.5.0 >Reporter: Siddharth Wagle >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Unit test failure:: > {code} > [INFO] Running org.apache.hadoop.ozone.TestOzoneConfigurationFields > [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.772 > s <<< FAILURE! - in org.apache.hadoop.ozone.TestOzoneConfigurationFields > [ERROR] > testCompareConfigurationClassAgainstXml(org.apache.hadoop.ozone.TestOzoneConfigurationFields) > Time elapsed: 0.052 s <<< FAILURE! > java.lang.AssertionError: class org.apache.hadoop.ozone.OzoneConfigKeys class > org.apache.hadoop.hdds.scm.ScmConfigKeys class > org.apache.hadoop.ozone.om.OMConfigKeys class > org.apache.hadoop.hdds.HddsConfigKeys class > org.apache.hadoop.ozone.recon.ReconServerConfigKeys class > org.apache.hadoop.ozone.s3.S3GatewayConfigKeys has 1 variables missing in > ozone-default.xml Entries: ozone.scm.network.topology.schema.file.type > expected:<0> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1453) Fix unit test TestConfigurationFields broken on trunk
[ https://issues.apache.org/jira/browse/HDDS-1453?focusedWorklogId=231045&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231045 ] ASF GitHub Bot logged work on HDDS-1453: Author: ASF GitHub Bot Created on: 23/Apr/19 04:34 Start Date: 23/Apr/19 04:34 Worklog Time Spent: 10m Work Description: swagle commented on pull request #759: HDDS-1453. Fix unit test TestConfigurationFields broken on trunk. (swagle) URL: https://github.com/apache/hadoop/pull/759 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 231045) Time Spent: 10m Remaining Estimate: 0h > Fix unit test TestConfigurationFields broken on trunk > - > > Key: HDDS-1453 > URL: https://issues.apache.org/jira/browse/HDDS-1453 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.5.0 >Reporter: Siddharth Wagle >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Unit test failure:: > {code} > [INFO] Running org.apache.hadoop.ozone.TestOzoneConfigurationFields > [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.772 > s <<< FAILURE! - in org.apache.hadoop.ozone.TestOzoneConfigurationFields > [ERROR] > testCompareConfigurationClassAgainstXml(org.apache.hadoop.ozone.TestOzoneConfigurationFields) > Time elapsed: 0.052 s <<< FAILURE! 
> java.lang.AssertionError: class org.apache.hadoop.ozone.OzoneConfigKeys class > org.apache.hadoop.hdds.scm.ScmConfigKeys class > org.apache.hadoop.ozone.om.OMConfigKeys class > org.apache.hadoop.hdds.HddsConfigKeys class > org.apache.hadoop.ozone.recon.ReconServerConfigKeys class > org.apache.hadoop.ozone.s3.S3GatewayConfigKeys has 1 variables missing in > ozone-default.xml Entries: ozone.scm.network.topology.schema.file.type > expected:<0> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-1453) Fix unit test TestConfigurationFields broken on trunk
[ https://issues.apache.org/jira/browse/HDDS-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-1453: - Labels: pull-request-available (was: ) > Fix unit test TestConfigurationFields broken on trunk > - > > Key: HDDS-1453 > URL: https://issues.apache.org/jira/browse/HDDS-1453 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.5.0 >Reporter: Siddharth Wagle >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > > Unit test failure:: > {code} > [INFO] Running org.apache.hadoop.ozone.TestOzoneConfigurationFields > [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.772 > s <<< FAILURE! - in org.apache.hadoop.ozone.TestOzoneConfigurationFields > [ERROR] > testCompareConfigurationClassAgainstXml(org.apache.hadoop.ozone.TestOzoneConfigurationFields) > Time elapsed: 0.052 s <<< FAILURE! > java.lang.AssertionError: class org.apache.hadoop.ozone.OzoneConfigKeys class > org.apache.hadoop.hdds.scm.ScmConfigKeys class > org.apache.hadoop.ozone.om.OMConfigKeys class > org.apache.hadoop.hdds.HddsConfigKeys class > org.apache.hadoop.ozone.recon.ReconServerConfigKeys class > org.apache.hadoop.ozone.s3.S3GatewayConfigKeys has 1 variables missing in > ozone-default.xml Entries: ozone.scm.network.topology.schema.file.type > expected:<0> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1453) Fix unit test TestConfigurationFields broken on trunk
Siddharth Wagle created HDDS-1453: - Summary: Fix unit test TestConfigurationFields broken on trunk Key: HDDS-1453 URL: https://issues.apache.org/jira/browse/HDDS-1453 Project: Hadoop Distributed Data Store Issue Type: Bug Components: SCM Affects Versions: 0.5.0 Reporter: Siddharth Wagle Assignee: Siddharth Wagle Fix For: 0.5.0 Unit test failure:: {code} [INFO] Running org.apache.hadoop.ozone.TestOzoneConfigurationFields [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.772 s <<< FAILURE! - in org.apache.hadoop.ozone.TestOzoneConfigurationFields [ERROR] testCompareConfigurationClassAgainstXml(org.apache.hadoop.ozone.TestOzoneConfigurationFields) Time elapsed: 0.052 s <<< FAILURE! java.lang.AssertionError: class org.apache.hadoop.ozone.OzoneConfigKeys class org.apache.hadoop.hdds.scm.ScmConfigKeys class org.apache.hadoop.ozone.om.OMConfigKeys class org.apache.hadoop.hdds.HddsConfigKeys class org.apache.hadoop.ozone.recon.ReconServerConfigKeys class org.apache.hadoop.ozone.s3.S3GatewayConfigKeys has 1 variables missing in ozone-default.xml Entries: ozone.scm.network.topology.schema.file.type expected:<0> but was:<1> at org.junit.Assert.fail(Assert.java:88) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
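The assertion above says the key ozone.scm.network.topology.schema.file.type exists in a ConfigKeys class but has no entry in ozone-default.xml. Assuming the fix is simply to document the key (the value and description below are illustrative guesses, not the actual patch), the missing entry would look something like:

```xml
<property>
  <name>ozone.scm.network.topology.schema.file.type</name>
  <value>xml</value>
  <tag>OZONE, SCM</tag>
  <description>
    File format of the network topology schema file.
    (Hypothetical value and description; see the actual HDDS-1453 patch.)
  </description>
</property>
```

TestConfigurationFields enforces exactly this parity: every key declared in the config-key classes must appear in the corresponding *-default.xml, and vice versa.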
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823660#comment-16823660 ] He Xiaoqiao commented on HDFS-14437: Thanks [~angerszhuuu] for the additional information. {quote}When #rollEditLog#endCurrentLogSegment call logSync, if there is some other thread is calling logSync, it will #wait() in the while loop. then other thread can get the lock.{quote} Sorry, I still don't understand why other threads could acquire the FSEditLog lock and run #logSync or #logEdit (these two methods also need to acquire the FSEditLog lock) while endCurrentLogSegment#logSync, which is synchronized, is executing. > Exception happened when rollEditLog expects empty > EditsDoubleBuffer.bufCurrent but not > - > > Key: HDFS-14437 > URL: https://issues.apache.org/jira/browse/HDFS-14437 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode, qjm >Reporter: angerszhu >Priority: Major >
[jira] [Commented] (HDFS-14374) Expose total number of delegation tokens in AbstractDelegationTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823658#comment-16823658 ] He Xiaoqiao commented on HDFS-14374: Thanks [~crh] and [~elgoiri] for working on this, and sorry for missing this ticket's progress. As [~elgoiri] mentioned, I just meant exposing the number to JMX for monitoring. In our experience it is actually very useful for diagnosing token issues, even memory leaks (on some older versions), so I have had this feature deployed on the NameNode/KMS Server for a long time. Fortunately, I have seen some new JIRAs pushing this forward. [~crh], would you like to fix that for the KMS Server/NameNode/Router together? > Expose total number of delegation tokens in > AbstractDelegationTokenSecretManager > > > Key: HDFS-14374 > URL: https://issues.apache.org/jira/browse/HDFS-14374 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14374.001.patch, HDFS-14374.002.patch > > > AbstractDelegationTokenSecretManager should expose the total number of active > delegation tokens for specific implementations to track, for observability.
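What the ticket asks for can be sketched as exposing a live count from the token map so a JMX gauge can publish it. The class and method names below are hypothetical stand-ins, not the actual AbstractDelegationTokenSecretManager API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of a secret-manager-like class exposing a token count.
public class TokenCountSketch {
  // Stand-in for the secret manager's currentTokens map (token id -> expiry).
  private final Map<String, Long> currentTokens = new ConcurrentHashMap<>();

  public void addToken(String tokenId, long expiryTime) {
    currentTokens.put(tokenId, expiryTime);
  }

  // Analogous to the background expired-token remover.
  public void removeExpiredTokens(long now) {
    currentTokens.values().removeIf(expiry -> expiry <= now);
  }

  // The number to surface via JMX: currently valid delegation tokens.
  public int getCurrentTokensCount() {
    return currentTokens.size();
  }
}
```

A Router, NameNode, or KMS metrics bean would then simply delegate its gauge to getCurrentTokensCount(), which is cheap because the count is derived from the existing in-memory map rather than a scan.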
[jira] [Commented] (HDFS-14421) HDFS block two replicas exist in one DataNode
[ https://issues.apache.org/jira/browse/HDFS-14421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823640#comment-16823640 ] Yuanbo Liu commented on HDFS-14421: --- Sorry, I have been quite busy this week. I will comment as soon as I figure it out. > HDFS block two replicas exist in one DataNode > - > > Key: HDFS-14421 > URL: https://issues.apache.org/jira/browse/HDFS-14421 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yuanbo Liu >Priority: Major > Attachments: 326942161.log > > > We're using Hadoop-2.7.0. > There is a file in the cluster whose replication factor is 2, yet both > replicas exist on one DataNode. The fsck info is here: > {color:#707070}BP-499819267-xx.xxx.131.201-1452072365222:blk_1400651575_326942161 > len=484045 repl=2 > [DatanodeInfoWithStorage[xx.xxx.80.205:50010,DS-d321be27-cbd4-4edd-81ad-29b3d021ee82,DISK], > > DatanodeInfoWithStorage[xx.xx.80.205:50010,DS-d321be27-cbd4-4edd-81ad-29b3d021ee82,DISK]].{color} > and this is the exception from xx.xx.80.205 > {color:#707070}org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: > Replica not found for > BP-499819267-xx.xxx.131.201-1452072365222:blk_1400651575_326942161{color} > It's confusing why the NameNode doesn't update the block map after the exception. > What's the reason for two replicas existing on one DataNode? > Hope to get your comments. Thanks in advance. > > >
[jira] [Work logged] (HDDS-1065) OM and DN should persist SCM certificate as the trust root.
[ https://issues.apache.org/jira/browse/HDDS-1065?focusedWorklogId=231016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231016 ] ASF GitHub Bot logged work on HDDS-1065: Author: ASF GitHub Bot Created on: 23/Apr/19 02:22 Start Date: 23/Apr/19 02:22 Worklog Time Spent: 10m Work Description: ajayydv commented on pull request #754: HDDS-1065. OM and DN should persist SCM certificate as the trust root. Contributed by Ajay Kumar. URL: https://github.com/apache/hadoop/pull/754#discussion_r277498610 ## File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/security/x509/certificate/client/DefaultCertificateClient.java ## @@ -80,6 +80,7 @@ public abstract class DefaultCertificateClient implements CertificateClient { private static final String CERT_FILE_NAME_FORMAT = "%s.crt"; + private static final String CA_CERT_PREFIX = "CA-"; Review comment: Yes, for block token and DT validation. It will be used to establish the chain of trust. Issue Time Tracking --- Worklog Id: (was: 231016) Time Spent: 1h 10m (was: 1h) > OM and DN should persist SCM certificate as the trust root. > --- > > Key: HDDS-1065 > URL: https://issues.apache.org/jira/browse/HDDS-1065 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > OM and DN should persist SCM certificate as the trust root.
[jira] [Work logged] (HDDS-1065) OM and DN should persist SCM certificate as the trust root.
[ https://issues.apache.org/jira/browse/HDDS-1065?focusedWorklogId=230992&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230992 ] ASF GitHub Bot logged work on HDDS-1065: Author: ASF GitHub Bot Created on: 23/Apr/19 01:59 Start Date: 23/Apr/19 01:59 Worklog Time Spent: 10m Work Description: ajayydv commented on pull request #754: HDDS-1065. OM and DN should persist SCM certificate as the trust root. Contributed by Ajay Kumar. URL: https://github.com/apache/hadoop/pull/754#discussion_r277495379 ## File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeService.java ## @@ -268,10 +268,13 @@ private void getSCMSignedCert(OzoneConfiguration config) { String pemEncodedCert = secureScmClient.getDataNodeCertificate( datanodeDetails.getProtoBufMessage(), getEncodedString(csr)); - dnCertClient.storeCertificate(pemEncodedCert, true); + dnCertClient.storeCertificate(pemEncodedCert, true, false); datanodeDetails.setCertSerialId(getX509Certificate(pemEncodedCert). getSerialNumber().toString()); persistDatanodeDetails(datanodeDetails); + // Get SCM CA certificate and store it in filesystem. + String pemEncodedRootCert = secureScmClient.getCACertificate(); Review comment: As of now we don't have the functionality to look up certificates by subject or SCM id. getCACertificate returns the default certificate of the SCM that signed it. Issue Time Tracking --- Worklog Id: (was: 230992) Time Spent: 1h (was: 50m) > OM and DN should persist SCM certificate as the trust root. 
> --- > > Key: HDDS-1065 > URL: https://issues.apache.org/jira/browse/HDDS-1065 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > OM and DN should persist SCM certificate as the trust root. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
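The review thread above centers on the new CA_CERT_PREFIX constant. A small sketch of the naming scheme it implies (the helper method below is illustrative, not DefaultCertificateClient's actual code; only the two constants come from the diff under review):

```java
// Illustrative sketch of the certificate file-naming scheme from the review.
public class CertNameSketch {
  // Constants mirrored from the diff in DefaultCertificateClient.
  private static final String CERT_FILE_NAME_FORMAT = "%s.crt";
  private static final String CA_CERT_PREFIX = "CA-";

  // Derive the on-disk file name: the component's own certificate is stored
  // as "<serial>.crt", while the persisted SCM CA certificate gets a "CA-"
  // prefix so the trust root is distinguishable from ordinary certificates.
  static String certFileName(String certSerialId, boolean isCaCert) {
    String name = String.format(CERT_FILE_NAME_FORMAT, certSerialId);
    return isCaCert ? CA_CERT_PREFIX + name : name;
  }
}
```

This matches the three-argument storeCertificate(pem, force, isCaCert) call in the HddsDatanodeService diff: the extra boolean presumably routes the SCM CA certificate to the prefixed file name.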
[jira] [Commented] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.
[ https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823615#comment-16823615 ] maobaolong commented on HDFS-14353: --- [~elgoiri] Of course, thank you for reminding me. I uploaded a new patch; PTAL after the Jenkins report. > Erasure Coding: metrics xmitsInProgress become to negative. > --- > > Key: HDFS-14353 > URL: https://issues.apache.org/jira/browse/HDFS-14353 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, erasure-coding >Affects Versions: 3.3.0 >Reporter: maobaolong >Assignee: maobaolong >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14353.001.patch, HDFS-14353.002.patch, > HDFS-14353.003.patch, screenshot-1.png > >
[jira] [Updated] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.
[ https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] maobaolong updated HDFS-14353: -- Attachment: HDFS-14353.003.patch > Erasure Coding: metrics xmitsInProgress become to negative. > --- > > Key: HDFS-14353 > URL: https://issues.apache.org/jira/browse/HDFS-14353 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, erasure-coding >Affects Versions: 3.3.0 >Reporter: maobaolong >Assignee: maobaolong >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14353.001.patch, HDFS-14353.002.patch, > HDFS-14353.003.patch, screenshot-1.png > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-999) Make the DNS resolution in OzoneManager more resilient
[ https://issues.apache.org/jira/browse/HDDS-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823610#comment-16823610 ] Hadoop QA commented on HDDS-999: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 31s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:blue}0{color} | {color:blue} yamllint {color} | {color:blue} 0m 0s{color} | {color:blue} yamllint was not available. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
| Vote | Subsystem | Runtime | Comment |
|::|--:|:|:|
||| _ trunk Compile Tests _ |
| 0 | mvndep | 0m 44s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 41s | trunk passed |
| +1 | compile | 3m 5s | trunk passed |
| +1 | checkstyle | 0m 48s | trunk passed |
| +1 | mvnsite | 0m 0s | trunk passed |
| +1 | shadedclient | 13m 22s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 56s | trunk passed |
| 0 | spotbugs | 3m 50s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 6m 38s | trunk passed |
||| _ Patch Compile Tests _ |
| 0 | mvndep | 0m 14s | Maven dependency ordering for patch |
| +1 | mvninstall | 6m 22s | the patch passed |
| +1 | compile | 3m 8s | the patch passed |
| +1 | javac | 3m 8s | the patch passed |
| +1 | checkstyle | 0m 52s | the patch passed |
| +1 | mvnsite | 0m 0s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 49s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 2m 1s | the patch passed |
| +1 | findbugs | 7m 43s | the patch passed |
||| _ Other Tests _ |
| -1 | unit | 1m 6s | hadoop-hdds in the patch failed. |
| -1 | unit | 27m 17s | hadoop-ozone in the patch failed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 95m 5s | |

| Reason | Tests |
|--:|:-|
| Failed junit tests | hadoop.hdds.scm.net.TestNodeSchemaManager |
| | hadoop.hdds.scm.net.TestNetworkTopologyImpl |
| | hadoop.ozone.TestOzoneConfigurationFields |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/
[jira] [Created] (HDFS-14448) Provide an appropriate Message when Secondary Namenode is not bootstrap initialized
Sailesh Patel created HDFS-14448: Summary: Provide an appropriate Message when Secondary Namenode is not bootstrap initialized Key: HDFS-14448 URL: https://issues.apache.org/jira/browse/HDFS-14448 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs Affects Versions: 2.6.0 Reporter: Sailesh Patel After HDFS has been running for some time and is then enabled for HDFS-HA, if the secondary bootstrap fails (say, because the active NN was down) and the secondary NN is subsequently started, the secondary NN logs "NameNode is not formatted." e.g. 2019-04-16 19:43:27,951 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode. java.io.IOException: NameNode is not formatted. at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:232) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144) Could this message be improved to say: "Secondary NameNode is not initialized. Please Bootstrap Secondary Namenode" or "NameNode is not formatted/Secondary Namenode not Bootstrapped (Initialized)"? This would avoid customers mistakenly formatting the namenode and thereby losing data. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-999) Make the DNS resolution in OzoneManager more resilient
[ https://issues.apache.org/jira/browse/HDDS-999?focusedWorklogId=230980&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230980 ] ASF GitHub Bot logged work on HDDS-999: --- Author: ASF GitHub Bot Created on: 23/Apr/19 00:50 Start Date: 23/Apr/19 00:50 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #758: HDDS-999. Make the DNS resolution in OzoneManager more resilient. (swagle) URL: https://github.com/apache/hadoop/pull/758#issuecomment-485601623 :broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Comment |
|::|--:|:|:|
| 0 | reexec | 26 | Docker mode activated. |
||| _ Prechecks _ |
| 0 | yamllint | 0 | yamllint was not available. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| -1 | test4tests | 0 | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
||| _ trunk Compile Tests _ |
| 0 | mvndep | 42 | Maven dependency ordering for branch |
| +1 | mvninstall | 1084 | trunk passed |
| +1 | compile | 130 | trunk passed |
| +1 | checkstyle | 53 | trunk passed |
| +1 | mvnsite | 66 | trunk passed |
| +1 | shadedclient | 787 | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 1 | Skipped patched modules with no Java source: hadoop-ozone/dist |
| +1 | findbugs | 44 | trunk passed |
| +1 | javadoc | 38 | trunk passed |
||| _ Patch Compile Tests _ |
| 0 | mvndep | 13 | Maven dependency ordering for patch |
| -1 | mvninstall | 18 | dist in the patch failed. |
| +1 | compile | 114 | the patch passed |
| +1 | javac | 114 | the patch passed |
| +1 | checkstyle | 24 | the patch passed |
| +1 | mvnsite | 45 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | shadedclient | 726 | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0 | Skipped patched modules with no Java source: hadoop-ozone/dist |
| +1 | findbugs | 47 | the patch passed |
| +1 | javadoc | 40 | the patch passed |
||| _ Other Tests _ |
| +1 | unit | 43 | ozone-manager in the patch passed. |
| +1 | unit | 24 | dist in the patch passed. |
| +1 | asflicense | 30 | The patch does not generate ASF License warnings. |
| | | 3482 | |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-758/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/758 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient yamllint findbugs checkstyle |
| uname | Linux d3c23ad8d320 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / f4ab937 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-758/1/artifact/out/patch-mvninstall-hadoop-ozone_dist.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-758/1/testReport/ |
| Max. process+thread count | 411 (vs. ulimit of 5500) |
| modules | C: hadoop-ozone/ozone-manager hadoop-ozone/dist U: hadoop-ozone |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-758/1/console |
| Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |

This message was automatically generated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 230980) Time Spent: 20m (was: 10m) > Make the DNS resolution in OzoneManager more resilient > -- > > Key: HDDS-999 > URL: https://issues.apache.org/jira/browse/HDDS-999 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Reporter: Elek, Marton >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > Attachments: HDDS-999.01.patch > > Time Spent: 20m > Remaining Estimate: 0h > > If t
[jira] [Work logged] (HDDS-1441) Remove usage of getRetryFailureException
[ https://issues.apache.org/jira/browse/HDDS-1441?focusedWorklogId=230966&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230966 ] ASF GitHub Bot logged work on HDDS-1441: Author: ASF GitHub Bot Created on: 22/Apr/19 23:55 Start Date: 22/Apr/19 23:55 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #745: HDDS-1441. Remove usage of getRetryFailureException. (swagle) URL: https://github.com/apache/hadoop/pull/745#issuecomment-485592408 :broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Comment |
|::|--:|:|:|
| 0 | reexec | 61 | Docker mode activated. |
||| _ Prechecks _ |
| +1 | @author | 1 | The patch does not contain any @author tags. |
| -1 | test4tests | 0 | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
||| _ trunk Compile Tests _ |
| 0 | mvndep | 314 | Maven dependency ordering for branch |
| +1 | mvninstall | 1437 | trunk passed |
| +1 | compile | 1384 | trunk passed |
| +1 | checkstyle | 167 | trunk passed |
| +1 | mvnsite | 461 | trunk passed |
| +1 | shadedclient | 1411 | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0 | Skipped patched modules with no Java source: hadoop-hdds hadoop-ozone |
| +1 | findbugs | 95 | trunk passed |
| +1 | javadoc | 221 | trunk passed |
||| _ Patch Compile Tests _ |
| 0 | mvndep | 25 | Maven dependency ordering for patch |
| -1 | mvninstall | 39 | hadoop-hdds in the patch failed. |
| -1 | mvninstall | 12 | client in the patch failed. |
| -1 | mvninstall | 16 | hadoop-ozone in the patch failed. |
| -1 | mvninstall | 12 | ozone-manager in the patch failed. |
| +1 | compile | 915 | the patch passed |
| +1 | javac | 915 | the patch passed |
| +1 | checkstyle | 141 | the patch passed |
| -1 | mvnsite | 26 | hadoop-hdds in the patch failed. |
| -1 | mvnsite | 21 | client in the patch failed. |
| -1 | mvnsite | 26 | hadoop-ozone in the patch failed. |
| -1 | mvnsite | 22 | ozone-manager in the patch failed. |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | xml | 2 | The patch has no ill-formed XML file. |
| +1 | shadedclient | 707 | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0 | Skipped patched modules with no Java source: hadoop-hdds hadoop-ozone |
| -1 | findbugs | 21 | client in the patch failed. |
| -1 | findbugs | 21 | ozone-manager in the patch failed. |
| -1 | javadoc | 23 | hadoop-hdds in the patch failed. |
| -1 | javadoc | 21 | client in the patch failed. |
| -1 | javadoc | 23 | hadoop-ozone in the patch failed. |
| -1 | javadoc | 22 | ozone-manager in the patch failed. |
||| _ Other Tests _ |
| -1 | unit | 28 | hadoop-hdds in the patch failed. |
| -1 | unit | 23 | client in the patch failed. |
| -1 | unit | 24 | hadoop-ozone in the patch failed. |
| -1 | unit | 22 | ozone-manager in the patch failed. |
| +1 | asflicense | 42 | The patch does not generate ASF License warnings. |
| | | 7423 | |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-745/5/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/745 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml |
| uname | Linux 76c5899e4a1b 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 13 15:00:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / a54c1e3 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-745/5/artifact/out/patch-mvninstall-hadoop-hdds.txt |
| mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-745/5/artifact/out/patch-mvninstall-hadoop-hdds_client.txt |
| mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-745/5/artifact/out/patch-mvninstall-hadoop-ozone.txt |
| mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-745/5/artifact/out/patch-mvninstall-hadoop-ozone_ozone-manager.txt |
| mvnsite | https://builds.apache.org/job/hadoop-multibranch/job/PR-745/5/artifact/out/patch-mvnsite-hadoop-hdds.txt |
| mvnsite | https://builds.apache.org/job/hadoop-multibranch/job/PR-745/5/artifact/out/patch-mvnsite-hadoop-hdds_client.txt |
| mvnsite | https
[jira] [Updated] (HDDS-999) Make the DNS resolution in OzoneManager more resilient
[ https://issues.apache.org/jira/browse/HDDS-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Wagle updated HDDS-999: - Status: Patch Available (was: Open) cc: [~arpitagarwal] / [~elek] > Make the DNS resolution in OzoneManager more resilient > -- > > Key: HDDS-999 > URL: https://issues.apache.org/jira/browse/HDDS-999 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Reporter: Elek, Marton >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > Attachments: HDDS-999.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > If the OzoneManager is started before scm the scm dns may not be available. > In this case the om should retry and re-resolve the dns, but as of now it > throws an exception: > {code:java} > 2019-01-23 17:14:25 ERROR OzoneManager:593 - Failed to start the OzoneManager. > java.net.SocketException: Call From om-0.om to null:0 failed on socket > exception: java.net.SocketException: Unresolved address; For more details > see: http://wiki.apache.org/hadoop/SocketException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:798) > at org.apache.hadoop.ipc.Server.bind(Server.java:566) > at org.apache.hadoop.ipc.Server$Listener.(Server.java:1042) > at org.apache.hadoop.ipc.Server.(Server.java:2815) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:994) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:421) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342) > at 
org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:804) > at > org.apache.hadoop.ozone.om.OzoneManager.startRpcServer(OzoneManager.java:563) > at > org.apache.hadoop.ozone.om.OzoneManager.getRpcServer(OzoneManager.java:927) > at org.apache.hadoop.ozone.om.OzoneManager.(OzoneManager.java:265) > at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:674) > at org.apache.hadoop.ozone.om.OzoneManager.main(OzoneManager.java:587) > Caused by: java.net.SocketException: Unresolved address > at sun.nio.ch.Net.translateToSocketException(Net.java:131) > at sun.nio.ch.Net.translateException(Net.java:157) > at sun.nio.ch.Net.translateException(Net.java:163) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:76) > at org.apache.hadoop.ipc.Server.bind(Server.java:549) > ... 11 more > Caused by: java.nio.channels.UnresolvedAddressException > at sun.nio.ch.Net.checkAddress(Net.java:101) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:218) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > ... 12 more{code} > It should be fixed. (See also HDDS-421 which fixed the same problem in > datanode side and HDDS-907 which is the workaround while this issue is not > resolved).
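The behavior the issue asks for (retry and re-resolve instead of failing on the first unresolved lookup) can be sketched in plain Java. This is a hypothetical illustration, not the actual HDDS-999 patch; the class and method names are invented:

```java
import java.net.InetSocketAddress;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a retry-and-re-resolve loop; not the real patch. */
public final class RetryingResolver {

    public static InetSocketAddress resolveWithRetry(String host, int port,
            int maxAttempts, long sleepMillis) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // The InetSocketAddress constructor performs the DNS lookup;
            // isUnresolved() reports whether that lookup failed, so building
            // a fresh instance each iteration re-resolves the name.
            InetSocketAddress addr = new InetSocketAddress(host, port);
            if (!addr.isUnresolved()) {
                return addr;
            }
            TimeUnit.MILLISECONDS.sleep(sleepMillis);
        }
        throw new IllegalStateException(
            "Unable to resolve " + host + " after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) throws InterruptedException {
        // "localhost" resolves locally, so this returns on the first attempt.
        InetSocketAddress addr = resolveWithRetry("localhost", 9863, 5, 100);
        System.out.println(addr.isUnresolved()); // prints "false"
    }
}
```

A real implementation would also cap the total wait time (the follow-up PR comment below mentions waiting at least 50 seconds before giving up) and log each failed attempt.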
[jira] [Updated] (HDDS-999) Make the DNS resolution in OzoneManager more resilient
[ https://issues.apache.org/jira/browse/HDDS-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Wagle updated HDDS-999: - Attachment: HDDS-999.01.patch > Make the DNS resolution in OzoneManager more resilient > -- > > Key: HDDS-999 > URL: https://issues.apache.org/jira/browse/HDDS-999 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager > Reporter: Elek, Marton > Assignee: Siddharth Wagle > Priority: Major > Labels: pull-request-available > Attachments: HDDS-999.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > If the OzoneManager is started before scm the scm dns may not be available. > In this case the om should retry and re-resolve the dns, but as of now it > throws an exception. > It should be fixed. (See also HDDS-421 which fixed the same problem in > datanode side and HDDS-907 which is the workaround while this issue is not > resolved).
[jira] [Updated] (HDDS-999) Make the DNS resolution in OzoneManager more resilient
[ https://issues.apache.org/jira/browse/HDDS-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-999: Labels: pull-request-available (was: ) > Make the DNS resolution in OzoneManager more resilient > -- > > Key: HDDS-999 > URL: https://issues.apache.org/jira/browse/HDDS-999 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager > Reporter: Elek, Marton > Assignee: Siddharth Wagle > Priority: Major > Labels: pull-request-available > Attachments: HDDS-999.01.patch > > > If the OzoneManager is started before scm the scm dns may not be available. > In this case the om should retry and re-resolve the dns, but as of now it > throws an exception. > It should be fixed. (See also HDDS-421 which fixed the same problem in > datanode side and HDDS-907 which is the workaround while this issue is not > resolved).
[jira] [Work logged] (HDDS-999) Make the DNS resolution in OzoneManager more resilient
[ https://issues.apache.org/jira/browse/HDDS-999?focusedWorklogId=230965&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230965 ] ASF GitHub Bot logged work on HDDS-999: --- Author: ASF GitHub Bot Created on: 22/Apr/19 23:51 Start Date: 22/Apr/19 23:51 Worklog Time Spent: 10m Work Description: swagle commented on pull request #758: HDDS-999. Make the DNS resolution in OzoneManager more resilient. (swagle) URL: https://github.com/apache/hadoop/pull/758 Brought back change from HDDS-776 with the retriable task; OM will now wait for at least 50 seconds before giving up. Issue Time Tracking --- Worklog Id: (was: 230965) Time Spent: 10m Remaining Estimate: 0h > Make the DNS resolution in OzoneManager more resilient > -- > > Key: HDDS-999 > URL: https://issues.apache.org/jira/browse/HDDS-999 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager > Reporter: Elek, Marton > Assignee: Siddharth Wagle > Priority: Major > Labels: pull-request-available > Attachments: HDDS-999.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > If the OzoneManager is started before scm the scm dns may not be available. > In this case the om should retry and re-resolve the dns, but as of now it > throws an exception. > It should be fixed. (See also HDDS-421 which fixed the same problem in > datanode side and HDDS-907 which is the workaround while this issue is not > resolved).
[jira] [Commented] (HDDS-1301) Optimize recursive ozone filesystem apis
[ https://issues.apache.org/jira/browse/HDDS-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823567#comment-16823567 ] Anu Engineer commented on HDDS-1301: Thank you, I will sync with you, but I will write and post a design document that explains all these changes and the possible changes in the output committer. Then based on your feedback, we can shape the Ozone manager API. > Optimize recursive ozone filesystem apis > > > Key: HDDS-1301 > URL: https://issues.apache.org/jira/browse/HDDS-1301 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Reporter: Lokesh Jain > Assignee: Lokesh Jain > Priority: Major > Labels: pull-request-available > Attachments: HDDS-1301.001.patch > > Time Spent: 2h 10m > Remaining Estimate: 0h > > This Jira aims to optimise recursive apis in ozone file system. These are the > apis which have a recursive flag which requires an operation to be performed > on all the children of the directory. The Jira would add support for > recursive apis in Ozone manager in order to reduce the number of rpc calls to > Ozone Manager. Also currently these operations are not atomic. This Jira > would make all the operations in ozone filesystem atomic.
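The RPC-count savings described in the issue above can be illustrated with a toy sketch. `OmClient` and its methods are hypothetical names for illustration, not the real Ozone Manager API:

```java
import java.util.List;

/**
 * Illustrative sketch (not the HDDS-1301 patch): pushing a recursive
 * operation server-side turns O(children) client RPCs into one RPC,
 * which the server can also apply atomically.
 */
public class RecursiveDeleteSketch {

    /** Hypothetical client interface; each method is one RPC. */
    interface OmClient {
        List<String> listChildren(String dir);  // one RPC
        void deleteKey(String key);             // one RPC per key
        void deleteRecursive(String dir);       // one RPC, server walks children
    }

    /** Old shape: one RPC per child, and not atomic across calls. */
    static int clientSideDelete(OmClient om, String dir) {
        int rpcs = 1; // the listChildren call
        for (String child : om.listChildren(dir)) {
            om.deleteKey(child);
            rpcs++;
        }
        return rpcs;
    }

    /** New shape: a single RPC that the server can execute atomically. */
    static int serverSideDelete(OmClient om, String dir) {
        om.deleteRecursive(dir);
        return 1;
    }
}
```

For a directory with N children the old shape issues N+1 RPCs, and a crash between calls leaves a partial delete; the single server-side call avoids both problems.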
[jira] [Commented] (HDDS-999) Make the DNS resolution in OzoneManager more resilient
[ https://issues.apache.org/jira/browse/HDDS-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823562#comment-16823562 ] Siddharth Wagle commented on HDDS-999: -- I am able to easily reproduce this on the latest trunk, working on a patch to get HDDS-776 changes effective. {code} om_1| 2019-04-22 22:30:15 ERROR OzoneManager:888 - Failed to start the OzoneManager. om_1| java.io.IOException: Invalid host name: local host is: (unknown); destination host is: "scm":9863; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost om_1| at org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolClientSideTranslatorPB.transformServiceException(ScmBlockLocationProtocolClientSideTranslatorPB.java:173) om_1| at org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolClientSideTranslatorPB.getScmInfo(ScmBlockLocationProtocolClientSideTranslatorPB.java:197) om_1| at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) om_1| at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) om_1| at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) om_1| at java.base/java.lang.reflect.Method.invoke(Method.java:566) om_1| at org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66) om_1| at com.sun.proxy.$Proxy32.getScmInfo(Unknown Source) om_1| at org.apache.hadoop.ozone.om.OzoneManager.(OzoneManager.java:305) om_1| at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:964) om_1| at org.apache.hadoop.ozone.om.OzoneManager.main(OzoneManager.java:882) om_1| Caused by: java.net.UnknownHostException om_1| at org.apache.hadoop.ipc.Client$Connection.(Client.java:450) om_1| at org.apache.hadoop.ipc.Client.getConnection(Client.java:1552) om_1| at org.apache.hadoop.ipc.Client.call(Client.java:1403) om_1| at 
org.apache.hadoop.ipc.Client.call(Client.java:1367) om_1| at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) om_1| at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) om_1| at com.sun.proxy.$Proxy31.getScmInfo(Unknown Source) om_1| at org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolClientSideTranslatorPB.getScmInfo(ScmBlockLocationProtocolClientSideTranslatorPB.java:195) om_1| ... 9 more om_1| 2019-04-22 22:30:15 INFO ExitUtil:210 - Exiting with status 1: java.io.IOException: Invalid host name: local host is: (unknown); destination host is: "scm":9863; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost om_1| 2019-04-22 22:30:15 INFO OzoneManager:51 - SHUTDOWN_MSG: om_1| / om_1| SHUTDOWN_MSG: Shutting down OzoneManager at 989273176ea2/172.21.0.2 om_1| / {code} > Make the DNS resolution in OzoneManager more resilient > -- > > Key: HDDS-999 > URL: https://issues.apache.org/jira/browse/HDDS-999 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager > Reporter: Elek, Marton > Assignee: Siddharth Wagle > Priority: Major > > If the OzoneManager is started before scm the scm dns may not be available. > In this case the om should retry and re-resolve the dns, but as of now it > throws an exception.
[jira] [Work logged] (HDDS-1450) Fix nightly run failures after HDDS-976
[ https://issues.apache.org/jira/browse/HDDS-1450?focusedWorklogId=230928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230928 ] ASF GitHub Bot logged work on HDDS-1450: Author: ASF GitHub Bot Created on: 22/Apr/19 22:45 Start Date: 22/Apr/19 22:45 Worklog Time Spent: 10m Work Description: cjjnjust commented on issue #757: HDDS-1450. Fix nightly run failures after HDDS-976. Contributed by Xi… URL: https://github.com/apache/hadoop/pull/757#issuecomment-485578266 Thanks @xiaoyuyao. Does this fix the nightly build? It does not seem to fix the "good.xml file not found" issue; how does it work? Issue Time Tracking --- Worklog Id: (was: 230928) Time Spent: 40m (was: 0.5h) > Fix nightly run failures after HDDS-976 > --- > > Key: HDDS-1450 > URL: https://issues.apache.org/jira/browse/HDDS-1450 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Reporter: Xiaoyu Yao > Assignee: Xiaoyu Yao > Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > [https://ci.anzix.net/job/ozone-nightly/72/testReport/]
[jira] [Commented] (HDFS-14426) RBF: Add delegation token total count as one of the federation metrics
[ https://issues.apache.org/jira/browse/HDFS-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823492#comment-16823492 ] CR Hota commented on HDFS-14426: [~elgoiri] Strangely, I don't see the commit in the HDFS-13891 branch yet. > RBF: Add delegation token total count as one of the federation metrics > -- > > Key: HDFS-14426 > URL: https://issues.apache.org/jira/browse/HDFS-14426 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Major > Attachments: HDFS-14426-HDFS-13891.001.patch, HDFS-14426.001.patch > > > Currently the router doesn't report the total number of currently valid delegation > tokens it holds, but this information is useful for monitoring and > understanding the real-time state of tokens.
[jira] [Commented] (HDFS-14435) ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs
[ https://issues.apache.org/jira/browse/HDFS-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823484#comment-16823484 ] Hudson commented on HDFS-14435: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16448 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16448/]) HDFS-14435. [SBN Read] Enable ObserverReadProxyProvider to gracefully (xkrogen: rev 174b7d3126e215c519b1c4a74892c7020712f9df) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/HATestUtil.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDelegationTokensWithHA.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/ObserverReadProxyProvider.java > ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs > -- > > Key: HDFS-14435 > URL: https://issues.apache.org/jira/browse/HDFS-14435 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, nn >Affects Versions: 3.3.0 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14435.000.patch, HDFS-14435.001.patch, > HDFS-14435.002.patch, HDFS-14435.003.patch, HDFS-14435.004.patch, > HDFS-14435.005.patch, dt_stack_trace.png > > > We have been seeing issues during testing of the Consistent Read from Standby > feature that indicate that ORPP is unable to call {{getHAServiceState}} on > Standby NNs, as they are rejected with a {{StandbyException}}. Upon further > investigation, we realized that although the Standby allows the > {{getHAServiceState()}} call, reading a delegation token is not allowed in > Standby state, thus the call will fail when using DT-based authentication. > This hasn't caused issues in practice, since ORPP assumes that the state is > Standby if it is unable to fetch the state, but we should fix the logic to > properly handle this scenario. 
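The fallback behavior described in HDFS-14435 (if the HA-state probe of a NameNode fails, assume it is Standby rather than propagating the exception) can be sketched as follows. This is a simplified, hypothetical model, not the actual ObserverReadProxyProvider code; the class and method names here are invented for illustration.

```java
import java.util.concurrent.Callable;

public class HAStateProbe {
    enum HAServiceState { ACTIVE, STANDBY, OBSERVER }

    // If the probe throws (e.g. a StandbyException because reading a
    // delegation token is not allowed in Standby state), assume STANDBY:
    // a node we cannot classify is simply never used for reads.
    static HAServiceState probeOrAssumeStandby(Callable<HAServiceState> probe) {
        try {
            return probe.call();
        } catch (Exception e) {
            return HAServiceState.STANDBY;
        }
    }

    public static void main(String[] args) {
        System.out.println(probeOrAssumeStandby(() -> HAServiceState.OBSERVER));
        System.out.println(probeOrAssumeStandby(() -> {
            throw new java.io.IOException("StandbyException");
        }));
    }
}
```

The design choice is that misclassifying a node as Standby is harmless (it is merely skipped for reads), whereas failing the whole client call is not.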
[jira] [Commented] (HDFS-14374) Expose total number of delegation tokens in AbstractDelegationTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823485#comment-16823485 ] Hudson commented on HDFS-14374: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16448 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16448/]) HDFS-14374. Expose total number of delegation tokens in (inigoiri: rev fb1c5491398bbdac181e867022881fe2ff73c884) * (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/token/delegation/TestDelegationToken.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java > Expose total number of delegation tokens in > AbstractDelegationTokenSecretManager > > > Key: HDFS-14374 > URL: https://issues.apache.org/jira/browse/HDFS-14374 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14374.001.patch, HDFS-14374.002.patch > > > AbstractDelegationTokenSecretManager should expose the total number of active > delegation tokens so that specific implementations can track it for observability.
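The shape of the HDFS-14374 change can be sketched in miniature: a secret manager keeps a map of live tokens and exposes its size as a metric. The class and method names below (TokenManager, getCurrentTokensCount) are invented for illustration, not the actual Hadoop API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TokenManager {
    // Live tokens keyed by identifier; in Hadoop this corresponds to the
    // secret manager's internal map of current tokens.
    private final Map<String, Long> currentTokens = new ConcurrentHashMap<>();

    public void store(String id, long expiryMillis) {
        currentTokens.put(id, expiryMillis);
    }

    public void cancel(String id) {
        currentTokens.remove(id);
    }

    // The observability hook: how many delegation tokens are currently
    // live, so Router/NameNode metrics can report it.
    public int getCurrentTokensCount() {
        return currentTokens.size();
    }
}
```

Exposing only a count leaks no token material itself, which is the point raised later in the thread about information-leak implications.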
[jira] [Commented] (HDFS-14426) RBF: Add delegation token total count as one of the federation metrics
[ https://issues.apache.org/jira/browse/HDFS-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823480#comment-16823480 ] Íñigo Goiri commented on HDFS-14426: It should already be rebased.
[jira] [Commented] (HDFS-14426) RBF: Add delegation token total count as one of the federation metrics
[ https://issues.apache.org/jira/browse/HDFS-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823460#comment-16823460 ] CR Hota commented on HDFS-14426: [~elgoiri] Could you please help rebase HDFS-13891?
[jira] [Commented] (HDFS-14374) Expose total number of delegation tokens in AbstractDelegationTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823456#comment-16823456 ] CR Hota commented on HDFS-14374: [~elgoiri] Thanks for the review and commit.
[jira] [Commented] (HDFS-14445) TestTrySendErrorReportWhenNNThrowsIOException fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823450#comment-16823450 ] Hudson commented on HDFS-14445: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16447 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16447/]) HDFS-14445. TestTrySendErrorReportWhenNNThrowsIOException fails in (inigoiri: rev 5321235fe8d89f01fe2c141fdef5d8186a6b20dd) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java > TestTrySendErrorReportWhenNNThrowsIOException fails in trunk > > > Key: HDFS-14445 > URL: https://issues.apache.org/jira/browse/HDFS-14445 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14445-01.patch, HDFS-14445-02.patch > > > {noformat} > Active namenode didn't add the report back to the queue when errorReport > threw IOException > {noformat} > Reference ::: > https://builds.apache.org/job/PreCommit-HDFS-Build/26676/testReport/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/ > https://builds.apache.org/job/PreCommit-HDFS-Build/26662/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/ > https://builds.apache.org/job/PreCommit-HDFS-Build/26661/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/ > https://builds.apache.org/job/PreCommit-HDFS-Build/26644/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
[jira] [Commented] (HDFS-14374) Expose total number of delegation tokens in AbstractDelegationTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823439#comment-16823439 ] Íñigo Goiri commented on HDFS-14374: Thanks [~crh] for the patch. Committed to trunk.
[jira] [Updated] (HDFS-14374) Expose total number of delegation tokens in AbstractDelegationTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-14374: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.3.0 Status: Resolved (was: Patch Available)
[jira] [Updated] (HDFS-14435) ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs
[ https://issues.apache.org/jira/browse/HDFS-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HDFS-14435: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.3.0 Status: Resolved (was: Patch Available)
[jira] [Commented] (HDFS-14435) ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs
[ https://issues.apache.org/jira/browse/HDFS-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823438#comment-16823438 ] Erik Krogen commented on HDFS-14435: Thanks all! Committed to trunk.
[jira] [Commented] (HDFS-14374) Expose total number of delegation tokens in AbstractDelegationTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823437#comment-16823437 ] Íñigo Goiri commented on HDFS-14374: +1 on [^HDFS-14374.002.patch]. [~hexiaoqiao] we will open a follow-up JIRA to expose this metric in the Router and the Namenode. At that point we should discuss the implications in terms of information leakage. I think it should be fine, but it is worth mentioning.
[jira] [Commented] (HDFS-14435) ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs
[ https://issues.apache.org/jira/browse/HDFS-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823432#comment-16823432 ] Íñigo Goiri commented on HDFS-14435: +1 on [^HDFS-14435.004.patch].
[jira] [Updated] (HDFS-14445) TestTrySendErrorReportWhenNNThrowsIOException fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-14445: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.3.0 Status: Resolved (was: Patch Available)
[jira] [Commented] (HDFS-14445) TestTrySendErrorReportWhenNNThrowsIOException fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823430#comment-16823430 ] Íñigo Goiri commented on HDFS-14445: Thanks [~ayushtkn] for the patch. Committed to trunk.
[jira] [Commented] (HDFS-14445) TestTrySendErrorReportWhenNNThrowsIOException fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823428#comment-16823428 ] Íñigo Goiri commented on HDFS-14445: Yetus reported failed unit tests but the report shows none. Anyway, this patch shouldn't have any impact on any other unit test. +1 on [^HDFS-14445-02.patch].
[jira] [Commented] (HDDS-1452) All chunks should happen to a single file for a block in datanode
[ https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823407#comment-16823407 ] Anu Engineer commented on HDDS-1452: {quote}Why impose such a requirement? Doing EC of 1 KB files is probably a terrible idea even from the perspective of disk usage {quote} We will not be doing EC on the 1 KB files; we will be doing EC at the data-file level. That is, if you store lots of data into a data file, we can EC that file. The size of the individual keys is irrelevant from the Ozone point of view: HDDS can do the EC at the data-file level and be completely independent of the key sizes in question. Now if we have 1 GB of data, some arbitrarily large amount, then EC makes sense. This is one of the advantages of Ozone over HDFS. > All chunks should happen to a single file for a block in datanode > - > > Key: HDDS-1452 > URL: https://issues.apache.org/jira/browse/HDDS-1452 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > > Currently, all chunks of a block are written to individual chunk files on the > datanode. The idea here is to write all chunks of a block to a single file > on the datanode.
[jira] [Commented] (HDDS-1452) All chunks should happen to a single file for a block in datanode
[ https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823404#comment-16823404 ] Arpit Agarwal commented on HDDS-1452: - bq. In the ozone world, it should not matter. Especially since we plan to EC at the level of data or containers. The actual EC would not work on RockDB, but it should work on all containers, irrespective of the data size of the actual keys. Why impose such a requirement? Doing EC of 1 KB files is probably a terrible idea even from the perspective of disk usage.
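The disk-usage point can be made concrete with a back-of-the-envelope calculation. The sketch below assumes striped Reed-Solomon EC where the parity cells are as large as the largest data cell actually written; RS(6,3) and a 64 KB cell size are illustrative numbers, not Ozone's actual EC parameters.

```java
public class EcOverhead {
    // Storage overhead factor (bytes stored / bytes of data) for a file of
    // `bytes` under striped RS(k, m) with the given cell size, assuming
    // parity cells match the largest data cell written in the stripe.
    static double overhead(long bytes, int k, int m, long cell) {
        long stripe = (long) k * cell;
        long full = bytes / stripe;
        long rem = bytes % stripe;
        long stored = full * (k + m) * cell;  // full stripes: data + parity
        if (rem > 0) {
            long largestCell = Math.min(rem, cell);
            stored += rem + m * largestCell;  // partial stripe + its parity
        }
        return (double) stored / bytes;
    }

    public static void main(String[] args) {
        System.out.println(overhead(1024, 6, 3, 64 * 1024));            // tiny file
        System.out.println(overhead(6L * 64 * 1024 * 1024, 6, 3, 64 * 1024)); // large file
    }
}
```

Under these assumptions a 1 KB object striped on its own costs 4x its size (worse than 3x replication), while a large container-sized file approaches the ideal 1.5x, which is the argument for doing EC at the container level rather than per key.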
[jira] [Commented] (HDFS-14435) ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs
[ https://issues.apache.org/jira/browse/HDFS-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823381#comment-16823381 ] Chao Sun commented on HDFS-14435: - +1 from me as well!
[jira] [Commented] (HDDS-1452) All chunks should happen to a single file for a block in datanode
[ https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823378#comment-16823378 ] Anu Engineer commented on HDDS-1452: {quote}If a container is full of 1KB files it may not be a good candidate for Erasure Coding. If your entire {quote} In the ozone world, it should not matter, especially since we plan to EC at the level of data or containers. The actual EC would not work on RocksDB, but it should work on all containers, irrespective of the data size of the actual keys. {quote}cluster is full of 1KB files then we have other serious problems, of course. {quote} Hopefully, Ozone will just be able to handle this scenario; we might need many Ozone Managers, but a single SCM and a few datanodes. I am not advising this model, but it is something I am sure we will run into eventually, especially since we are an object store. The HDFS use case is different, but in the Ozone world I think we will have to be prepared for this eventuality.
[jira] [Comment Edited] (HDFS-14435) ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs
[ https://issues.apache.org/jira/browse/HDFS-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823376#comment-16823376 ] Chen Liang edited comment on HDFS-14435 at 4/22/19 7:38 PM: [~xkrogen] sorry for the late response... +1 from me. Feel free to commit this yourself. :) was (Author: vagarychen): [~xkrogen] sorry for the late response... +1 from me. Feel free to commit yourself. :)
[jira] [Commented] (HDFS-14435) ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs
[ https://issues.apache.org/jira/browse/HDFS-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823376#comment-16823376 ] Chen Liang commented on HDFS-14435: --- [~xkrogen] sorry for the late response... +1 from me. Feel free to commit yourself. :) > ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs > -- > > Key: HDFS-14435 > URL: https://issues.apache.org/jira/browse/HDFS-14435 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, nn >Affects Versions: 3.3.0 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HDFS-14435.000.patch, HDFS-14435.001.patch, > HDFS-14435.002.patch, HDFS-14435.003.patch, HDFS-14435.004.patch, > HDFS-14435.005.patch, dt_stack_trace.png > > > We have been seeing issues during testing of the Consistent Read from Standby > feature that indicate that ORPP is unable to call {{getHAServiceState}} on > Standby NNs, as they are rejected with a {{StandbyException}}. Upon further > investigation, we realized that although the Standby allows the > {{getHAServiceState()}} call, reading a delegation token is not allowed in > Standby state, thus the call will fail when using DT-based authentication. > This hasn't caused issues in practice, since ORPP assumes that the state is > Standby if it is unable to fetch the state, but we should fix the logic to > properly handle this scenario. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1452) All chunks should happen to a single file for a block in datanode
[ https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823375#comment-16823375 ] Arpit Agarwal commented on HDDS-1452: - Yeah we will have to benchmark it. If a container is full of 1KB files it may not be a good candidate for Erasure Coding. If your entire cluster is full of 1KB files then we have other serious problems, of course. The one downside of putting multiple blocks in the same file (can we call it a superblock?) is that deletes become harder. We will need to do some kind of background GC/compaction of the superblocks. > All chunks should happen to a single file for a block in datanode > - > > Key: HDDS-1452 > URL: https://issues.apache.org/jira/browse/HDDS-1452 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > > Currently, all chunks of a block happen to individual chunk files in > datanode. This idea here is to write all individual chunks to a single file > in datanode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
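The superblock layout and the delete problem mentioned above can be sketched with a toy model. All names here are illustrative, not Ozone code, and the in-memory `StringBuilder` stands in for the on-disk file: deleting a block only drops its index entry, so dead bytes accumulate in the shared file until a background GC/compaction pass rewrites it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model of the "superblock" idea: many blocks appended into one shared
 * file, located via an (offset, length) index. Illustrative names only.
 */
public class Superblock {
    static final class Extent {
        final long offset; final int length;
        Extent(long offset, int length) { this.offset = offset; this.length = length; }
    }

    private final StringBuilder data = new StringBuilder();   // stands in for the file
    private final Map<Long, Extent> index = new LinkedHashMap<>();
    private long liveBytes = 0;

    /** Append a block's bytes at the end of the file and record its extent. */
    public void put(long blockId, String bytes) {
        index.put(blockId, new Extent(data.length(), bytes.length()));
        data.append(bytes);
        liveBytes += bytes.length();
    }

    /** Delete only drops the index entry; the bytes stay behind as a "hole". */
    public void delete(long blockId) {
        Extent e = index.remove(blockId);
        if (e != null) liveBytes -= e.length;
    }

    /** Fraction of the file that is dead space -- the GC/compaction trigger. */
    public double garbageRatio() {
        return data.length() == 0 ? 0 : 1.0 - (double) liveBytes / data.length();
    }

    public static void main(String[] args) {
        Superblock sb = new Superblock();
        sb.put(1L, "aaaa");
        sb.put(2L, "bbbb");
        sb.delete(1L);
        System.out.println(sb.garbageRatio()); // 0.5 -- half the file is dead space
    }
}
```

Once `garbageRatio()` crosses some threshold, a compactor would copy the live extents into a fresh file and swap the index over, which is exactly the background GC/compaction cost the comment above anticipates.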
[jira] [Commented] (HDDS-1452) All chunks should happen to a single file for a block in datanode
[ https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823374#comment-16823374 ] Anu Engineer commented on HDDS-1452: Possibly; I don't know which would be the better option: one single large file or RocksDB. Either way, when we do this, we need to make sure we don't end up with a single-block-to-single-file mapping. It is better to have the ability to control the data size of the files. One downside of keeping the 1KB files in RocksDB is that erasure coding might become harder, since the plan is to take a closed container and erasure-code all data files while leaving the metadata in RocksDB out of erasure coding. That is my only concern with leaving 1 KB files inside RocksDB; also, we will have to benchmark how it will work out. > All chunks should happen to a single file for a block in datanode > - > > Key: HDDS-1452 > URL: https://issues.apache.org/jira/browse/HDDS-1452 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > > Currently, all chunks of a block happen to individual chunk files in > datanode. This idea here is to write all individual chunks to a single file > in datanode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14435) ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs
[ https://issues.apache.org/jira/browse/HDFS-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823373#comment-16823373 ] Erik Krogen edited comment on HDFS-14435 at 4/22/19 7:33 PM: - Hey [~vagarychen], [~elgoiri] -- can I get a binding +1 from either of you? Want to make sure it's ok for me (or you! :) ) to commit this. was (Author: xkrogen): Hey [~csun], [~elgoiri] -- can I get a binding +1 from either of you? Want to make sure it's ok for me (or you! :) ) to commit this. > ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs > -- > > Key: HDFS-14435 > URL: https://issues.apache.org/jira/browse/HDFS-14435 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, nn >Affects Versions: 3.3.0 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HDFS-14435.000.patch, HDFS-14435.001.patch, > HDFS-14435.002.patch, HDFS-14435.003.patch, HDFS-14435.004.patch, > HDFS-14435.005.patch, dt_stack_trace.png > > > We have been seeing issues during testing of the Consistent Read from Standby > feature that indicate that ORPP is unable to call {{getHAServiceState}} on > Standby NNs, as they are rejected with a {{StandbyException}}. Upon further > investigation, we realized that although the Standby allows the > {{getHAServiceState()}} call, reading a delegation token is not allowed in > Standby state, thus the call will fail when using DT-based authentication. > This hasn't caused issues in practice, since ORPP assumes that the state is > Standby if it is unable to fetch the state, but we should fix the logic to > properly handle this scenario. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14435) ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs
[ https://issues.apache.org/jira/browse/HDFS-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823373#comment-16823373 ] Erik Krogen commented on HDFS-14435: Hey [~csun], [~elgoiri] -- can I get a binding +1 from either of you? Want to make sure it's ok for me (or you! :) ) to commit this. > ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs > -- > > Key: HDFS-14435 > URL: https://issues.apache.org/jira/browse/HDFS-14435 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, nn >Affects Versions: 3.3.0 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HDFS-14435.000.patch, HDFS-14435.001.patch, > HDFS-14435.002.patch, HDFS-14435.003.patch, HDFS-14435.004.patch, > HDFS-14435.005.patch, dt_stack_trace.png > > > We have been seeing issues during testing of the Consistent Read from Standby > feature that indicate that ORPP is unable to call {{getHAServiceState}} on > Standby NNs, as they are rejected with a {{StandbyException}}. Upon further > investigation, we realized that although the Standby allows the > {{getHAServiceState()}} call, reading a delegation token is not allowed in > Standby state, thus the call will fail when using DT-based authentication. > This hasn't caused issues in practice, since ORPP assumes that the state is > Standby if it is unable to fetch the state, but we should fix the logic to > properly handle this scenario. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1452) All chunks should happen to a single file for a block in datanode
[ https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823368#comment-16823368 ] Arpit Agarwal commented on HDDS-1452: - Thanks for filing this [~shashikant]. 1KB files can probably just go into RocksDB! > All chunks should happen to a single file for a block in datanode > - > > Key: HDDS-1452 > URL: https://issues.apache.org/jira/browse/HDDS-1452 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > > Currently, all chunks of a block happen to individual chunk files in > datanode. This idea here is to write all individual chunks to a single file > in datanode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1403) KeyOutputStream writes fails after max retries while writing to a closed container
[ https://issues.apache.org/jira/browse/HDDS-1403?focusedWorklogId=230851&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230851 ] ASF GitHub Bot logged work on HDDS-1403: Author: ASF GitHub Bot Created on: 22/Apr/19 19:21 Start Date: 22/Apr/19 19:21 Worklog Time Spent: 10m Work Description: arp7 commented on pull request #753: HDDS-1403. KeyOutputStream writes fails after max retries while writing to a closed container URL: https://github.com/apache/hadoop/pull/753#discussion_r276863308 ## File path: hadoop-hdds/common/src/main/resources/ozone-default.xml ## @@ -429,12 +429,20 @@ ozone.client.max.retries -5 +100 OZONE, CLIENT Maximum number of retries by Ozone Client on encountering exception while writing a key. + +ozone.client.retry.interval.ms Review comment: Don't hardcode the unit (ms). We can specify the unit with the config key. See Configuration#getTimeDuration. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 230851) Time Spent: 50m (was: 40m) > KeyOutputStream writes fails after max retries while writing to a closed > container > -- > > Key: HDDS-1403 > URL: https://issues.apache.org/jira/browse/HDDS-1403 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Hanisha Koneru >Priority: Major > Labels: MiniOzoneChaosCluster, pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Currently a Ozone Client retries a write operation 5 times. It is possible > that the container being written to is already closed by the time it is > written to. The key write will fail after retrying multiple times with this > error. This needs to be fixed as this is an internal error. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
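The review comment above points at Hadoop's Configuration#getTimeDuration, which lets a config value carry its own unit suffix (e.g. "100ms", "2s") so the key name does not have to hardcode a unit like `.ms`. A simplified stand-alone sketch of that parsing style follows; it is an illustration of the pattern, not Hadoop's actual implementation:

```java
import java.util.concurrent.TimeUnit;

/**
 * Simplified illustration of unit-suffixed duration parsing, in the spirit
 * of Hadoop's Configuration#getTimeDuration. Not the real Hadoop code.
 */
public class DurationParse {
    // Recognized suffixes, multi-char first so "ms" wins over "m" and "s".
    private static final String[] SUFFIXES = {"ms", "us", "ns", "d", "h", "m", "s"};
    private static final TimeUnit[] UNITS = {
        TimeUnit.MILLISECONDS, TimeUnit.MICROSECONDS, TimeUnit.NANOSECONDS,
        TimeUnit.DAYS, TimeUnit.HOURS, TimeUnit.MINUTES, TimeUnit.SECONDS
    };

    /** Parse e.g. "100ms" or "2s"; a bare number falls back to defaultUnit. */
    public static long toMillis(String value, TimeUnit defaultUnit) {
        String v = value.trim();
        for (int i = 0; i < SUFFIXES.length; i++) {
            if (v.endsWith(SUFFIXES[i])) {
                String num = v.substring(0, v.length() - SUFFIXES[i].length()).trim();
                return UNITS[i].toMillis(Long.parseLong(num));
            }
        }
        return defaultUnit.toMillis(Long.parseLong(v));
    }

    public static void main(String[] args) {
        System.out.println(toMillis("100ms", TimeUnit.MILLISECONDS)); // 100
        System.out.println(toMillis("2s", TimeUnit.MILLISECONDS));    // 2000
        System.out.println(toMillis("500", TimeUnit.MILLISECONDS));   // 500
    }
}
```

With this style the key could be named plainly (e.g. `ozone.client.retry.interval`) and operators choose the unit per deployment.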
[jira] [Commented] (HDFS-14445) TestTrySendErrorReportWhenNNThrowsIOException fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823360#comment-16823360 ] Hadoop QA commented on HDFS-14445: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 3m 36s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 28s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}102m 38s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}149m 49s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | HDFS-14445 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12966637/HDFS-14445-02.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 9d00c30db3b3 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 96e3027 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/26683/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/26683/testReport/ | | Max. process+thread count | 3144 (vs. ulimit of 1) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/26683/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > TestTrySendErrorReportWhenNNThrowsIOExcept
[jira] [Commented] (HDDS-1452) All chunks should happen to a single file for a block in datanode
[ https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823359#comment-16823359 ] Anu Engineer commented on HDDS-1452: Just a thought: would it make sense to write to data files until they become, say, 1GB? That way we can direct any chunk write to a file until it is large enough. This addresses the use case where we are writing, say, 1 KB Ozone keys. In the current proposal, if I write all 1 KB keys, would we end up having 1 KB block files? Just a thought, since you are planning to address this issue. > All chunks should happen to a single file for a block in datanode > - > > Key: HDDS-1452 > URL: https://issues.apache.org/jira/browse/HDDS-1452 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > > Currently, all chunks of a block happen to individual chunk files in > datanode. This idea here is to write all individual chunks to a single file > in datanode. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
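The suggestion above (keep directing chunk writes at the current data file until it crosses a size threshold, then roll to a new one) can be sketched as follows. The file naming scheme and the threshold value are illustrative assumptions, not anything Ozone does today:

```java
/**
 * Sketch of rolling data files: append chunks to the current file until it
 * would exceed a size cap, then start a new file. Names are illustrative.
 */
public class RollingChunkFiles {
    private final long maxFileSize;
    private int fileIndex = 0;
    private long currentSize = 0;

    public RollingChunkFiles(long maxFileSize) { this.maxFileSize = maxFileSize; }

    /** Returns the file a chunk of the given size should be appended to. */
    public String fileFor(long chunkSize) {
        if (currentSize + chunkSize > maxFileSize && currentSize > 0) {
            fileIndex++;          // roll: the current file is "large enough"
            currentSize = 0;
        }
        currentSize += chunkSize;
        return "data-" + fileIndex + ".bin";
    }

    public static void main(String[] args) {
        RollingChunkFiles r = new RollingChunkFiles(10); // tiny cap for the demo
        System.out.println(r.fileFor(4)); // data-0.bin
        System.out.println(r.fileFor(4)); // data-0.bin
        System.out.println(r.fileFor(4)); // data-1.bin (would overflow data-0)
    }
}
```

Under this scheme a stream of 1 KB keys ends up packed into a small number of large files instead of one tiny file per block, at the cost of needing an index from block to (file, offset).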
[jira] [Work logged] (HDDS-1065) OM and DN should persist SCM certificate as the trust root.
[ https://issues.apache.org/jira/browse/HDDS-1065?focusedWorklogId=230849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230849 ] ASF GitHub Bot logged work on HDDS-1065: Author: ASF GitHub Bot Created on: 22/Apr/19 19:16 Start Date: 22/Apr/19 19:16 Worklog Time Spent: 10m Work Description: xiaoyuyao commented on pull request #754: HDDS-1065. OM and DN should persist SCM certificate as the trust root. Contributed by Ajay Kumar. URL: https://github.com/apache/hadoop/pull/754#discussion_r277401501 ## File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/security/x509/certificate/client/DefaultCertificateClient.java ## @@ -80,6 +80,7 @@ public abstract class DefaultCertificateClient implements CertificateClient { private static final String CERT_FILE_NAME_FORMAT = "%s.crt"; + private static final String CA_CERT_PREFIX = "CA-"; Review comment: Can you remind me where do we actually use this root CA certificate in the code? I don't see reference in this patch. Should we use it in Block token verification? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 230849) Time Spent: 50m (was: 40m) > OM and DN should persist SCM certificate as the trust root. > --- > > Key: HDDS-1065 > URL: https://issues.apache.org/jira/browse/HDDS-1065 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > OM and DN should persist SCM certificate as the trust root. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1403) KeyOutputStream writes fails after max retries while writing to a closed container
[ https://issues.apache.org/jira/browse/HDDS-1403?focusedWorklogId=230846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230846 ] ASF GitHub Bot logged work on HDDS-1403: Author: ASF GitHub Bot Created on: 22/Apr/19 19:13 Start Date: 22/Apr/19 19:13 Worklog Time Spent: 10m Work Description: bshashikant commented on issue #753: HDDS-1403. KeyOutputStream writes fails after max retries while writing to a closed container URL: https://github.com/apache/hadoop/pull/753#issuecomment-485518604 Thanks Hanisha for updating the patch. The patch adds a retry interval while doing a retry of a client write request. But this may not address the problem holistically, as the client can still get allocated blocks from a container, and while the actual write happens to the datanode, the container might get closed. The problem gets aggravated if we have a large number of preallocated blocks but the client write happens much later. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 230846) Time Spent: 0.5h (was: 20m) > KeyOutputStream writes fails after max retries while writing to a closed > container > -- > > Key: HDDS-1403 > URL: https://issues.apache.org/jira/browse/HDDS-1403 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Hanisha Koneru >Priority: Major > Labels: MiniOzoneChaosCluster, pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently an Ozone Client retries a write operation 5 times. It is possible > that the container being written to is already closed by the time it is > written to. The key write will fail after retrying multiple times with this > error. This needs to be fixed as this is an internal error. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1065) OM and DN should persist SCM certificate as the trust root.
[ https://issues.apache.org/jira/browse/HDDS-1065?focusedWorklogId=230848&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230848 ] ASF GitHub Bot logged work on HDDS-1065: Author: ASF GitHub Bot Created on: 22/Apr/19 19:13 Start Date: 22/Apr/19 19:13 Worklog Time Spent: 10m Work Description: xiaoyuyao commented on pull request #754: HDDS-1065. OM and DN should persist SCM certificate as the trust root. Contributed by Ajay Kumar. URL: https://github.com/apache/hadoop/pull/754#discussion_r277400775 ## File path: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeService.java ## @@ -268,10 +268,13 @@ private void getSCMSignedCert(OzoneConfiguration config) { String pemEncodedCert = secureScmClient.getDataNodeCertificate( datanodeDetails.getProtoBufMessage(), getEncodedString(csr)); - dnCertClient.storeCertificate(pemEncodedCert, true); + dnCertClient.storeCertificate(pemEncodedCert, true, false); datanodeDetails.setCertSerialId(getX509Certificate(pemEncodedCert). getSerialNumber().toString()); persistDatanodeDetails(datanodeDetails); + // Get SCM CA certificate and store it in filesystem. + String pemEncodedRootCert = secureScmClient.getCACertificate(); Review comment: Should we get the CA certificate based on the DN certificate signed by CA? Does that container a signer certificate id? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 230848) Time Spent: 40m (was: 0.5h) > OM and DN should persist SCM certificate as the trust root. 
> --- > > Key: HDDS-1065 > URL: https://issues.apache.org/jira/browse/HDDS-1065 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > OM and DN should persist SCM certificate as the trust root. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1403) KeyOutputStream writes fails after max retries while writing to a closed container
[ https://issues.apache.org/jira/browse/HDDS-1403?focusedWorklogId=230847&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230847 ] ASF GitHub Bot logged work on HDDS-1403: Author: ASF GitHub Bot Created on: 22/Apr/19 19:13 Start Date: 22/Apr/19 19:13 Worklog Time Spent: 10m Work Description: bshashikant commented on issue #753: HDDS-1403. KeyOutputStream writes fails after max retries while writing to a closed container URL: https://github.com/apache/hadoop/pull/753#issuecomment-485518604 Thanks Hanisha for updating the patch. The patch adds a retry interval while doing a retry of a client write request. But this may not address the problem holistically, as the client can still get allocated blocks from a container, and while the actual write happens to the datanode, the container might get closed. The problem gets aggravated if we have a large number of preallocated blocks but the client write happens much later. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 230847) Time Spent: 40m (was: 0.5h) > KeyOutputStream writes fails after max retries while writing to a closed > container > -- > > Key: HDDS-1403 > URL: https://issues.apache.org/jira/browse/HDDS-1403 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Hanisha Koneru >Priority: Major > Labels: MiniOzoneChaosCluster, pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Currently an Ozone Client retries a write operation 5 times. It is possible > that the container being written to is already closed by the time it is > written to. The key write will fail after retrying multiple times with this > error. 
This needs to be fixed as this is an internal error. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
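The retry-with-interval behavior discussed in this thread can be sketched as below. The parameter names echo `ozone.client.max.retries` and the proposed retry interval, but this is an illustration of the pattern, not the KeyOutputStream implementation:

```java
import java.util.function.Supplier;

/**
 * Minimal sketch: retry a write up to maxRetries times after the initial
 * attempt, sleeping retryIntervalMs between attempts. Illustrative only.
 */
public class RetryingWrite {
    public static boolean writeWithRetry(Supplier<Boolean> attempt,
                                         int maxRetries, long retryIntervalMs) {
        for (int i = 0; i <= maxRetries; i++) {   // initial try + maxRetries retries
            if (attempt.get()) {
                return true;                       // write succeeded
            }
            if (i < maxRetries) {
                try {
                    Thread.sleep(retryIntervalMs); // back off before the next try
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;                  // give up if interrupted
                }
            }
        }
        return false;                              // exhausted all retries
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Fails twice (e.g. container closed), then succeeds on a new block.
        boolean ok = writeWithRetry(() -> ++calls[0] >= 3, 5, 1);
        System.out.println(ok + " after " + calls[0] + " attempts");
    }
}
```

As the review comment notes, an interval only papers over the window between block allocation and the actual write; it does not help if a preallocated block's container closes long before the client writes, which is why the reviewer calls the fix not holistic.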
[jira] [Assigned] (HDDS-1449) JVM Exit in datanode while committing a key
[ https://issues.apache.org/jira/browse/HDDS-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-1449: - Assignee: Shashikant Banerjee > JVM Exit in datanode while committing a key > --- > > Key: HDDS-1449 > URL: https://issues.apache.org/jira/browse/HDDS-1449 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Assignee: Shashikant Banerjee >Priority: Major > Labels: MiniOzoneChaosCluster > Attachments: 2019-04-22--20-23-56-IST.MiniOzoneChaosCluster.log, > hs_err_pid67466.log > > > Saw the following trace in MiniOzoneChaosCluster run. > {code} > C [librocksdbjni17271331491728127.jnilib+0x9755c] > Java_org_rocksdb_RocksDB_write0+0x1c > J 13917 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x0001102ff62e > [0x0001102ff580+0xae] > J 17167 C2 > org.apache.hadoop.utils.RocksDBStore.writeBatch(Lorg/apache/hadoop/utils/BatchOperation;)V > (260 bytes) @ 0x000111bbd01c [0x000111bbcde0+0x23c] > J 20434 C1 > org.apache.hadoop.ozone.container.keyvalue.impl.BlockManagerImpl.putBlock(Lorg/apache/hadoop/ozone/container/common/interfaces/Container;Lorg/apache/hadoop/ozone/container/common/helpers/BlockData;)J > (261 bytes) @ 0x000111c267ac [0x000111c25640+0x116c] > J 19262 C2 > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto; > (866 bytes) @ 0x0001125c5aa0 [0x0001125c1560+0x4540] > J 15095 C2 > 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto; > (142 bytes) @ 0x000110ffc940 [0x000110ffc0c0+0x880] > J 19301 C2 > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto; > (146 bytes) @ 0x000111396144 [0x000111395e60+0x2e4] > J 15997 C2 > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$$Lambda$776.get()Ljava/lang/Object; > (16 bytes) @ 0x000110138e54 [0x000110138d80+0xd4] > J 15970 C2 java.util.concurrent.CompletableFuture$AsyncSupply.run()V (61 > bytes) @ 0x00010fc80094 [0x00010fc8+0x94] > J 17368 C2 > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > (225 bytes) @ 0x000110b0a7a0 [0x000110b0a5a0+0x200] > J 7389 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ > 0x00011012a004 [0x000110129f00+0x104] > J 6837 C1 java.lang.Thread.run()V (17 bytes) @ 0x00011002b144 > [0x00011002b000+0x144] > v ~StubRoutines::call_stub > V [libjvm.dylib+0x2ef1f6] JavaCalls::call_helper(JavaValue*, methodHandle*, > JavaCallArguments*, Thread*)+0x6ae > V [libjvm.dylib+0x2ef99a] JavaCalls::call_virtual(JavaValue*, KlassHandle, > Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x164 > V [libjvm.dylib+0x2efb46] JavaCalls::call_virtual(JavaValue*, Handle, > KlassHandle, Symbol*, Symbol*, Thread*)+0x4a > V [libjvm.dylib+0x34a46d] thread_entry(JavaThread*, Thread*)+0x7c > V 
[libjvm.dylib+0x56eb0f] JavaThread::thread_main_inner()+0x9b > V [libjvm.dylib+0x57020a] JavaThread::run()+0x1c2 > V [libjvm.dylib+0x48d4a6] java_start(Thread*)+0xf6 > C [libsystem_pthread.dylib+0x3305] _pthread_body+0x7e > C [libsystem_pthread.dylib+0x626f] _pthread_start+0x46 > C [libsystem_pthread.dylib+0x2415] thread_start+0xd > C 0x > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1368) Cleanup old ReplicationManager code from SCM
[ https://issues.apache.org/jira/browse/HDDS-1368?focusedWorklogId=230833&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230833 ] ASF GitHub Bot logged work on HDDS-1368: Author: ASF GitHub Bot Created on: 22/Apr/19 18:52 Start Date: 22/Apr/19 18:52 Worklog Time Spent: 10m Work Description: nandakumar131 commented on issue #711: HDDS-1368. Cleanup old ReplicationManager code from SCM. URL: https://github.com/apache/hadoop/pull/711#issuecomment-485511736 /retest This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 230833) Time Spent: 2h (was: 1h 50m) > Cleanup old ReplicationManager code from SCM > > > Key: HDDS-1368 > URL: https://issues.apache.org/jira/browse/HDDS-1368 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: SCM >Reporter: Nanda kumar >Assignee: Nanda kumar >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > HDDS-1205 brings in new ReplicationManager and HDDS-1207 plugs in the new > code, this jira is for removing the old ReplicationManager and related code. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-1452) All chunks should happen to a single file for a block in datanode
Shashikant Banerjee created HDDS-1452: - Summary: All chunks should happen to a single file for a block in datanode Key: HDDS-1452 URL: https://issues.apache.org/jira/browse/HDDS-1452 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Affects Versions: 0.5.0 Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.5.0 Currently, each chunk of a block is written to its own chunk file on the datanode. The idea here is to write all chunks of a block to a single file on the datanode.
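The layout change proposed in HDDS-1452 can be sketched as follows. This is a minimal illustration only: the class name `SingleFileChunkWriter` and its methods are hypothetical, not the actual Ozone datanode ChunkManager API. Each chunk is written at its offset into one per-block file instead of into its own chunk file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: append every chunk of a block into a single
// per-block file at the chunk's offset, instead of one file per chunk.
public class SingleFileChunkWriter {
    private final Path blockFile;

    public SingleFileChunkWriter(Path blockFile) {
        this.blockFile = blockFile;
    }

    // Write one chunk at its offset within the block file.
    public void writeChunk(long offset, byte[] data) throws IOException {
        try (RandomAccessFile raf =
                 new RandomAccessFile(blockFile.toFile(), "rw")) {
            raf.seek(offset);
            raf.write(data);
        }
    }

    public long blockLength() throws IOException {
        return Files.size(blockFile);
    }
}
```

One motivation for such a layout is reducing the number of small files (and file handles) a datanode must manage per block.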
[jira] [Commented] (HDDS-1425) Ozone compose files are not compatible with the latest docker-compose
[ https://issues.apache.org/jira/browse/HDDS-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823343#comment-16823343 ] Arpit Agarwal commented on HDDS-1425: - Thanks [~anu]. Updating the fix version to 0.4.0 since it looks like we will be rolling 0.4.0 RC1. > Ozone compose files are not compatible with the latest docker-compose > - > > Key: HDDS-1425 > URL: https://issues.apache.org/jira/browse/HDDS-1425 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Blocker > Labels: pull-request-available > Fix For: 0.4.0, 0.5.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > I upgraded my docker-compose to the latest available one (1.24.0) > But after the upgrade I can't start the docker-compose based cluster any more: > {code} > ./test.sh > - > Executing test(s): [basic] > Cluster type: ozone > Compose file: > /home/elek/projects/hadoop-review/hadoop-ozone/dist/target/ozone-0.4.0-SNAPSHOT/smoketest/../compose/ozone/docker-compose.yaml > Output dir: > /home/elek/projects/hadoop-review/hadoop-ozone/dist/target/ozone-0.4.0-SNAPSHOT/smoketest/result > Command to rerun: ./test.sh --keep --env ozone basic > - > ERROR: In file > /home/elek/projects/hadoop-review/hadoop-ozone/dist/target/ozone-0.4.0-SNAPSHOT/compose/ozone/docker-config: > environment variable name 'LOG4J2.PROPERTIES_appender.rolling.file > {code} > It turned out that the line of LOG4J2.PROPERTIES_appender.rolling.file > contains an unnecessary space which is not accepted by the latest > docker-compose any more.
[jira] [Updated] (HDDS-1425) Ozone compose files are not compatible with the latest docker-compose
[ https://issues.apache.org/jira/browse/HDDS-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDDS-1425: Fix Version/s: (was: 0.4.1) 0.4.0 > Ozone compose files are not compatible with the latest docker-compose > - > > Key: HDDS-1425 > URL: https://issues.apache.org/jira/browse/HDDS-1425 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Blocker > Labels: pull-request-available > Fix For: 0.4.0, 0.5.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > I upgraded my docker-compose to the latest available one (1.24.0) > But after the upgrade I can't start the docker-compose based cluster any more: > {code} > ./test.sh > - > Executing test(s): [basic] > Cluster type: ozone > Compose file: > /home/elek/projects/hadoop-review/hadoop-ozone/dist/target/ozone-0.4.0-SNAPSHOT/smoketest/../compose/ozone/docker-compose.yaml > Output dir: > /home/elek/projects/hadoop-review/hadoop-ozone/dist/target/ozone-0.4.0-SNAPSHOT/smoketest/result > Command to rerun: ./test.sh --keep --env ozone basic > - > ERROR: In file > /home/elek/projects/hadoop-review/hadoop-ozone/dist/target/ozone-0.4.0-SNAPSHOT/compose/ozone/docker-config: > environment variable name 'LOG4J2.PROPERTIES_appender.rolling.file > {code} > It turned out that the line of LOG4J2.PROPERTIES_appender.rolling.file > contains an unnecessary space which is not accepted by the latest > docker-compose any more.
[jira] [Work logged] (HDDS-1450) Fix nightly run failures after HDDS-976
[ https://issues.apache.org/jira/browse/HDDS-1450?focusedWorklogId=230822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230822 ] ASF GitHub Bot logged work on HDDS-1450: Author: ASF GitHub Bot Created on: 22/Apr/19 18:28 Start Date: 22/Apr/19 18:28 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on issue #757: HDDS-1450. Fix nightly run failures after HDDS-976. Contributed by Xi… URL: https://github.com/apache/hadoop/pull/757#issuecomment-485504373 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Comment | |::|--:|:|:| | 0 | reexec | 26 | Docker mode activated. | ||| _ Prechecks _ | | +1 | @author | 0 | The patch does not contain any @author tags. | | -1 | test4tests | 0 | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | ||| _ trunk Compile Tests _ | | +1 | mvninstall | 1023 | trunk passed | | +1 | compile | 44 | trunk passed | | +1 | checkstyle | 23 | trunk passed | | +1 | mvnsite | 41 | trunk passed | | +1 | shadedclient | 748 | branch has no errors when building and testing our client artifacts. | | +1 | findbugs | 69 | trunk passed | | +1 | javadoc | 44 | trunk passed | ||| _ Patch Compile Tests _ | | +1 | mvninstall | 40 | the patch passed | | +1 | compile | 33 | the patch passed | | +1 | javac | 33 | the patch passed | | +1 | checkstyle | 16 | the patch passed | | +1 | mvnsite | 34 | the patch passed | | +1 | whitespace | 1 | The patch has no whitespace issues. | | +1 | xml | 1 | The patch has no ill-formed XML file. | | +1 | shadedclient | 751 | patch has no errors when building and testing our client artifacts. | | +1 | findbugs | 77 | the patch passed | | +1 | javadoc | 37 | the patch passed | ||| _ Other Tests _ | | +1 | unit | 67 | common in the patch passed. | | +1 | asflicense | 29 | The patch does not generate ASF License warnings. 
| | | | 3187 | | | Subsystem | Report/Notes | |--:|:-| | Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-757/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/757 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux 4814bfbb092b 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 96e3027 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-757/1/testReport/ | | Max. process+thread count | 444 (vs. ulimit of 5500) | | modules | C: hadoop-hdds/common U: hadoop-hdds/common | | Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-757/1/console | | Powered by | Apache Yetus 0.9.0 http://yetus.apache.org | This message was automatically generated. Issue Time Tracking --- Worklog Id: (was: 230822) Time Spent: 0.5h (was: 20m) > Fix nightly run failures after HDDS-976 > --- > > Key: HDDS-1450 > URL: https://issues.apache.org/jira/browse/HDDS-1450 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > [https://ci.anzix.net/job/ozone-nightly/72/testReport/]
[jira] [Commented] (HDDS-1425) Ozone compose files are not compatible with the latest docker-compose
[ https://issues.apache.org/jira/browse/HDDS-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823276#comment-16823276 ] Anu Engineer commented on HDDS-1425: we plan to release 0.4.1 and 0.4.2 soon ... So this is helpful. Thanks for the patch and commit. > Ozone compose files are not compatible with the latest docker-compose > - > > Key: HDDS-1425 > URL: https://issues.apache.org/jira/browse/HDDS-1425 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Blocker > Labels: pull-request-available > Fix For: 0.5.0, 0.4.1 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > I upgraded my docker-compose to the latest available one (1.24.0) > But after the upgrade I can't start the docker-compose based cluster any more: > {code} > ./test.sh > - > Executing test(s): [basic] > Cluster type: ozone > Compose file: > /home/elek/projects/hadoop-review/hadoop-ozone/dist/target/ozone-0.4.0-SNAPSHOT/smoketest/../compose/ozone/docker-compose.yaml > Output dir: > /home/elek/projects/hadoop-review/hadoop-ozone/dist/target/ozone-0.4.0-SNAPSHOT/smoketest/result > Command to rerun: ./test.sh --keep --env ozone basic > - > ERROR: In file > /home/elek/projects/hadoop-review/hadoop-ozone/dist/target/ozone-0.4.0-SNAPSHOT/compose/ozone/docker-config: > environment variable name 'LOG4J2.PROPERTIES_appender.rolling.file > {code} > It turned out that the line of LOG4J2.PROPERTIES_appender.rolling.file > contains an unnecessary space which is not accepted by the latest > docker-compose any more.
[jira] [Commented] (HDFS-14374) Expose total number of delegation tokens in AbstractDelegationTokenSecretManager
[ https://issues.apache.org/jira/browse/HDFS-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823274#comment-16823274 ] CR Hota commented on HDFS-14374: [~elgoiri] [~hexiaoqiao] This is a fairly harmless but important change that lets specific apps pick up and report metrics. Could we commit HDFS-14374.002.patch and rebase the router branch? Want to make sure we have this in time to get ready for the 3.3 release. > Expose total number of delegation tokens in > AbstractDelegationTokenSecretManager > > > Key: HDFS-14374 > URL: https://issues.apache.org/jira/browse/HDFS-14374 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Attachments: HDFS-14374.001.patch, HDFS-14374.002.patch > > > AbstractDelegationTokenSecretManager should expose the total number of active > delegation tokens for specific implementations to track for observability.
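The change discussed in HDFS-14374 is simple to picture: the secret manager already tracks active tokens in an internal map, so exposing the total is just surfacing that map's size. Below is a simplified stand-in, not the real AbstractDelegationTokenSecretManager (whose map is keyed by token identifier objects); it only illustrates where the new metric comes from.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for AbstractDelegationTokenSecretManager: active
// tokens live in a map, so the proposed metric is just the map's size.
public class TokenSecretManagerSketch {
    private final Map<String, Long> currentTokens = new ConcurrentHashMap<>();

    public void addToken(String id, long expiryTime) {
        currentTokens.put(id, expiryTime);
    }

    public void cancelToken(String id) {
        currentTokens.remove(id);
    }

    // The observability hook proposed by the JIRA: total active tokens,
    // which implementations can poll and report as a gauge.
    public int getCurrentTokensSize() {
        return currentTokens.size();
    }
}
```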
[jira] [Work logged] (HDDS-1450) Fix nightly run failures after HDDS-976
[ https://issues.apache.org/jira/browse/HDDS-1450?focusedWorklogId=230799&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230799 ] ASF GitHub Bot logged work on HDDS-1450: Author: ASF GitHub Bot Created on: 22/Apr/19 17:36 Start Date: 22/Apr/19 17:36 Worklog Time Spent: 10m Work Description: xiaoyuyao commented on issue #757: HDDS-1450. Fix nightly run failures after HDDS-976. Contributed by Xi… URL: https://github.com/apache/hadoop/pull/757#issuecomment-485487701 Remove the unnecessary configuration key "ozone.scm.network.topology.schema.file.type" and determine the type of schema based on the file extension of existing key "ozone.scm.network.topology.schema.file". cc: @cjjnjust Issue Time Tracking --- Worklog Id: (was: 230799) Time Spent: 20m (was: 10m) > Fix nightly run failures after HDDS-976 > --- > > Key: HDDS-1450 > URL: https://issues.apache.org/jira/browse/HDDS-1450 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > [https://ci.anzix.net/job/ozone-nightly/72/testReport/]
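The approach described in the comment above — dropping the `ozone.scm.network.topology.schema.file.type` key and inferring the schema type from the file extension of `ozone.scm.network.topology.schema.file` — can be sketched like this. The class and enum names are illustrative only; the actual Ozone loader classes differ.

```java
// Sketch of deriving the topology schema type from the configured file's
// extension, replacing the removed "...schema.file.type" key. Illustrative
// names only; not the real Ozone NodeSchemaLoader API.
public class SchemaTypeFromExtension {
    enum SchemaType { XML, YAML }

    static SchemaType fromFileName(String schemaFile) {
        String lower = schemaFile.toLowerCase();
        if (lower.endsWith(".xml")) {
            return SchemaType.XML;
        }
        if (lower.endsWith(".yaml") || lower.endsWith(".yml")) {
            return SchemaType.YAML;
        }
        throw new IllegalArgumentException(
            "Unsupported topology schema file: " + schemaFile);
    }
}
```

This keeps one source of truth (the file name itself), so the two keys can never disagree.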
[jira] [Updated] (HDDS-1450) Fix nightly run failures after HDDS-976
[ https://issues.apache.org/jira/browse/HDDS-1450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-1450: - Labels: pull-request-available (was: ) > Fix nightly run failures after HDDS-976 > --- > > Key: HDDS-1450 > URL: https://issues.apache.org/jira/browse/HDDS-1450 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Minor > Labels: pull-request-available > > [https://ci.anzix.net/job/ozone-nightly/72/testReport/]
[jira] [Work logged] (HDDS-1450) Fix nightly run failures after HDDS-976
[ https://issues.apache.org/jira/browse/HDDS-1450?focusedWorklogId=230798&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230798 ] ASF GitHub Bot logged work on HDDS-1450: Author: ASF GitHub Bot Created on: 22/Apr/19 17:33 Start Date: 22/Apr/19 17:33 Worklog Time Spent: 10m Work Description: xiaoyuyao commented on pull request #757: HDDS-1450. Fix nightly run failures after HDDS-976. Contributed by Xi… URL: https://github.com/apache/hadoop/pull/757 Issue Time Tracking --- Worklog Id: (was: 230798) Time Spent: 10m Remaining Estimate: 0h > Fix nightly run failures after HDDS-976 > --- > > Key: HDDS-1450 > URL: https://issues.apache.org/jira/browse/HDDS-1450 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > [https://ci.anzix.net/job/ozone-nightly/72/testReport/]
[jira] [Assigned] (HDDS-1451) SCMBlockManager allocates pipelines in cases when the pipeline has already been allocated
[ https://issues.apache.org/jira/browse/HDDS-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aravindan Vijayan reassigned HDDS-1451: --- Assignee: Aravindan Vijayan > SCMBlockManager allocates pipelines in cases when the pipeline has already > been allocated > - > > Key: HDDS-1451 > URL: https://issues.apache.org/jira/browse/HDDS-1451 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Assignee: Aravindan Vijayan >Priority: Major > Labels: MiniOzoneChaosCluster > > SCM BlockManager may try to allocate pipelines in the cases when it is not > needed. This happens because BlockManagerImpl#allocateBlock is not lock > protected, so multiple pipelines can be allocated from it. One of the > pipeline allocation can fail even when one of the existing pipeline already > exists. > {code} > 2019-04-22 22:34:14,336 INFO pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id: > 6f4bb2d7-d660-4f9f-bc06-72b10f9a738e, Nodes: 76e1a493-fd55-4d67-9f5 > 5-c04fd6bd3a33{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: > null}2b9850b2-aed3-4a40-91b5-2447dc5246bf{ip: 192.168.0.104, host: > 192.168.0.104, certSerialId: null}12248721-ea6a-453f-8dad-fc7fbe692f > d2{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, > Factor:THREE, State:OPEN] > 2019-04-22 22:34:14,386 INFO impl.RoleInfo > (RoleInfo.java:shutdownLeaderElection(134)) - > e17b7852-4691-40c7-8791-ad0b0da5201f: shutdown LeaderElection > 2019-04-22 22:34:14,388 INFO pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id: > 552e28f3-98d9-41f3-86e0-c1b9494838a5, Nodes: e17b7852-4691-40c7-879 > 1-ad0b0da5201f{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: > null}fd365bac-e26e-4b11-afd8-9d08cd1b0521{ip: 192.168.0.104, host: > 192.168.0.104, certSerialId: null}9583a007-7f02-4074-9e26-19bc18e29e > 
c5{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, > Factor:THREE, State:OPEN] > 2019-04-22 22:34:14,388 INFO impl.RoleInfo (RoleInfo.java:updateAndGet(143)) > - e17b7852-4691-40c7-8791-ad0b0da5201f: start FollowerState > 2019-04-22 22:34:14,388 INFO pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id: > 5383151b-d625-4362-a7dd-c0d353acaf76, Nodes: 80f16ad6-3879-4a64-a3c > 7-7719813cc139{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: > null}082ce481-7fb0-4f88-ac21-82609290a6a2{ip: 192.168.0.104, host: > 192.168.0.104, certSerialId: null}dd5f5a70-0217-4577-b7a2-c42aa139d1 > 8a{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, > Factor:THREE, State:OPEN] > 2019-04-22 22:34:14,389 INFO pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id: > be4854e5-7933-4caa-b32e-f482cf500247, Nodes: 6e2356f1-479d-498b-876 > a-1c90623c498b{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: > null}8ac46d94-9975-4eea-9448-2618c69d7bf3{ip: 192.168.0.104, host: > 192.168.0.104, certSerialId: null}a3ed36a1-44ca-47b2-b9b3-5aeef04595 > 18{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, > Factor:THREE, State:OPEN] > 2019-04-22 22:34:14,390 INFO pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id: > 21e368e2-f82a-4c61-9cc3-06e8de22ea6b, Nodes: > 82632040-5754-4122-b187-331879586842{ip: 192.168.0.104, host: 192.168.0.104, > certSerialId: null}923c8537-b869-4085-adcb-0a9accdcd089{ip: 192.168.0.104, > host: 192.168.0.104, certSerialId: > null}c6d790bf-e3a6-4064-acb5-f74796cd38a9{ip: 192.168.0.104, host: > 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN] > 2019-04-22 22:34:14,390 INFO pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id: > cccbc2ed-e0e2-4578-a8a2-94f4b645be52, 
Nodes: > 91ae6848-a778-43be-a4a1-5855f7adc0d8{ip: 192.168.0.104, host: 192.168.0.104, > certSerialId: null}8f330a03-40e2-4bd1-9b43-5e05b13d89f0{ip: 192.168.0.104, > host: 192.168.0.104, certSerialId: > null}4f3070dc-650b-48d7-87b5-d2076104e7b4{ip: 192.168.0.104, host: > 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN] > 2019-04-22 22:34:14,392 ERROR block.BlockManagerImpl > (BlockManagerImpl.java:allocateBlock(192)) - Pipeline creation failed for > type:RATIS factor:THREE > org.apache.hadoop.hdds.scm.pipeline.InsufficientDatanodesException: Cannot > create pipeline of factor 3 using 2 nodes 20 healthy nodes 20 all nodes. > at > org.apache.hadoop
[jira] [Created] (HDDS-1451) SCMBlockManager allocates pipelines in cases when the pipeline has already been allocated
Mukul Kumar Singh created HDDS-1451: --- Summary: SCMBlockManager allocates pipelines in cases when the pipeline has already been allocated Key: HDDS-1451 URL: https://issues.apache.org/jira/browse/HDDS-1451 Project: Hadoop Distributed Data Store Issue Type: Bug Components: SCM Affects Versions: 0.3.0 Reporter: Mukul Kumar Singh SCM BlockManager may try to allocate pipelines in the cases when it is not needed. This happens because BlockManagerImpl#allocateBlock is not lock protected, so multiple pipelines can be allocated from it. One of the pipeline allocation can fail even when one of the existing pipeline already exists. {code} 2019-04-22 22:34:14,336 INFO pipeline.RatisPipelineProvider (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id: 6f4bb2d7-d660-4f9f-bc06-72b10f9a738e, Nodes: 76e1a493-fd55-4d67-9f5 5-c04fd6bd3a33{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}2b9850b2-aed3-4a40-91b5-2447dc5246bf{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}12248721-ea6a-453f-8dad-fc7fbe692f d2{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN] 2019-04-22 22:34:14,386 INFO impl.RoleInfo (RoleInfo.java:shutdownLeaderElection(134)) - e17b7852-4691-40c7-8791-ad0b0da5201f: shutdown LeaderElection 2019-04-22 22:34:14,388 INFO pipeline.RatisPipelineProvider (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id: 552e28f3-98d9-41f3-86e0-c1b9494838a5, Nodes: e17b7852-4691-40c7-879 1-ad0b0da5201f{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}fd365bac-e26e-4b11-afd8-9d08cd1b0521{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}9583a007-7f02-4074-9e26-19bc18e29e c5{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN] 2019-04-22 22:34:14,388 INFO impl.RoleInfo (RoleInfo.java:updateAndGet(143)) - e17b7852-4691-40c7-8791-ad0b0da5201f: start FollowerState 2019-04-22 22:34:14,388 INFO 
pipeline.RatisPipelineProvider (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id: 5383151b-d625-4362-a7dd-c0d353acaf76, Nodes: 80f16ad6-3879-4a64-a3c 7-7719813cc139{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}082ce481-7fb0-4f88-ac21-82609290a6a2{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}dd5f5a70-0217-4577-b7a2-c42aa139d1 8a{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN] 2019-04-22 22:34:14,389 INFO pipeline.RatisPipelineProvider (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id: be4854e5-7933-4caa-b32e-f482cf500247, Nodes: 6e2356f1-479d-498b-876 a-1c90623c498b{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}8ac46d94-9975-4eea-9448-2618c69d7bf3{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}a3ed36a1-44ca-47b2-b9b3-5aeef04595 18{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN] 2019-04-22 22:34:14,390 INFO pipeline.RatisPipelineProvider (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id: 21e368e2-f82a-4c61-9cc3-06e8de22ea6b, Nodes: 82632040-5754-4122-b187-331879586842{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}923c8537-b869-4085-adcb-0a9accdcd089{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}c6d790bf-e3a6-4064-acb5-f74796cd38a9{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN] 2019-04-22 22:34:14,390 INFO pipeline.RatisPipelineProvider (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id: cccbc2ed-e0e2-4578-a8a2-94f4b645be52, Nodes: 91ae6848-a778-43be-a4a1-5855f7adc0d8{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}8f330a03-40e2-4bd1-9b43-5e05b13d89f0{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}4f3070dc-650b-48d7-87b5-d2076104e7b4{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, 
State:OPEN] 2019-04-22 22:34:14,392 ERROR block.BlockManagerImpl (BlockManagerImpl.java:allocateBlock(192)) - Pipeline creation failed for type:RATIS factor:THREE org.apache.hadoop.hdds.scm.pipeline.InsufficientDatanodesException: Cannot create pipeline of factor 3 using 2 nodes 20 healthy nodes 20 all nodes. at org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.create(RatisPipelineProvider.java:122) at org.apache.hadoop.hdds.scm.pipeline.PipelineFactory.create(PipelineFactory.java:57) at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.createPipeline(SCMPipelineManager.java:148) at org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:190) at org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServ
[jira] [Updated] (HDDS-1451) SCMBlockManager allocates pipelines in cases when the pipeline has already been allocated
[ https://issues.apache.org/jira/browse/HDDS-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDDS-1451: Labels: MiniOzoneChaosCluster (was: ) > SCMBlockManager allocates pipelines in cases when the pipeline has already > been allocated > - > > Key: HDDS-1451 > URL: https://issues.apache.org/jira/browse/HDDS-1451 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Priority: Major > Labels: MiniOzoneChaosCluster > > SCM BlockManager may try to allocate pipelines in the cases when it is not > needed. This happens because BlockManagerImpl#allocateBlock is not lock > protected, so multiple pipelines can be allocated from it. One of the > pipeline allocation can fail even when one of the existing pipeline already > exists. > {code} > 2019-04-22 22:34:14,336 INFO pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id: > 6f4bb2d7-d660-4f9f-bc06-72b10f9a738e, Nodes: 76e1a493-fd55-4d67-9f5 > 5-c04fd6bd3a33{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: > null}2b9850b2-aed3-4a40-91b5-2447dc5246bf{ip: 192.168.0.104, host: > 192.168.0.104, certSerialId: null}12248721-ea6a-453f-8dad-fc7fbe692f > d2{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, > Factor:THREE, State:OPEN] > 2019-04-22 22:34:14,386 INFO impl.RoleInfo > (RoleInfo.java:shutdownLeaderElection(134)) - > e17b7852-4691-40c7-8791-ad0b0da5201f: shutdown LeaderElection > 2019-04-22 22:34:14,388 INFO pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id: > 552e28f3-98d9-41f3-86e0-c1b9494838a5, Nodes: e17b7852-4691-40c7-879 > 1-ad0b0da5201f{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: > null}fd365bac-e26e-4b11-afd8-9d08cd1b0521{ip: 192.168.0.104, host: > 192.168.0.104, certSerialId: null}9583a007-7f02-4074-9e26-19bc18e29e > c5{ip: 192.168.0.104, host: 
192.168.0.104, certSerialId: null}, Type:RATIS, > Factor:THREE, State:OPEN] > 2019-04-22 22:34:14,388 INFO impl.RoleInfo (RoleInfo.java:updateAndGet(143)) > - e17b7852-4691-40c7-8791-ad0b0da5201f: start FollowerState > 2019-04-22 22:34:14,388 INFO pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id: > 5383151b-d625-4362-a7dd-c0d353acaf76, Nodes: 80f16ad6-3879-4a64-a3c > 7-7719813cc139{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: > null}082ce481-7fb0-4f88-ac21-82609290a6a2{ip: 192.168.0.104, host: > 192.168.0.104, certSerialId: null}dd5f5a70-0217-4577-b7a2-c42aa139d1 > 8a{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, > Factor:THREE, State:OPEN] > 2019-04-22 22:34:14,389 INFO pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id: > be4854e5-7933-4caa-b32e-f482cf500247, Nodes: 6e2356f1-479d-498b-876 > a-1c90623c498b{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: > null}8ac46d94-9975-4eea-9448-2618c69d7bf3{ip: 192.168.0.104, host: > 192.168.0.104, certSerialId: null}a3ed36a1-44ca-47b2-b9b3-5aeef04595 > 18{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, > Factor:THREE, State:OPEN] > 2019-04-22 22:34:14,390 INFO pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id: > 21e368e2-f82a-4c61-9cc3-06e8de22ea6b, Nodes: > 82632040-5754-4122-b187-331879586842{ip: 192.168.0.104, host: 192.168.0.104, > certSerialId: null}923c8537-b869-4085-adcb-0a9accdcd089{ip: 192.168.0.104, > host: 192.168.0.104, certSerialId: > null}c6d790bf-e3a6-4064-acb5-f74796cd38a9{ip: 192.168.0.104, host: > 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN] > 2019-04-22 22:34:14,390 INFO pipeline.RatisPipelineProvider > (RatisPipelineProvider.java:lambda$create$1(103)) - pipeline Pipeline[ Id: > cccbc2ed-e0e2-4578-a8a2-94f4b645be52, Nodes: > 
91ae6848-a778-43be-a4a1-5855f7adc0d8{ip: 192.168.0.104, host: 192.168.0.104, > certSerialId: null}8f330a03-40e2-4bd1-9b43-5e05b13d89f0{ip: 192.168.0.104, > host: 192.168.0.104, certSerialId: > null}4f3070dc-650b-48d7-87b5-d2076104e7b4{ip: 192.168.0.104, host: > 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN] > 2019-04-22 22:34:14,392 ERROR block.BlockManagerImpl > (BlockManagerImpl.java:allocateBlock(192)) - Pipeline creation failed for > type:RATIS factor:THREE > org.apache.hadoop.hdds.scm.pipeline.InsufficientDatanodesException: Cannot > create pipeline of factor 3 using 2 nodes 20 healthy nodes 20 all nodes. > at > org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvi
[jira] [Assigned] (HDDS-1448) RatisPipelineProvider should only consider open pipeline while excluding dn for pipeline allocation
[ https://issues.apache.org/jira/browse/HDDS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aravindan Vijayan reassigned HDDS-1448: --- Assignee: Aravindan Vijayan > RatisPipelineProvider should only consider open pipeline while excluding dn > for pipeline allocation > --- > > Key: HDDS-1448 > URL: https://issues.apache.org/jira/browse/HDDS-1448 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Assignee: Aravindan Vijayan >Priority: Major > Labels: MiniOzoneChaosCluster > > While allocating pipelines, the Ratis pipeline provider considers all > pipelines irrespective of their state. This can lead to a case > where all the datanodes are up but the pipelines are in closing state in SCM.
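The HDDS-1448 fix amounts to filtering by pipeline state when computing which datanodes to exclude from a new pipeline. A hedged sketch with illustrative types follows; the real code lives in RatisPipelineProvider and uses Ozone's own Pipeline/PipelineState classes, which differ from these stand-ins.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the proposed fix: only datanodes belonging to OPEN pipelines
// are excluded from new pipeline allocation; datanodes whose pipelines are
// CLOSING/CLOSED become available again. Illustrative types only.
public class OpenPipelineFilter {
    enum State { OPEN, CLOSING, CLOSED }

    static class Pipeline {
        final State state;
        final Set<String> datanodes;

        Pipeline(State state, Set<String> datanodes) {
            this.state = state;
            this.datanodes = datanodes;
        }
    }

    static Set<String> excludedDatanodes(List<Pipeline> pipelines) {
        Set<String> excluded = new HashSet<>();
        for (Pipeline p : pipelines) {
            // Previously every pipeline was counted regardless of state.
            if (p.state == State.OPEN) {
                excluded.addAll(p.datanodes);
            }
        }
        return excluded;
    }
}
```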
[jira] [Commented] (HDFS-14445) TestTrySendErrorReportWhenNNThrowsIOException fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823233#comment-16823233 ] Ayush Saxena commented on HDFS-14445: - updated v2 using lambda > TestTrySendErrorReportWhenNNThrowsIOException fails in trunk > > > Key: HDFS-14445 > URL: https://issues.apache.org/jira/browse/HDFS-14445 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Attachments: HDFS-14445-01.patch, HDFS-14445-02.patch > > > {noformat} > Active namenode didn't add the report back to the queue when errorReport > threw IOException > {noformat} > Reference ::: > https://builds.apache.org/job/PreCommit-HDFS-Build/26676/testReport/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/ > https://builds.apache.org/job/PreCommit-HDFS-Build/26662/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/ > https://builds.apache.org/job/PreCommit-HDFS-Build/26661/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/ > https://builds.apache.org/job/PreCommit-HDFS-Build/26644/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
[jira] [Updated] (HDFS-14445) TestTrySendErrorReportWhenNNThrowsIOException fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-14445: Attachment: HDFS-14445-02.patch > TestTrySendErrorReportWhenNNThrowsIOException fails in trunk > > > Key: HDFS-14445 > URL: https://issues.apache.org/jira/browse/HDFS-14445 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Attachments: HDFS-14445-01.patch, HDFS-14445-02.patch > > > {noformat} > Active namenode didn't add the report back to the queue when errorReport > threw IOException > {noformat} > Reference ::: > https://builds.apache.org/job/PreCommit-HDFS-Build/26676/testReport/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/ > https://builds.apache.org/job/PreCommit-HDFS-Build/26662/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/ > https://builds.apache.org/job/PreCommit-HDFS-Build/26661/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/ > https://builds.apache.org/job/PreCommit-HDFS-Build/26644/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
[jira] [Commented] (HDFS-14406) Add per user RPC Processing time
[ https://issues.apache.org/jira/browse/HDFS-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823227#comment-16823227 ] Chao Sun commented on HDFS-14406: - [~xuel1]: I'm still in favor of reusing the existing {{RpcDetailedMetrics}}, as most of the logic is the same between this and the proposed {{RpcUserMetrics}}. It would be good if we can use some prefix to differentiate the user metrics from the RPC method metrics. > Add per user RPC Processing time > > > Key: HDFS-14406 > URL: https://issues.apache.org/jira/browse/HDFS-14406 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: Xue Liu >Assignee: Xue Liu >Priority: Minor > Fix For: 3.2.0 > > Attachments: HDFS-14406.001.patch, HDFS-14406.002.patch, > HDFS-14406.003.patch, HDFS-14406.004.patch, HDFS-14406.005.patch, > HDFS-14406.006.patch > > > For a shared cluster we would want to separate users' resources, as well as > have our metrics reflect the usage, latency, etc., for each user. > This JIRA aims to add per-user RPC processing time metrics and expose them via > JMX.
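The reuse-with-prefix suggestion above could look roughly like this toy sketch (names are hypothetical, not the real RpcDetailedMetrics API): a single registry keyed by name, with a "user." prefix keeping per-user entries distinct from per-method ones.

```java
import java.util.*;

// Hypothetical sketch of prefixed metric naming; not actual Hadoop metrics2 code.
public class PrefixedRpcMetrics {
    private final Map<String, Long> processingTimeMs = new HashMap<>();

    // Per-method processing time, keyed by the RPC method name.
    void addMethodTime(String method, long ms) {
        processingTimeMs.merge(method, ms, Long::sum);
    }

    // The "user." prefix keeps user metrics from colliding with method names.
    void addUserTime(String user, long ms) {
        processingTimeMs.merge("user." + user, ms, Long::sum);
    }

    Map<String, Long> snapshot() {
        return Map.copyOf(processingTimeMs);
    }

    public static void main(String[] args) {
        PrefixedRpcMetrics m = new PrefixedRpcMetrics();
        m.addMethodTime("getBlockLocations", 3);
        m.addUserTime("alice", 5);
        System.out.println(m.snapshot());
    }
}
```

One registry with namespaced keys avoids duplicating the accumulation logic in a separate metrics class, which appears to be the point of the comment.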
[jira] [Commented] (HDFS-14440) RBF: Optimize the file write process in case of multiple destinations.
[ https://issues.apache.org/jira/browse/HDFS-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823223#comment-16823223 ] Íñigo Goiri commented on HDFS-14440: I think we have some metrics in JMX for the number of RPC queries. It might be worth evaluating the number of calls and the latency of the calls with one approach and the other. It might also be good to have the Namenode metrics here. > RBF: Optimize the file write process in case of multiple destinations. > -- > > Key: HDFS-14440 > URL: https://issues.apache.org/jira/browse/HDFS-14440 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Attachments: HDFS-14440-HDFS-13891-01.patch > > > In case of multiple destinations, we need to check whether the file already exists > in one of the subclusters, for which we use the existing getBlockLocation() > API, which is by default a sequential call. > In the scenario where the file needs to be created, each subcluster is > checked sequentially; this can be done concurrently to save time. > In the other case, where the file is found and its last block is null, we > need to call getFileInfo() on all the locations to find where the > file exists. This can also be avoided by using a concurrent call, since we > already have the remote location for which getBlockLocation() returned a > non-null entry. >
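The sequential-to-concurrent idea in the description can be sketched as follows (a toy model, not the actual Router code): fire the lookup at every destination in parallel and keep the first non-null answer.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Function;

// Hypothetical sketch: query all subcluster destinations concurrently and
// return the first non-null result, instead of probing each one in sequence.
public class ConcurrentLookup {
    static <T> Optional<T> firstNonNull(List<String> destinations,
                                        Function<String, T> lookup) {
        List<CompletableFuture<T>> futures = destinations.stream()
                .map(d -> CompletableFuture.supplyAsync(() -> lookup.apply(d)))
                .toList();
        // Wait for every call to finish, then pick the first destination
        // that answered with a non-null value.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        return futures.stream()
                .map(CompletableFuture::join)
                .filter(Objects::nonNull)
                .findFirst();
    }

    public static void main(String[] args) {
        // Simulated lookup: only "ns1" knows the file.
        Optional<String> hit = firstNonNull(List.of("ns0", "ns1", "ns2"),
                ns -> ns.equals("ns1") ? "blockLocations@" + ns : null);
        System.out.println(hit.orElse("not found"));
    }
}
```

With N destinations, the wall-clock cost drops from the sum of the per-subcluster latencies to roughly the maximum of them, at the price of extra in-flight RPCs, which is why comparing call counts and latencies in JMX, as suggested above, is a reasonable way to evaluate the trade-off.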
[jira] [Updated] (HDDS-1449) JVM Exit in datanode while committing a key
[ https://issues.apache.org/jira/browse/HDDS-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDDS-1449: Attachment: 2019-04-22--20-23-56-IST.MiniOzoneChaosCluster.log > JVM Exit in datanode while committing a key > --- > > Key: HDDS-1449 > URL: https://issues.apache.org/jira/browse/HDDS-1449 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Priority: Major > Labels: MiniOzoneChaosCluster > Attachments: 2019-04-22--20-23-56-IST.MiniOzoneChaosCluster.log, > hs_err_pid67466.log > > > Saw the following trace in MiniOzoneChaosCluster run. > {code} > C [librocksdbjni17271331491728127.jnilib+0x9755c] > Java_org_rocksdb_RocksDB_write0+0x1c > J 13917 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x0001102ff62e > [0x0001102ff580+0xae] > J 17167 C2 > org.apache.hadoop.utils.RocksDBStore.writeBatch(Lorg/apache/hadoop/utils/BatchOperation;)V > (260 bytes) @ 0x000111bbd01c [0x000111bbcde0+0x23c] > J 20434 C1 > org.apache.hadoop.ozone.container.keyvalue.impl.BlockManagerImpl.putBlock(Lorg/apache/hadoop/ozone/container/common/interfaces/Container;Lorg/apache/hadoop/ozone/container/common/helpers/BlockData;)J > (261 bytes) @ 0x000111c267ac [0x000111c25640+0x116c] > J 19262 C2 > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto; > (866 bytes) @ 0x0001125c5aa0 [0x0001125c1560+0x4540] > J 15095 C2 > 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto; > (142 bytes) @ 0x000110ffc940 [0x000110ffc0c0+0x880] > J 19301 C2 > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto; > (146 bytes) @ 0x000111396144 [0x000111395e60+0x2e4] > J 15997 C2 > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$$Lambda$776.get()Ljava/lang/Object; > (16 bytes) @ 0x000110138e54 [0x000110138d80+0xd4] > J 15970 C2 java.util.concurrent.CompletableFuture$AsyncSupply.run()V (61 > bytes) @ 0x00010fc80094 [0x00010fc8+0x94] > J 17368 C2 > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > (225 bytes) @ 0x000110b0a7a0 [0x000110b0a5a0+0x200] > J 7389 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ > 0x00011012a004 [0x000110129f00+0x104] > J 6837 C1 java.lang.Thread.run()V (17 bytes) @ 0x00011002b144 > [0x00011002b000+0x144] > v ~StubRoutines::call_stub > V [libjvm.dylib+0x2ef1f6] JavaCalls::call_helper(JavaValue*, methodHandle*, > JavaCallArguments*, Thread*)+0x6ae > V [libjvm.dylib+0x2ef99a] JavaCalls::call_virtual(JavaValue*, KlassHandle, > Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x164 > V [libjvm.dylib+0x2efb46] JavaCalls::call_virtual(JavaValue*, Handle, > KlassHandle, Symbol*, Symbol*, Thread*)+0x4a > V [libjvm.dylib+0x34a46d] thread_entry(JavaThread*, Thread*)+0x7c > V 
[libjvm.dylib+0x56eb0f] JavaThread::thread_main_inner()+0x9b > V [libjvm.dylib+0x57020a] JavaThread::run()+0x1c2 > V [libjvm.dylib+0x48d4a6] java_start(Thread*)+0xf6 > C [libsystem_pthread.dylib+0x3305] _pthread_body+0x7e > C [libsystem_pthread.dylib+0x626f] _pthread_start+0x46 > C [libsystem_pthread.dylib+0x2415] thread_start+0xd > C 0x > {code}
[jira] [Commented] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.
[ https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823205#comment-16823205 ] Íñigo Goiri commented on HDFS-14353: Do you mind fixing the checkstyle warnings? > Erasure Coding: metrics xmitsInProgress become to negative. > --- > > Key: HDFS-14353 > URL: https://issues.apache.org/jira/browse/HDFS-14353 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, erasure-coding >Affects Versions: 3.3.0 >Reporter: maobaolong >Assignee: maobaolong >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14353.001.patch, HDFS-14353.002.patch, > screenshot-1.png > >
[jira] [Updated] (HDDS-1449) JVM Exit in datanode while committing a key
[ https://issues.apache.org/jira/browse/HDDS-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDDS-1449: Attachment: hs_err_pid67466.log > JVM Exit in datanode while committing a key > --- > > Key: HDDS-1449 > URL: https://issues.apache.org/jira/browse/HDDS-1449 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.3.0 >Reporter: Mukul Kumar Singh >Priority: Major > Labels: MiniOzoneChaosCluster > Attachments: hs_err_pid67466.log > > > Saw the following trace in MiniOzoneChaosCluster run. > {code} > C [librocksdbjni17271331491728127.jnilib+0x9755c] > Java_org_rocksdb_RocksDB_write0+0x1c > J 13917 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x0001102ff62e > [0x0001102ff580+0xae] > J 17167 C2 > org.apache.hadoop.utils.RocksDBStore.writeBatch(Lorg/apache/hadoop/utils/BatchOperation;)V > (260 bytes) @ 0x000111bbd01c [0x000111bbcde0+0x23c] > J 20434 C1 > org.apache.hadoop.ozone.container.keyvalue.impl.BlockManagerImpl.putBlock(Lorg/apache/hadoop/ozone/container/common/interfaces/Container;Lorg/apache/hadoop/ozone/container/common/helpers/BlockData;)J > (261 bytes) @ 0x000111c267ac [0x000111c25640+0x116c] > J 19262 C2 > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto; > (866 bytes) @ 0x0001125c5aa0 [0x0001125c1560+0x4540] > J 15095 C2 > org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto; 
> (142 bytes) @ 0x000110ffc940 [0x000110ffc0c0+0x880] > J 19301 C2 > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto; > (146 bytes) @ 0x000111396144 [0x000111395e60+0x2e4] > J 15997 C2 > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$$Lambda$776.get()Ljava/lang/Object; > (16 bytes) @ 0x000110138e54 [0x000110138d80+0xd4] > J 15970 C2 java.util.concurrent.CompletableFuture$AsyncSupply.run()V (61 > bytes) @ 0x00010fc80094 [0x00010fc8+0x94] > J 17368 C2 > java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V > (225 bytes) @ 0x000110b0a7a0 [0x000110b0a5a0+0x200] > J 7389 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ > 0x00011012a004 [0x000110129f00+0x104] > J 6837 C1 java.lang.Thread.run()V (17 bytes) @ 0x00011002b144 > [0x00011002b000+0x144] > v ~StubRoutines::call_stub > V [libjvm.dylib+0x2ef1f6] JavaCalls::call_helper(JavaValue*, methodHandle*, > JavaCallArguments*, Thread*)+0x6ae > V [libjvm.dylib+0x2ef99a] JavaCalls::call_virtual(JavaValue*, KlassHandle, > Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x164 > V [libjvm.dylib+0x2efb46] JavaCalls::call_virtual(JavaValue*, Handle, > KlassHandle, Symbol*, Symbol*, Thread*)+0x4a > V [libjvm.dylib+0x34a46d] thread_entry(JavaThread*, Thread*)+0x7c > V [libjvm.dylib+0x56eb0f] JavaThread::thread_main_inner()+0x9b > V [libjvm.dylib+0x57020a] JavaThread::run()+0x1c2 > V [libjvm.dylib+0x48d4a6] java_start(Thread*)+0xf6 > C [libsystem_pthread.dylib+0x3305] _pthread_body+0x7e > C [libsystem_pthread.dylib+0x626f] _pthread_start+0x46 > C [libsystem_pthread.dylib+0x2415] thread_start+0xd > C 0x > {code} 
[jira] [Created] (HDDS-1450) Fix nightly run failures after HDDS-976
Xiaoyu Yao created HDDS-1450: Summary: Fix nightly run failures after HDDS-976 Key: HDDS-1450 URL: https://issues.apache.org/jira/browse/HDDS-1450 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao [https://ci.anzix.net/job/ozone-nightly/72/testReport/]
[jira] [Commented] (HDFS-14445) TestTrySendErrorReportWhenNNThrowsIOException fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823203#comment-16823203 ] Íñigo Goiri commented on HDFS-14445: Removing a sleep() is always good. Can we use a lambda for the waitFor? > TestTrySendErrorReportWhenNNThrowsIOException fails in trunk > > > Key: HDFS-14445 > URL: https://issues.apache.org/jira/browse/HDFS-14445 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Attachments: HDFS-14445-01.patch > > > {noformat} > Active namenode didn't add the report back to the queue when errorReport > threw IOException > {noformat} > Reference ::: > https://builds.apache.org/job/PreCommit-HDFS-Build/26676/testReport/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/ > https://builds.apache.org/job/PreCommit-HDFS-Build/26662/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/ > https://builds.apache.org/job/PreCommit-HDFS-Build/26661/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/ > https://builds.apache.org/job/PreCommit-HDFS-Build/26644/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
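For reference, the lambda-based waitFor pattern being suggested here (Hadoop's GenericTestUtils.waitFor takes a similar condition/interval/timeout shape; this standalone sketch is illustrative, not the test's actual code):

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Sketch of polling a condition with a bounded timeout instead of a blind sleep.
public class WaitFor {
    static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!check.getAsBoolean()) {             // poll the condition
            if (System.currentTimeMillis() > deadline) {
                throw new TimeoutException("condition not met in " + timeoutMs + " ms");
            }
            Thread.sleep(intervalMs);               // short, bounded pause between polls
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicBoolean reportQueued = new AtomicBoolean(false);
        new Thread(() -> reportQueued.set(true)).start();
        // Lambda/method reference as the condition, instead of Thread.sleep(N)
        waitFor(reportQueued::get, 10, 1000);
        System.out.println("report observed");
    }
}
```

The test returns as soon as the condition holds rather than always paying the full sleep, and fails with a clear timeout instead of a flaky assertion when the report is slow to appear.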