[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-22 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823750#comment-16823750
 ] 

star commented on HDFS-14437:
-

OK, I'll take a look later.

> Exception happened when rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have sorted out the process of writing and flushing the EditLog and some 
> important functions. I found that in the FSEditLog class, the close() function 
> runs the following sequence:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> Since we hold the object lock in the function close(), when 
> waitForSyncToFinish() returns, it means all logSync work has finished and all 
> data in bufReady has been flushed out. And since the current thread holds the 
> lock of this object, no other thread can acquire it while 
> endCurrentLogSegment() runs, so no new edits can be written into bufCurrent.
> But if we do not call waitForSyncToFinish() before endCurrentLogSegment(), an 
> auto-scheduled logSync()'s flush may still be in progress, because that flush 
> step does not need synchronization, as mentioned in the comment of the 
> logSync() method:
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  * and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  * uses a ThreadLocal transaction ID to determine what edit number must
>  * be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  * under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *  flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *  sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread. 
>   long mytxid = myTransactionId.get().txid;
>   
>   boolean sync = false;
>   try {
> EditLogOutputStream logStream = null;
> synchronized (this) {
>   try {
> printStatistics(false);
> // if somebody is already syncing, then wait
> while (mytxid > synctxid && isSyncRunning) {
>   try {
> wait(1000);
>   } catch (InterruptedException ie) {
>   }
> }
> //
> // If this transaction was already flushed, then nothing to do
> //
> if (mytxid <= synctxid) {
>   numTransactionsBatchedInSync++;
>   if (metrics != null) {
> // Metrics is non-null only when used inside name node
> metrics.incrTransactionsBatchedInSync();
>   }
>   return;
> }
>
> // now, this thread will do the sync
> syncStart = txid;
> isSyncRunning = true;
> sync = true;
> // swap buffers
> try {
>   if (journalSet.isEmpty()) {
> throw new IOException("No journals available to flush");
>   }
>   editLogStream.setReadyToFlush();
> } catch (IOException e) {
>   final String msg =
>   "Could not sync enough journals to persistent storage " +
>   "due to " + e.getMessage() + ". " +
>   "Unsynced transactions: " + (txid - synctxid);
>   LOG.fatal(msg, new Exception());
>   synchronized(journalSetLock) {
> IOUtils.cleanup(LOG, journalSet);
>   }
>   terminate(1, msg);
> }
>   } finally {
> // Prevent RuntimeException from blocking other log edit write 
> doneWithAutoSyncScheduling();
>   }
>   //editLogStream may become null,
>   //so store a local variable for flush.
>  

[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-22 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823714#comment-16823714
 ] 

star edited comment on HDFS-14437 at 4/23/19 6:55 AM:
--

[~angerszhuuu], I think I've got that.

 
||Thread0||Thread1||Thread2||locked||isSyncRunning||
| |flush| |false|true|
|endCurrentSegment#logSyncAll| | |true|true|
|wait| | |false|true|
| |flush done| |false|false|
| | |*logEdit*|true|false|
|{color:#d04437}finalize error{color}| | |true|false|
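
To make the interleaving in the table concrete, here is a minimal, self-contained sketch in plain Java. It is a toy model, not HDFS code: the names below (ToyEditLog, logEdit, logSync, endCurrentLogSegment) are invented and only mimic the shape of FSEditLog's double buffering. Step 2 of the sync runs outside the lock, and wait() in the closing thread releases the lock, so a third thread can append to bufCurrent before the final emptiness check.
{code:java}
import java.util.ArrayList;
import java.util.List;

/** Toy model of the race; invented names, not HDFS code. */
public class ToyEditLog {
  private final List<String> bufCurrent = new ArrayList<>();
  private final List<String> bufReady = new ArrayList<>();
  private volatile boolean isSyncRunning = false;

  /** Thread2 in the table: appends whenever the lock is free. */
  public synchronized void logEdit(String op) {
    bufCurrent.add(op);
  }

  /** Thread1 in the table: step 2 (the flush) runs outside the lock. */
  public void logSync() throws InterruptedException {
    synchronized (this) {            // step 1: swap buffers under the lock
      bufReady.addAll(bufCurrent);
      bufCurrent.clear();
      isSyncRunning = true;
    }
    Thread.sleep(100);               // step 2: simulate the unsynchronized flush
    synchronized (this) {            // step 3: reset the flag, wake waiters
      bufReady.clear();
      isSyncRunning = false;
      notifyAll();
    }
  }

  /** Thread0 in the table: wait() releases the lock, letting logEdit in. */
  public synchronized void endCurrentLogSegment() throws InterruptedException {
    while (isSyncRunning) {
      wait(1000);                    // the object lock is released here
    }
    if (!bufCurrent.isEmpty()) {     // the "finalize error" row of the table
      throw new IllegalStateException("bufCurrent not empty: " + bufCurrent);
    }
  }
}
{code}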

 

 


was (Author: starphin):
[~angerszhuuu], I think I've got that.

 
||Thread0||Thread1||Thread2||locked||isSyncRunning||
| |flush| |false|true|
|endCurrentSegment#logSyncAll| | |true|true|
|wait| | |false|true|
| |flush done| |false|false|
| | |*logAppend*|true|false|
|{color:#d04437}finalize error{color}| | |true|false|

 

 


[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-22 Thread angerszhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823746#comment-16823746
 ] 

angerszhu commented on HDFS-14437:
--

[~starphin]

You can see my pull request; both ways can solve this problem, but I recommend 
the approach in the pull request.


[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-22 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823742#comment-16823742
 ] 

star commented on HDFS-14437:
-

[~angerszhuuu] Right, lock state fixed.
{quote}But logAppend also needs the lock.
{quote}


[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-22 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823714#comment-16823714
 ] 

star edited comment on HDFS-14437 at 4/23/19 6:38 AM:
--

[~angerszhuuu], I think I've got that.

 
||Thread0||Thread1||Thread2||locked||isSyncRunning||
| |flush| |false|true|
|endCurrentSegment#logSyncAll| | |true|true|
|wait| | |false|true|
| |flush done| |false|false|
| | |*logAppend*|true|false|
|{color:#d04437}finalize error{color}| | |true|false|

 

 


was (Author: starphin):
[~angerszhuuu], I think I've got that. 

 
||Thread0||Thread1||Thread2||locked||isSyncRunning||
| |flush| |false|true|
|endCurrentSegment#logSyncAll| | |true|true|
|wait| | |false|true|
| |flush done| |false|false|
| | |*logAppend*|false|false|
|{color:#d04437}finalize error{color}| | |true|false|

 

 


[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-22 Thread angerszhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823732#comment-16823732
 ] 

angerszhu commented on HDFS-14437:
--

[~starphin]

But logAppend also needs the lock.

What you show is exactly the situation I meant.

lol


[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-22 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823726#comment-16823726
 ] 

star commented on HDFS-14437:
-

locked is the object lock of FSEditLog.


[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-22 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823726#comment-16823726
 ] 

star edited comment on HDFS-14437 at 4/23/19 6:24 AM:
--

locked is the state of the object lock for FSEditLog.


was (Author: starphin):
locked is the object lock of FSEditLog.


[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-22 Thread angerszhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823717#comment-16823717
 ] 

angerszhu edited comment on HDFS-14437 at 4/23/19 6:17 AM:
---

[~starphin] 

Yeah, that is what I wanted to show. But in your table, "locked" reads as 
Thread0 holding the lock, which may confuse others.

You can see my pull request.

I think I should improve my skill at presenting technical logic.


was (Author: angerszhuuu):
[~starphin] 

Yeah, that is what I wanted to show.

You can see my pull request.

I think I should improve my skill at presenting technical logic.


[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-22 Thread angerszhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823717#comment-16823717
 ] 

angerszhu commented on HDFS-14437:
--

[~starphin] 

Yeah, that is what I wanted to show.

You can see my pull request.

I think I should improve my skill at presenting technical logic.


[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-22 Thread star (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823714#comment-16823714
 ] 

star commented on HDFS-14437:
-

[~angerszhuuu], I think I've got that. 

 
||Thread0||Thread1||Thread2||locked||isSyncRunning||
| |flush| |false|true|
|endCurrentSegment#logSyncAll| | |true|true|
|wait| | |false|true|
| |flush done| |false|false|
| | |*logAppend*|false|false|
|{color:#d04437}finalize error{color}| | |true|false|

 

 


[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-22 Thread angerszhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823669#comment-16823669
 ] 

angerszhu edited comment on HDFS-14437 at 4/23/19 5:55 AM:
---

[~hexiaoqiao] 

 

Since #logSync's step 2 can run without the lock, another thread may be inside 
#logSync because bufCurrent was full and triggered an auto-scheduled sync. When 
that thread runs into step 2 [flush()], at that very moment the current thread 
may call #rollEditLog.

You can see that while the other thread is still running logSync, isSyncRunning 
== true. The current thread calling #endCurrentLogSegment will go into the 
while loop:
{code:java}
while (mytxid > synctxid && isSyncRunning) {
  try {
wait(1000);
  } catch (InterruptedException ie) {
  }
}
{code}
If the other thread can't get the lock, isSyncRunning won't become false and 
synctxid won't change.

So the current thread will block in the while loop forever; this situation is 
not correct.

 

My English is not very good and there may be some mistakes. If you need, send 
me a mail and I will explain it to you in Chinese.
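
To see this interleaving fail concretely, here is a small driver over the toy model sketched earlier in this thread (again invented names, not HDFS code; the sleep-based scheduling only makes the race likely, not guaranteed). It lines up the three threads from the table:
{code:java}
/** Drives ToyEditLog into the "bufCurrent not empty" failure. */
public class ToyEditLogRace {
  public static void main(String[] args) throws Exception {
    ToyEditLog log = new ToyEditLog();
    log.logEdit("txn-1");                       // something to flush

    Thread syncer = new Thread(() -> {          // thread1: auto-scheduled sync
      try { log.logSync(); } catch (InterruptedException ignored) { }
    });
    syncer.start();
    Thread.sleep(20);                           // let the syncer reach step 2

    Thread writer = new Thread(() -> {          // thread2: sneaks in an edit
      try {
        Thread.sleep(40);                       // while thread0 sits in wait()
        log.logEdit("txn-2");
      } catch (InterruptedException ignored) { }
    });
    writer.start();

    log.endCurrentLogSegment();                 // thread0: throws here
  }
}
{code}
Because wait() in the loop releases the object lock, the emptiness check at the end cannot assume it ran alone, which matches the javadoc's advice to call waitForSyncToFinish() before assuming exclusive access.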


was (Author: angerszhuuu):
[~hexiaoqiao] 

You can see that while the other thread is still running logSync, isSyncRunning 
== true. The current thread calling #endCurrentLogSegment will go into the 
while loop:
{code:java}
while (mytxid > synctxid && isSyncRunning) {
  try {
wait(1000);
  } catch (InterruptedException ie) {
  }
}
{code}
If the other thread can't get the lock, isSyncRunning won't become false and 
synctxid won't change.

So the current thread will block in the while loop forever; this situation is 
not correct.

 

My English is not very good and there may be some mistakes. If you need, send 
me a mail and I will explain it to you in Chinese.


[jira] [Commented] (HDDS-1448) RatisPipelineProvider should only consider open pipeline while excluding dn for pipeline allocation

2019-04-22 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823698#comment-16823698
 ] 

Lokesh Jain commented on HDDS-1448:
---

The changes required for this Jira would enable multiple three-node pipelines 
on a datanode. It was implemented this way to make sure that a datanode is not 
part of more than one factor-three pipeline.
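
For illustration only, the change the title describes — counting only open pipelines when computing the datanodes to exclude — might look roughly like the sketch below. The types and names here (Pipeline, PipelineState, excludedDatanodes) are invented for the example and are not the actual Ozone SCM API.
{code:java}
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class PipelineFilterSketch {
  enum PipelineState { OPEN, CLOSING, CLOSED }

  static class Pipeline {
    final PipelineState state;
    final Set<String> datanodes;
    Pipeline(PipelineState state, Set<String> datanodes) {
      this.state = state;
      this.datanodes = datanodes;
    }
  }

  /** Exclude a datanode only if it serves a pipeline that is still OPEN. */
  static Set<String> excludedDatanodes(List<Pipeline> pipelines) {
    return pipelines.stream()
        .filter(p -> p.state == PipelineState.OPEN)   // skip CLOSING/CLOSED
        .flatMap(p -> p.datanodes.stream())
        .collect(Collectors.toSet());
  }
}
{code}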

> RatisPipelineProvider should only consider open pipeline while excluding dn 
> for pipeline allocation
> ---
>
> Key: HDDS-1448
> URL: https://issues.apache.org/jira/browse/HDDS-1448
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> While allocating pipelines, the Ratis pipeline provider considers all the 
> pipelines irrespective of their state. This can lead to a case where all the 
> datanodes are up but the pipelines are in the closing state in SCM.






[jira] [Work logged] (HDDS-1453) Fix unit test TestConfigurationFields broken on trunk

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1453?focusedWorklogId=231065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231065
 ]

ASF GitHub Bot logged work on HDDS-1453:


Author: ASF GitHub Bot
Created on: 23/Apr/19 05:37
Start Date: 23/Apr/19 05:37
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #759: HDDS-1453. Fix 
unit test TestConfigurationFields broken on trunk. (swagle)
URL: https://github.com/apache/hadoop/pull/759#issuecomment-485650589
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 501 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | -1 | test4tests | 0 | The patch doesn't appear to include any new or 
modified tests.  Please justify why no new tests are needed for this patch. 
Also please list what manual steps were performed to verify this patch. |
   ||| _ trunk Compile Tests _ |
   | +1 | mvninstall | 1043 | trunk passed |
   | +1 | compile | 48 | trunk passed |
   | +1 | checkstyle | 25 | trunk passed |
   | +1 | mvnsite | 42 | trunk passed |
   | +1 | shadedclient | 734 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | findbugs | 70 | trunk passed |
   | +1 | javadoc | 42 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | +1 | mvninstall | 42 | the patch passed |
   | +1 | compile | 33 | the patch passed |
   | +1 | javac | 33 | the patch passed |
   | +1 | checkstyle | 17 | the patch passed |
   | +1 | mvnsite | 35 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | xml | 1 | The patch has no ill-formed XML file. |
   | +1 | shadedclient | 748 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | findbugs | 75 | the patch passed |
   | +1 | javadoc | 39 | the patch passed |
   ||| _ Other Tests _ |
   | -1 | unit | 80 | common in the patch failed. |
   | +1 | asflicense | 30 | The patch does not generate ASF License warnings. |
   | | | 3685 | |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdds.scm.net.TestNodeSchemaManager |
   |   | hadoop.hdds.scm.net.TestNetworkTopologyImpl |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-759/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/759 |
   | Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall 
 mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
   | uname | Linux 77059421f0f1 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / f4ab937 |
   | maven | version: Apache Maven 3.3.9 |
   | Default Java | 1.8.0_191 |
   | findbugs | v3.1.0-RC1 |
   | unit | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-759/1/artifact/out/patch-unit-hadoop-hdds_common.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-759/1/testReport/ |
   | Max. process+thread count | 445 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/common U: hadoop-hdds/common |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-759/1/console |
   | Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 231065)
Time Spent: 20m  (was: 10m)

> Fix unit test TestConfigurationFields broken on trunk
> -
>
> Key: HDDS-1453
> URL: https://issues.apache.org/jira/browse/HDDS-1453
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Siddharth Wagle
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Unit test failure::
> {code}
> [INFO] Running org.apache.hadoop.ozone.TestOzoneConfigurationFields
> [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.772 
> s <<< FAILURE! - in org.apache.ha

[jira] [Comment Edited] (HDFS-14426) RBF: Add delegation token total count as one of the federation metrics

2019-04-22 Thread Fengnan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823676#comment-16823676
 ] 

Fengnan Li edited comment on HDFS-14426 at 4/23/19 5:26 AM:


 

[Deleted some confusing comments]

Once the change in https://issues.apache.org/jira/browse/HDFS-14374 is rebased, 
I will rebase my patch again.

I will also include changes for the Namenode and KMS in a separate ticket 
since those can go to trunk.


was (Author: fengnanli):
[~crh] I think the commit https://issues.apache.org/jira/browse/HDFS-14374 is 
going to trunk.

[~elgoiri] For the current ticket, since I am going to base it on CR's change 
in https://issues.apache.org/jira/browse/HDFS-14374 and merge it back to 
https://issues.apache.org/jira/browse/HDFS-13891, you probably need to 
cherry-pick that commit to HDFS-13891. Then I can rebase my patch again for 
this.

I will also include changes for the Namenode and KMS in a separate ticket 
since those can go to trunk.

> RBF: Add delegation token total count as one of the federation metrics
> --
>
> Key: HDFS-14426
> URL: https://issues.apache.org/jira/browse/HDFS-14426
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Attachments: HDFS-14426-HDFS-13891.001.patch, HDFS-14426.001.patch
>
>
> Currently router doesn't report the total number of current valid delegation 
> tokens it has, but this piece of information is useful for monitoring and 
> understanding the real time situation of tokens.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14374) Expose total number of delegation tokens in AbstractDelegationTokenSecretManager

2019-04-22 Thread Fengnan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823677#comment-16823677
 ] 

Fengnan Li commented on HDFS-14374:
---

[~hexiaoqiao] I will actually take care of those JMX metrics. For the router it 
is currently tracked in https://issues.apache.org/jira/browse/HDFS-14426, and 
the Namenode/KMS work will go to https://issues.apache.org/jira/browse/HDFS-14449. 
The first ticket will go to the router branch and the latter will go to trunk.

> Expose total number of delegation tokens in 
> AbstractDelegationTokenSecretManager
> 
>
> Key: HDFS-14374
> URL: https://issues.apache.org/jira/browse/HDFS-14374
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: CR Hota
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14374.001.patch, HDFS-14374.002.patch
>
>
> AbstractDelegationTokenSecretManager should expose total number of active 
> delegation tokens for specific implementations to track for observability.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14449) Expose total number of dt in jmx for KMS/Namenode

2019-04-22 Thread Fengnan Li (JIRA)
Fengnan Li created HDFS-14449:
-

 Summary: Expose total number of dt in jmx for KMS/Namenode
 Key: HDFS-14449
 URL: https://issues.apache.org/jira/browse/HDFS-14449
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Fengnan Li
Assignee: Fengnan Li






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14426) RBF: Add delegation token total count as one of the federation metrics

2019-04-22 Thread Fengnan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823676#comment-16823676
 ] 

Fengnan Li commented on HDFS-14426:
---

[~crh] I think the commit https://issues.apache.org/jira/browse/HDFS-14374 is 
going to trunk.

[~elgoiri] For the current ticket, since I am going to base it on CR's change 
in https://issues.apache.org/jira/browse/HDFS-14374 and merge it back to 
https://issues.apache.org/jira/browse/HDFS-13891, you probably need to 
cherry-pick that commit to HDFS-13891. Then I can rebase my patch again for 
this.

I will also include changes for the Namenode and KMS in a separate ticket 
since those can go to trunk.

> RBF: Add delegation token total count as one of the federation metrics
> --
>
> Key: HDFS-14426
> URL: https://issues.apache.org/jira/browse/HDFS-14426
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Attachments: HDFS-14426-HDFS-13891.001.patch, HDFS-14426.001.patch
>
>
> Currently router doesn't report the total number of current valid delegation 
> tokens it has, but this piece of information is useful for monitoring and 
> understanding the real time situation of tokens.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.

2019-04-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823673#comment-16823673
 ] 

Hadoop QA commented on HDFS-14353:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 37s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 38s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}104m 50s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}166m 49s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.tools.TestDFSZKFailoverController |
|   | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14353 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12966674/HDFS-14353.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 9a838dcaec33 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f4ab937 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/26684/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/26684/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/26684

[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-22 Thread angerszhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823669#comment-16823669
 ] 

angerszhu commented on HDFS-14437:
--

[~hexiaoqiao] 

You can see that when another thread is also running logSync, isSyncRunning == 
true. The current thread calling #endCurrentLogSegment will go into the while 
loop:
{code:java}
while (mytxid > synctxid && isSyncRunning) {
  try {
wait(1000);
  } catch (InterruptedException ie) {
  }
}
{code}
If the other thread can't get the lock, isSyncRunning won't become false and 
synctxid won't change.

Then the current thread will block in the while loop forever, which is not 
correct.
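To make the sequence concrete, here is a simplified sketch of logSync()'s 
three steps (condensed from the javadoc quoted below, not the real method):
{code:java}
// Simplified sketch; not the actual FSEditLog code.
synchronized (this) {                  // step 1: runs under the FSEditLog lock
  while (mytxid > synctxid && isSyncRunning) {
    wait(1000);                        // parks and releases the monitor
  }
  isSyncRunning = true;                // this thread now owns the sync
  editLogStream.setReadyToFlush();     // swap bufCurrent / bufReady
}
editLogStream.flush();                 // step 2: flush outside the lock
synchronized (this) {                  // step 3: publish the result
  synctxid = syncStart;
  isSyncRunning = false;
  notifyAll();                         // wakes threads parked in step 1
}
{code}
Whether the loop can exit therefore depends on whether the flushing thread can 
re-acquire the monitor to run step 3 and reset isSyncRunning.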

 

My English is not very good and there may be some mistakes; if needed, send me 
a mail and I will explain in Chinese.

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have sorted out the process of writing and flushing the EditLog and some 
> important functions. I found that in the FSEditLog class, the close() 
> function calls the following sequence:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> Since we have gained the object lock in close(), when the 
> waitForSyncToFinish() method returns it means all logSync work has finished 
> and all data in bufReady has been flushed out. And since the current thread 
> holds the lock on this object, when endCurrentLogSegment() is called no other 
> thread can gain the lock, so they can't write new edits into currentBuf.
> But when we don't call waitForSyncToFinish() before endCurrentLogSegment(), 
> some auto-scheduled logSync() flush may still be in progress, since that 
> step doesn't need synchronization, as mentioned in the comment of the 
> logSync() method:
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  * and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  * uses a ThreadLocal transaction ID to determine what edit number must
>  * be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  * under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *  flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *  sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread. 
>   long mytxid = myTransactionId.get().txid;
>   
>   boolean sync = false;
>   try {
> EditLogOutputStream logStream = null;
> synchronized (this) {
>   try {
> printStatistics(false);
> // if somebody is already syncing, then wait
> while (mytxid > synctxid && isSyncRunning) {
>   try {
> wait(1000);
>   } catch (InterruptedException ie) {
>   }
> }
> //
> // If this transaction was already flushed, then nothing to do
> //
> if (mytxid <= synctxid) {
>   numTransactionsBatchedInSync++;
>   if (metrics != null) {
> // Metrics is non-null only when used inside name node
> metrics.incrTransactionsBatchedInSync();
>   }
>   return;
> }
>
> // now, this thread will do the sync
> syncStart = txid;
> isSyncRunning = true;
> sync = true;
> // swap buffers
> try {
>   if (journalSet.isEmpty()) {
> throw new IOException("No journals available to flush");
>   }
>   editLogStream.setReadyToFlush();
> } catch (IOException e) {
>   final String msg =
>   "Could not sync enough jo

[jira] [Updated] (HDDS-1453) Fix unit test TestConfigurationFields broken on trunk

2019-04-22 Thread Siddharth Wagle (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle updated HDDS-1453:
--
Status: Patch Available  (was: Open)

> Fix unit test TestConfigurationFields broken on trunk
> -
>
> Key: HDDS-1453
> URL: https://issues.apache.org/jira/browse/HDDS-1453
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Siddharth Wagle
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Unit test failure::
> {code}
> [INFO] Running org.apache.hadoop.ozone.TestOzoneConfigurationFields
> [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.772 
> s <<< FAILURE! - in org.apache.hadoop.ozone.TestOzoneConfigurationFields
> [ERROR] 
> testCompareConfigurationClassAgainstXml(org.apache.hadoop.ozone.TestOzoneConfigurationFields)
>   Time elapsed: 0.052 s  <<< FAILURE!
> java.lang.AssertionError: class org.apache.hadoop.ozone.OzoneConfigKeys class 
> org.apache.hadoop.hdds.scm.ScmConfigKeys class 
> org.apache.hadoop.ozone.om.OMConfigKeys class 
> org.apache.hadoop.hdds.HddsConfigKeys class 
> org.apache.hadoop.ozone.recon.ReconServerConfigKeys class 
> org.apache.hadoop.ozone.s3.S3GatewayConfigKeys has 1 variables missing in 
> ozone-default.xml Entries:   ozone.scm.network.topology.schema.file.type 
> expected:<0> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1453) Fix unit test TestConfigurationFields broken on trunk

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1453?focusedWorklogId=231045&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231045
 ]

ASF GitHub Bot logged work on HDDS-1453:


Author: ASF GitHub Bot
Created on: 23/Apr/19 04:34
Start Date: 23/Apr/19 04:34
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #759: HDDS-1453. Fix 
unit test TestConfigurationFields broken on trunk. (swagle)
URL: https://github.com/apache/hadoop/pull/759
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 231045)
Time Spent: 10m
Remaining Estimate: 0h

> Fix unit test TestConfigurationFields broken on trunk
> -
>
> Key: HDDS-1453
> URL: https://issues.apache.org/jira/browse/HDDS-1453
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Siddharth Wagle
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Unit test failure::
> {code}
> [INFO] Running org.apache.hadoop.ozone.TestOzoneConfigurationFields
> [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.772 
> s <<< FAILURE! - in org.apache.hadoop.ozone.TestOzoneConfigurationFields
> [ERROR] 
> testCompareConfigurationClassAgainstXml(org.apache.hadoop.ozone.TestOzoneConfigurationFields)
>   Time elapsed: 0.052 s  <<< FAILURE!
> java.lang.AssertionError: class org.apache.hadoop.ozone.OzoneConfigKeys class 
> org.apache.hadoop.hdds.scm.ScmConfigKeys class 
> org.apache.hadoop.ozone.om.OMConfigKeys class 
> org.apache.hadoop.hdds.HddsConfigKeys class 
> org.apache.hadoop.ozone.recon.ReconServerConfigKeys class 
> org.apache.hadoop.ozone.s3.S3GatewayConfigKeys has 1 variables missing in 
> ozone-default.xml Entries:   ozone.scm.network.topology.schema.file.type 
> expected:<0> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1453) Fix unit test TestConfigurationFields broken on trunk

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-1453:
-
Labels: pull-request-available  (was: )

> Fix unit test TestConfigurationFields broken on trunk
> -
>
> Key: HDDS-1453
> URL: https://issues.apache.org/jira/browse/HDDS-1453
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.5.0
>Reporter: Siddharth Wagle
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>
> Unit test failure::
> {code}
> [INFO] Running org.apache.hadoop.ozone.TestOzoneConfigurationFields
> [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.772 
> s <<< FAILURE! - in org.apache.hadoop.ozone.TestOzoneConfigurationFields
> [ERROR] 
> testCompareConfigurationClassAgainstXml(org.apache.hadoop.ozone.TestOzoneConfigurationFields)
>   Time elapsed: 0.052 s  <<< FAILURE!
> java.lang.AssertionError: class org.apache.hadoop.ozone.OzoneConfigKeys class 
> org.apache.hadoop.hdds.scm.ScmConfigKeys class 
> org.apache.hadoop.ozone.om.OMConfigKeys class 
> org.apache.hadoop.hdds.HddsConfigKeys class 
> org.apache.hadoop.ozone.recon.ReconServerConfigKeys class 
> org.apache.hadoop.ozone.s3.S3GatewayConfigKeys has 1 variables missing in 
> ozone-default.xml Entries:   ozone.scm.network.topology.schema.file.type 
> expected:<0> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1453) Fix unit test TestConfigurationFields broken on trunk

2019-04-22 Thread Siddharth Wagle (JIRA)
Siddharth Wagle created HDDS-1453:
-

 Summary: Fix unit test TestConfigurationFields broken on trunk
 Key: HDDS-1453
 URL: https://issues.apache.org/jira/browse/HDDS-1453
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Affects Versions: 0.5.0
Reporter: Siddharth Wagle
Assignee: Siddharth Wagle
 Fix For: 0.5.0


Unit test failure::

{code}
[INFO] Running org.apache.hadoop.ozone.TestOzoneConfigurationFields
[ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.772 s 
<<< FAILURE! - in org.apache.hadoop.ozone.TestOzoneConfigurationFields
[ERROR] 
testCompareConfigurationClassAgainstXml(org.apache.hadoop.ozone.TestOzoneConfigurationFields)
  Time elapsed: 0.052 s  <<< FAILURE!
java.lang.AssertionError: class org.apache.hadoop.ozone.OzoneConfigKeys class 
org.apache.hadoop.hdds.scm.ScmConfigKeys class 
org.apache.hadoop.ozone.om.OMConfigKeys class 
org.apache.hadoop.hdds.HddsConfigKeys class 
org.apache.hadoop.ozone.recon.ReconServerConfigKeys class 
org.apache.hadoop.ozone.s3.S3GatewayConfigKeys has 1 variables missing in 
ozone-default.xml Entries:   ozone.scm.network.topology.schema.file.type 
expected:<0> but was:<1>
at org.junit.Assert.fail(Assert.java:88)
{code}
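For context, a hedged sketch of the style of check this test performs 
(illustrative, not the actual TestConfigurationFields source): every config 
key declared in the listed classes must have a matching entry in 
ozone-default.xml, so adding ozone.scm.network.topology.schema.file.type 
without a default entry fails the comparison.
{code:java}
// Illustrative only; the real logic lives in TestConfigurationFields.
Configuration defaults = new OzoneConfiguration(); // loads ozone-default.xml
for (Field f : ScmConfigKeys.class.getFields()) {
  Object v = f.get(null);
  if (f.getName().endsWith("_KEY") && v instanceof String) {
    assertNotNull("missing from ozone-default.xml: " + v,
        defaults.get((String) v));
  }
}
{code}
The fix is typically either to add the key (with its default) to 
ozone-default.xml or to whitelist it in the test.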



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not

2019-04-22 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823660#comment-16823660
 ] 

He Xiaoqiao commented on HDFS-14437:


Thanks [~angerszhuuu] for the additional information.
{quote}When #rollEditLog#endCurrentLogSegment call logSync, if there is some 
other thread is calling logSync, it will #wait() in the while loop. then other 
thread can get the lock.{quote}
Sorry, I still don't understand why other threads could acquire the FSEditLog 
lock and run #logSync or #logEdit (these two methods also need to acquire the 
FSEditLog lock) while executing endCurrentLogSegment#logSync, which is 
synchronized.
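For the concrete point about wait(): a minimal, self-contained demo (not HDFS 
code) showing that Object.wait() releases the monitor, so a thread parked in 
the "while (mytxid > synctxid && isSyncRunning) wait(1000);" loop does not 
keep other threads out of synchronized blocks on the same object:
{code:java}
public class WaitReleasesLock {
  private boolean isSyncRunning = true;

  private synchronized void waiter() throws InterruptedException {
    while (isSyncRunning) {
      wait(1000);           // releases the monitor while parked
    }
    System.out.println("waiter resumed");
  }

  private synchronized void finishSync() {
    isSyncRunning = false;  // could not run if waiter still held the lock
    notifyAll();
  }

  public static void main(String[] args) throws Exception {
    WaitReleasesLock demo = new WaitReleasesLock();
    Thread t = new Thread(() -> {
      try {
        demo.waiter();
      } catch (InterruptedException ignored) {
      }
    });
    t.start();
    Thread.sleep(100);      // let the waiter park first
    demo.finishSync();      // acquires the same monitor while waiter waits
    t.join();
  }
}
{code}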

> Exception happened when   rollEditLog expects empty 
> EditsDoubleBuffer.bufCurrent  but not
> -
>
> Key: HDFS-14437
> URL: https://issues.apache.org/jira/browse/HDFS-14437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode, qjm
>Reporter: angerszhu
>Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 
> , I have sorted out the process of writing and flushing the EditLog and some 
> important functions. I found that in the FSEditLog class, the close() 
> function calls the following sequence:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> Since we have gained the object lock in close(), when the 
> waitForSyncToFinish() method returns it means all logSync work has finished 
> and all data in bufReady has been flushed out. And since the current thread 
> holds the lock on this object, when endCurrentLogSegment() is called no other 
> thread can gain the lock, so they can't write new edits into currentBuf.
> But when we don't call waitForSyncToFinish() before endCurrentLogSegment(), 
> some auto-scheduled logSync() flush may still be in progress, since that 
> step doesn't need synchronization, as mentioned in the comment of the 
> logSync() method:
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  * and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  * uses a ThreadLocal transaction ID to determine what edit number must
>  * be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  * under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *  flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *  sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread. 
>   long mytxid = myTransactionId.get().txid;
>   
>   boolean sync = false;
>   try {
> EditLogOutputStream logStream = null;
> synchronized (this) {
>   try {
> printStatistics(false);
> // if somebody is already syncing, then wait
> while (mytxid > synctxid && isSyncRunning) {
>   try {
> wait(1000);
>   } catch (InterruptedException ie) {
>   }
> }
> //
> // If this transaction was already flushed, then nothing to do
> //
> if (mytxid <= synctxid) {
>   numTransactionsBatchedInSync++;
>   if (metrics != null) {
> // Metrics is non-null only when used inside name node
> metrics.incrTransactionsBatchedInSync();
>   }
>   return;
> }
>
> // now, this thread will do the sync
> syncStart = txid;
> isSyncRunning = true;
> sync = true;
> // swap buffers
> try {
>   if (journalSet.isEmpty()) {
> throw new IOException("No journals available to flush");
>   }
>   editLogStream.setReadyToFlush();
> } catch (IOException e) {
>   final String msg =
>   "Could not sync enough journals to persistent storage " +
>   "due to " + e.getMessage() + ". " +
>   "Unsynced transactions: "

[jira] [Commented] (HDFS-14374) Expose total number of delegation tokens in AbstractDelegationTokenSecretManager

2019-04-22 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823658#comment-16823658
 ] 

He Xiaoqiao commented on HDFS-14374:


Thanks [~crh] and [~elgoiri] for working on this, and sorry for missing this 
ticket's progress. As [~elgoiri] mentioned, I just meant exposing the number to 
JMX for monitoring. In our experience it is actually very useful for diagnosing 
token issues and even memory leaks (just for some older versions), so I have 
been running this feature for a long time for the NameNode/KMS server. 
Fortunately, I have seen some new JIRAs pushing this forward. [~crh] would you 
like to fix that for the KMS Server/NameNode/Router together?
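For reference, the exposure itself is tiny; a hedged sketch (method name 
illustrative, not necessarily the committed signature):
{code:java}
// Hedged sketch: expose the live token count from
// AbstractDelegationTokenSecretManager so the NN/KMS/Router layers can
// publish it to JMX. currentTokens is the manager's existing map of
// active delegation tokens.
public synchronized long getCurrentTokensSize() {
  return currentTokens.size();
}
{code}
Each server then only needs to forward this value through its own MBean.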

> Expose total number of delegation tokens in 
> AbstractDelegationTokenSecretManager
> 
>
> Key: HDFS-14374
> URL: https://issues.apache.org/jira/browse/HDFS-14374
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: CR Hota
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14374.001.patch, HDFS-14374.002.patch
>
>
> AbstractDelegationTokenSecretManager should expose total number of active 
> delegation tokens for specific implementations to track for observability.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14421) HDFS block two replicas exist in one DataNode

2019-04-22 Thread Yuanbo Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823640#comment-16823640
 ] 

Yuanbo Liu commented on HDFS-14421:
---

Sorry, quite busy this week. Will comment as soon as I figure it out.

> HDFS block two replicas exist in one DataNode
> -
>
> Key: HDFS-14421
> URL: https://issues.apache.org/jira/browse/HDFS-14421
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yuanbo Liu
>Priority: Major
> Attachments: 326942161.log
>
>
> We're using Hadoop-2.7.0.
> There is a file in the cluster and its replication factor is 2. Those two 
> replicas exist on one Datanode. The fsck info is here:
> {color:#707070}BP-499819267-xx.xxx.131.201-1452072365222:blk_1400651575_326942161
>  len=484045 repl=2 
> [DatanodeInfoWithStorage[xx.xxx.80.205:50010,DS-d321be27-cbd4-4edd-81ad-29b3d021ee82,DISK],
>  
> DatanodeInfoWithStorage[xx.xx.80.205:50010,DS-d321be27-cbd4-4edd-81ad-29b3d021ee82,DISK]].{color}
> and this is the exception from xx.xx.80.205
> {color:#707070}org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException:
>  Replica not found for 
> BP-499819267-xx.xxx.131.201-1452072365222:blk_1400651575_326942161{color}
> It's confusing why the NameNode doesn't update the block map after the 
> exception. What's the reason for two replicas existing on one Datanode?
> Hope to get your comments. Thanks in advance.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1065) OM and DN should persist SCM certificate as the trust root.

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1065?focusedWorklogId=231016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-231016
 ]

ASF GitHub Bot logged work on HDDS-1065:


Author: ASF GitHub Bot
Created on: 23/Apr/19 02:22
Start Date: 23/Apr/19 02:22
Worklog Time Spent: 10m 
  Work Description: ajayydv commented on pull request #754: HDDS-1065. OM 
and DN should persist SCM certificate as the trust root. Contributed by Ajay 
Kumar.
URL: https://github.com/apache/hadoop/pull/754#discussion_r277498610
 
 

 ##
 File path: 
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/security/x509/certificate/client/DefaultCertificateClient.java
 ##
 @@ -80,6 +80,7 @@
 public abstract class DefaultCertificateClient implements CertificateClient {
 
   private static final String CERT_FILE_NAME_FORMAT = "%s.crt";
+  private static final String CA_CERT_PREFIX = "CA-";
 
 Review comment:
   Yes, for block token and DT validation. It will be used to establish the 
chain of trust.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 231016)
Time Spent: 1h 10m  (was: 1h)

> OM and DN should persist SCM certificate as the trust root.
> ---
>
> Key: HDDS-1065
> URL: https://issues.apache.org/jira/browse/HDDS-1065
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> OM and DN should persist SCM certificate as the trust root.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1065) OM and DN should persist SCM certificate as the trust root.

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1065?focusedWorklogId=230992&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230992
 ]

ASF GitHub Bot logged work on HDDS-1065:


Author: ASF GitHub Bot
Created on: 23/Apr/19 01:59
Start Date: 23/Apr/19 01:59
Worklog Time Spent: 10m 
  Work Description: ajayydv commented on pull request #754: HDDS-1065. OM 
and DN should persist SCM certificate as the trust root. Contributed by Ajay 
Kumar.
URL: https://github.com/apache/hadoop/pull/754#discussion_r277495379
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeService.java
 ##
 @@ -268,10 +268,13 @@ private void getSCMSignedCert(OzoneConfiguration config) 
{
 
   String pemEncodedCert = secureScmClient.getDataNodeCertificate(
   datanodeDetails.getProtoBufMessage(), getEncodedString(csr));
-  dnCertClient.storeCertificate(pemEncodedCert, true);
+  dnCertClient.storeCertificate(pemEncodedCert, true, false);
   datanodeDetails.setCertSerialId(getX509Certificate(pemEncodedCert).
   getSerialNumber().toString());
   persistDatanodeDetails(datanodeDetails);
+  // Get SCM CA certificate and store it in filesystem.
+  String pemEncodedRootCert = secureScmClient.getCACertificate();
 
 Review comment:
   As of now we don't have functionality to look up certificates by subject or 
SCM id. getCACertificate returns the default certificate of the SCM that 
signed it.
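   A hedged sketch of the flow as I read the diff above (the third 
storeCertificate flag is assumed to mark the certificate as a CA cert, which 
the CA- prefix would then keep distinct on disk):
{code:java}
// Illustrative reading of the patch, not the committed code.
String pemEncodedRootCert = secureScmClient.getCACertificate();
// Persist the signing SCM's certificate as the local trust root so block
// tokens and delegation tokens can later be validated against it.
dnCertClient.storeCertificate(pemEncodedRootCert, true, true);
{code}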
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 230992)
Time Spent: 1h  (was: 50m)

> OM and DN should persist SCM certificate as the trust root.
> ---
>
> Key: HDDS-1065
> URL: https://issues.apache.org/jira/browse/HDDS-1065
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> OM and DN should persist SCM certificate as the trust root.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.

2019-04-22 Thread maobaolong (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823615#comment-16823615
 ] 

maobaolong commented on HDFS-14353:
---

[~elgoiri] Of course, thank you for reminding me. I uploaded a new patch; PTAL 
after the Jenkins report.

> Erasure Coding: metrics xmitsInProgress become to negative.
> ---
>
> Key: HDFS-14353
> URL: https://issues.apache.org/jira/browse/HDFS-14353
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, erasure-coding
>Affects Versions: 3.3.0
>Reporter: maobaolong
>Assignee: maobaolong
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14353.001.patch, HDFS-14353.002.patch, 
> HDFS-14353.003.patch, screenshot-1.png
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.

2019-04-22 Thread maobaolong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

maobaolong updated HDFS-14353:
--
Attachment: HDFS-14353.003.patch

> Erasure Coding: metrics xmitsInProgress become to negative.
> ---
>
> Key: HDFS-14353
> URL: https://issues.apache.org/jira/browse/HDFS-14353
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, erasure-coding
>Affects Versions: 3.3.0
>Reporter: maobaolong
>Assignee: maobaolong
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14353.001.patch, HDFS-14353.002.patch, 
> HDFS-14353.003.patch, screenshot-1.png
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-999) Make the DNS resolution in OzoneManager more resilient

2019-04-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823610#comment-16823610
 ] 

Hadoop QA commented on HDDS-999:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:blue}0{color} | {color:blue} yamllint {color} | {color:blue}  0m  
0s{color} | {color:blue} yamllint was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
44s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 22s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m 
50s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
38s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 49s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  1m  6s{color} 
| {color:red} hadoop-hdds in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 27m 17s{color} 
| {color:red} hadoop-ozone in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 95m  5s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdds.scm.net.TestNodeSchemaManager |
|   | hadoop.hdds.scm.net.TestNetworkTopologyImpl |
|   | hadoop.ozone.TestOzoneConfigurationFields |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/

[jira] [Created] (HDFS-14448) Provide an appropriate Message when Secondary Namenode is not bootstrap initialized

2019-04-22 Thread Sailesh Patel (JIRA)
Sailesh Patel created HDFS-14448:


 Summary: Provide an appropriate Message when Secondary Namenode is 
not  bootstrap initialized
 Key: HDFS-14448
 URL: https://issues.apache.org/jira/browse/HDFS-14448
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Affects Versions: 2.6.0
Reporter: Sailesh Patel


After HDFS has been running for some time and is then enabled for HDFS HA, if 
the secondary bootstrap fails (say, because the active NN was down) and the 
secondary NN is then started, its error message says "NameNode is not 
formatted."

e.g.
2019-04-16 19:43:27,951 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: 
Failed to start namenode. java.io.IOException: NameNode is not formatted. at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:232)
 at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
 

Can this message be improved to  say:

 "Secondary NameNode is not initialized. Please Bootstrap Secondary Namenode"

or

 "NameNode is not formatted/Secondary Namenode not Bootstrapped(Initialized)"

This is to avoid customers mistakenly formatting the namenode and thereby 
losing data.
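A hedged illustration of the shape of the check (haEnabled and 
storage.isFormatted() are stand-ins here, not the actual FSImage code):
{code:java}
// Illustrative only: point HA operators at bootstrapStandby instead of
// a bare "not formatted" error.
if (!storage.isFormatted()) {
  String msg = haEnabled
      ? "NameNode is not formatted. If this is the standby/secondary NameNode "
          + "of an HA pair, run 'hdfs namenode -bootstrapStandby' instead of "
          + "formatting."
      : "NameNode is not formatted.";
  throw new IOException(msg);
}
{code}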

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-999) Make the DNS resolution in OzoneManager more resilient

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-999?focusedWorklogId=230980&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230980
 ]

ASF GitHub Bot logged work on HDDS-999:
---

Author: ASF GitHub Bot
Created on: 23/Apr/19 00:50
Start Date: 23/Apr/19 00:50
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #758: HDDS-999. Make 
the DNS resolution in OzoneManager more resilient. (swagle)
URL: https://github.com/apache/hadoop/pull/758#issuecomment-485601623
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 26 | Docker mode activated. |
   ||| _ Prechecks _ |
   | 0 | yamllint | 0 | yamllint was not available. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | -1 | test4tests | 0 | The patch doesn't appear to include any new or 
modified tests.  Please justify why no new tests are needed for this patch. 
Also please list what manual steps were performed to verify this patch. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 42 | Maven dependency ordering for branch |
   | +1 | mvninstall | 1084 | trunk passed |
   | +1 | compile | 130 | trunk passed |
   | +1 | checkstyle | 53 | trunk passed |
   | +1 | mvnsite | 66 | trunk passed |
   | +1 | shadedclient | 787 | branch has no errors when building and testing 
our client artifacts. |
   | 0 | findbugs | 1 | Skipped patched modules with no Java source: 
hadoop-ozone/dist |
   | +1 | findbugs | 44 | trunk passed |
   | +1 | javadoc | 38 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 13 | Maven dependency ordering for patch |
   | -1 | mvninstall | 18 | dist in the patch failed. |
   | +1 | compile | 114 | the patch passed |
   | +1 | javac | 114 | the patch passed |
   | +1 | checkstyle | 24 | the patch passed |
   | +1 | mvnsite | 45 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 726 | patch has no errors when building and testing 
our client artifacts. |
   | 0 | findbugs | 0 | Skipped patched modules with no Java source: 
hadoop-ozone/dist |
   | +1 | findbugs | 47 | the patch passed |
   | +1 | javadoc | 40 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 43 | ozone-manager in the patch passed. |
   | +1 | unit | 24 | dist in the patch passed. |
   | +1 | asflicense | 30 | The patch does not generate ASF License warnings. |
   | | | 3482 | |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-758/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/758 |
   | Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall 
 mvnsite  unit  shadedclient  yamllint  findbugs  checkstyle  |
   | uname | Linux d3c23ad8d320 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / f4ab937 |
   | maven | version: Apache Maven 3.3.9 |
   | Default Java | 1.8.0_191 |
   | findbugs | v3.1.0-RC1 |
   | mvninstall | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-758/1/artifact/out/patch-mvninstall-hadoop-ozone_dist.txt
 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-758/1/testReport/ |
   | Max. process+thread count | 411 (vs. ulimit of 5500) |
   | modules | C: hadoop-ozone/ozone-manager hadoop-ozone/dist U: hadoop-ozone |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-758/1/console |
   | Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 230980)
Time Spent: 20m  (was: 10m)

> Make the DNS resolution in OzoneManager more resilient
> --
>
> Key: HDDS-999
> URL: https://issues.apache.org/jira/browse/HDDS-999
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Elek, Marton
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-999.01.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If t

[jira] [Work logged] (HDDS-1441) Remove usage of getRetryFailureException

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1441?focusedWorklogId=230966&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230966
 ]

ASF GitHub Bot logged work on HDDS-1441:


Author: ASF GitHub Bot
Created on: 22/Apr/19 23:55
Start Date: 22/Apr/19 23:55
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #745: HDDS-1441. Remove 
usage of getRetryFailureException. (swagle)
URL: https://github.com/apache/hadoop/pull/745#issuecomment-485592408
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 61 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | @author | 1 | The patch does not contain any @author tags. |
   | -1 | test4tests | 0 | The patch doesn't appear to include any new or 
modified tests.  Please justify why no new tests are needed for this patch. 
Also please list what manual steps were performed to verify this patch. |
   ||| _ trunk Compile Tests _ |
   | 0 | mvndep | 314 | Maven dependency ordering for branch |
   | +1 | mvninstall | 1437 | trunk passed |
   | +1 | compile | 1384 | trunk passed |
   | +1 | checkstyle | 167 | trunk passed |
   | +1 | mvnsite | 461 | trunk passed |
   | +1 | shadedclient | 1411 | branch has no errors when building and testing 
our client artifacts. |
   | 0 | findbugs | 0 | Skipped patched modules with no Java source: 
hadoop-hdds hadoop-ozone |
   | +1 | findbugs | 95 | trunk passed |
   | +1 | javadoc | 221 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | 0 | mvndep | 25 | Maven dependency ordering for patch |
   | -1 | mvninstall | 39 | hadoop-hdds in the patch failed. |
   | -1 | mvninstall | 12 | client in the patch failed. |
   | -1 | mvninstall | 16 | hadoop-ozone in the patch failed. |
   | -1 | mvninstall | 12 | ozone-manager in the patch failed. |
   | +1 | compile | 915 | the patch passed |
   | +1 | javac | 915 | the patch passed |
   | +1 | checkstyle | 141 | the patch passed |
   | -1 | mvnsite | 26 | hadoop-hdds in the patch failed. |
   | -1 | mvnsite | 21 | client in the patch failed. |
   | -1 | mvnsite | 26 | hadoop-ozone in the patch failed. |
   | -1 | mvnsite | 22 | ozone-manager in the patch failed. |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | xml | 2 | The patch has no ill-formed XML file. |
   | +1 | shadedclient | 707 | patch has no errors when building and testing 
our client artifacts. |
   | 0 | findbugs | 0 | Skipped patched modules with no Java source: 
hadoop-hdds hadoop-ozone |
   | -1 | findbugs | 21 | client in the patch failed. |
   | -1 | findbugs | 21 | ozone-manager in the patch failed. |
   | -1 | javadoc | 23 | hadoop-hdds in the patch failed. |
   | -1 | javadoc | 21 | client in the patch failed. |
   | -1 | javadoc | 23 | hadoop-ozone in the patch failed. |
   | -1 | javadoc | 22 | ozone-manager in the patch failed. |
   ||| _ Other Tests _ |
   | -1 | unit | 28 | hadoop-hdds in the patch failed. |
   | -1 | unit | 23 | client in the patch failed. |
   | -1 | unit | 24 | hadoop-ozone in the patch failed. |
   | -1 | unit | 22 | ozone-manager in the patch failed. |
   | +1 | asflicense | 42 | The patch does not generate ASF License warnings. |
   | | | 7423 | |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-745/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/745 |
   | Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall 
 mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
   | uname | Linux 76c5899e4a1b 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed 
Feb 13 15:00:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / a54c1e3 |
   | maven | version: Apache Maven 3.3.9 |
   | Default Java | 1.8.0_191 |
   | findbugs | v3.1.0-RC1 |
   | mvninstall | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-745/5/artifact/out/patch-mvninstall-hadoop-hdds.txt
 |
   | mvninstall | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-745/5/artifact/out/patch-mvninstall-hadoop-hdds_client.txt
 |
   | mvninstall | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-745/5/artifact/out/patch-mvninstall-hadoop-ozone.txt
 |
   | mvninstall | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-745/5/artifact/out/patch-mvninstall-hadoop-ozone_ozone-manager.txt
 |
   | mvnsite | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-745/5/artifact/out/patch-mvnsite-hadoop-hdds.txt
 |
   | mvnsite | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-745/5/artifact/out/patch-mvnsite-hadoop-hdds_client.txt
 |
   | mvnsite | 
https

[jira] [Updated] (HDDS-999) Make the DNS resolution in OzoneManager more resilient

2019-04-22 Thread Siddharth Wagle (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle updated HDDS-999:
-
Status: Patch Available  (was: Open)

cc: [~arpitagarwal] / [~elek]

> Make the DNS resolution in OzoneManager more resilient
> --
>
> Key: HDDS-999
> URL: https://issues.apache.org/jira/browse/HDDS-999
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Elek, Marton
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-999.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If the OzoneManager is started before SCM, the SCM DNS may not be 
> available. In this case the OM should retry and re-resolve the DNS, but as 
> of now it throws an exception:
> {code:java}
> 2019-01-23 17:14:25 ERROR OzoneManager:593 - Failed to start the OzoneManager.
> java.net.SocketException: Call From om-0.om to null:0 failed on socket 
> exception: java.net.SocketException: Unresolved address; For more details 
> see:  http://wiki.apache.org/hadoop/SocketException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:798)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:566)
>     at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1042)
>     at org.apache.hadoop.ipc.Server.<init>(Server.java:2815)
>     at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:994)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:421)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
>     at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:804)
>     at 
> org.apache.hadoop.ozone.om.OzoneManager.startRpcServer(OzoneManager.java:563)
>     at 
> org.apache.hadoop.ozone.om.OzoneManager.getRpcServer(OzoneManager.java:927)
>     at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:265)
>     at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:674)
>     at org.apache.hadoop.ozone.om.OzoneManager.main(OzoneManager.java:587)
> Caused by: java.net.SocketException: Unresolved address
>     at sun.nio.ch.Net.translateToSocketException(Net.java:131)
>     at sun.nio.ch.Net.translateException(Net.java:157)
>     at sun.nio.ch.Net.translateException(Net.java:163)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:76)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:549)
>     ... 11 more
> Caused by: java.nio.channels.UnresolvedAddressException
>     at sun.nio.ch.Net.checkAddress(Net.java:101)
>     at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:218)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>     ... 12 more{code}
> It should be fixed. (See also HDDS-421, which fixed the same problem on 
> the datanode side, and HDDS-907, which is the workaround while this issue 
> is not resolved.)
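
For illustration, a minimal sketch of the retry-and-re-resolve behaviour 
described above (the class name, attempt count, and sleep interval are 
assumptions for this example, not the committed fix):

{code:java}
import java.net.InetSocketAddress;
import java.util.concurrent.TimeUnit;

public final class DnsRetryUtil {
  /**
   * Keep re-resolving host:port until the lookup succeeds or the retry
   * budget runs out. Constructing a new InetSocketAddress performs a
   * fresh DNS lookup on every attempt.
   */
  public static InetSocketAddress resolveWithRetry(String host, int port,
      int maxAttempts, long sleepMillis) throws InterruptedException {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      InetSocketAddress addr = new InetSocketAddress(host, port);
      if (!addr.isUnresolved()) {
        return addr;  // DNS answered; safe to bind/connect now
      }
      TimeUnit.MILLISECONDS.sleep(sleepMillis);
    }
    throw new IllegalStateException("Could not resolve " + host + ":" + port
        + " after " + maxAttempts + " attempts");
  }
}
{code}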



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-999) Make the DNS resolution in OzoneManager more resilient

2019-04-22 Thread Siddharth Wagle (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Wagle updated HDDS-999:
-
Attachment: HDDS-999.01.patch

> Make the DNS resolution in OzoneManager more resilient
> --
>
> Key: HDDS-999
> URL: https://issues.apache.org/jira/browse/HDDS-999
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Elek, Marton
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-999.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If the OzoneManager is started before SCM, the SCM DNS may not be 
> available. In this case the OM should retry and re-resolve the DNS, but as 
> of now it throws an exception:
> {code:java}
> 2019-01-23 17:14:25 ERROR OzoneManager:593 - Failed to start the OzoneManager.
> java.net.SocketException: Call From om-0.om to null:0 failed on socket 
> exception: java.net.SocketException: Unresolved address; For more details 
> see:  http://wiki.apache.org/hadoop/SocketException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:798)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:566)
>     at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1042)
>     at org.apache.hadoop.ipc.Server.<init>(Server.java:2815)
>     at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:994)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:421)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
>     at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:804)
>     at 
> org.apache.hadoop.ozone.om.OzoneManager.startRpcServer(OzoneManager.java:563)
>     at 
> org.apache.hadoop.ozone.om.OzoneManager.getRpcServer(OzoneManager.java:927)
>     at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:265)
>     at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:674)
>     at org.apache.hadoop.ozone.om.OzoneManager.main(OzoneManager.java:587)
> Caused by: java.net.SocketException: Unresolved address
>     at sun.nio.ch.Net.translateToSocketException(Net.java:131)
>     at sun.nio.ch.Net.translateException(Net.java:157)
>     at sun.nio.ch.Net.translateException(Net.java:163)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:76)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:549)
>     ... 11 more
> Caused by: java.nio.channels.UnresolvedAddressException
>     at sun.nio.ch.Net.checkAddress(Net.java:101)
>     at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:218)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>     ... 12 more{code}
> It should be fixed. (See also HDDS-421, which fixed the same problem on 
> the datanode side, and HDDS-907, which is the workaround while this issue 
> is not resolved.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-999) Make the DNS resolution in OzoneManager more resilient

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-999:

Labels: pull-request-available  (was: )

> Make the DNS resolution in OzoneManager more resilient
> --
>
> Key: HDDS-999
> URL: https://issues.apache.org/jira/browse/HDDS-999
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Elek, Marton
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-999.01.patch
>
>
> If the OzoneManager is started before SCM, the SCM DNS may not be 
> available. In this case the OM should retry and re-resolve the DNS, but as 
> of now it throws an exception:
> {code:java}
> 2019-01-23 17:14:25 ERROR OzoneManager:593 - Failed to start the OzoneManager.
> java.net.SocketException: Call From om-0.om to null:0 failed on socket 
> exception: java.net.SocketException: Unresolved address; For more details 
> see:  http://wiki.apache.org/hadoop/SocketException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:798)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:566)
>     at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1042)
>     at org.apache.hadoop.ipc.Server.<init>(Server.java:2815)
>     at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:994)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:421)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
>     at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:804)
>     at 
> org.apache.hadoop.ozone.om.OzoneManager.startRpcServer(OzoneManager.java:563)
>     at 
> org.apache.hadoop.ozone.om.OzoneManager.getRpcServer(OzoneManager.java:927)
>     at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:265)
>     at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:674)
>     at org.apache.hadoop.ozone.om.OzoneManager.main(OzoneManager.java:587)
> Caused by: java.net.SocketException: Unresolved address
>     at sun.nio.ch.Net.translateToSocketException(Net.java:131)
>     at sun.nio.ch.Net.translateException(Net.java:157)
>     at sun.nio.ch.Net.translateException(Net.java:163)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:76)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:549)
>     ... 11 more
> Caused by: java.nio.channels.UnresolvedAddressException
>     at sun.nio.ch.Net.checkAddress(Net.java:101)
>     at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:218)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>     ... 12 more{code}
> It should be fixed. (See also HDDS-421, which fixed the same problem on 
> the datanode side, and HDDS-907, which is the workaround while this issue 
> is not resolved.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-999) Make the DNS resolution in OzoneManager more resilient

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-999?focusedWorklogId=230965&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230965
 ]

ASF GitHub Bot logged work on HDDS-999:
---

Author: ASF GitHub Bot
Created on: 22/Apr/19 23:51
Start Date: 22/Apr/19 23:51
Worklog Time Spent: 10m 
  Work Description: swagle commented on pull request #758: HDDS-999. Make 
the DNS resolution in OzoneManager more resilient. (swagle)
URL: https://github.com/apache/hadoop/pull/758
 
 
   Brought back the change from HDDS-776 with the retriable task; the OM 
will now wait at least 50 seconds before giving up.
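
For readers following along, a hedged sketch of how such a retriable task can 
be built from Hadoop's stock retry policies (the 10 x 5 s schedule below is an 
assumption chosen to match the ~50-second budget, not the exact PR diff):

{code:java}
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

public final class RetriableTask {
  // 10 attempts x 5 s sleep gives roughly a 50-second budget (assumed).
  private static final RetryPolicy POLICY =
      RetryPolicies.retryUpToMaximumCountWithFixedSleep(10, 5,
          TimeUnit.SECONDS);

  public static <T> T run(Callable<T> task) throws Exception {
    int retries = 0;
    while (true) {
      try {
        return task.call();
      } catch (IOException e) {
        RetryPolicy.RetryAction action =
            POLICY.shouldRetry(e, retries++, 0, true);
        if (action.action != RetryPolicy.RetryAction.RetryDecision.RETRY) {
          throw e;  // retry budget exhausted: give up
        }
        Thread.sleep(action.delayMillis);
      }
    }
  }
}
{code}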
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 230965)
Time Spent: 10m
Remaining Estimate: 0h

> Make the DNS resolution in OzoneManager more resilient
> --
>
> Key: HDDS-999
> URL: https://issues.apache.org/jira/browse/HDDS-999
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Elek, Marton
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-999.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If the OzoneManager is started before SCM, the SCM DNS may not be 
> available. In this case the OM should retry and re-resolve the DNS, but as 
> of now it throws an exception:
> {code:java}
> 2019-01-23 17:14:25 ERROR OzoneManager:593 - Failed to start the OzoneManager.
> java.net.SocketException: Call From om-0.om to null:0 failed on socket 
> exception: java.net.SocketException: Unresolved address; For more details 
> see:  http://wiki.apache.org/hadoop/SocketException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:798)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:566)
>     at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1042)
>     at org.apache.hadoop.ipc.Server.<init>(Server.java:2815)
>     at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:994)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:421)
>     at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
>     at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:804)
>     at 
> org.apache.hadoop.ozone.om.OzoneManager.startRpcServer(OzoneManager.java:563)
>     at 
> org.apache.hadoop.ozone.om.OzoneManager.getRpcServer(OzoneManager.java:927)
>     at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:265)
>     at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:674)
>     at org.apache.hadoop.ozone.om.OzoneManager.main(OzoneManager.java:587)
> Caused by: java.net.SocketException: Unresolved address
>     at sun.nio.ch.Net.translateToSocketException(Net.java:131)
>     at sun.nio.ch.Net.translateException(Net.java:157)
>     at sun.nio.ch.Net.translateException(Net.java:163)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:76)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:549)
>     ... 11 more
> Caused by: java.nio.channels.UnresolvedAddressException
>     at sun.nio.ch.Net.checkAddress(Net.java:101)
>     at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:218)
>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>     ... 12 more{code}
> It should be fixed. (See also HDDS-421, which fixed the same problem on 
> the datanode side, and HDDS-907, which is the workaround while this issue 
> is not resolved.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1301) Optimize recursive ozone filesystem apis

2019-04-22 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823567#comment-16823567
 ] 

Anu Engineer commented on HDDS-1301:


Thank you, I will sync with you, but first I will write and post a design 
document that explains all these changes and the possible changes in the 
output committer. Then, based on your feedback, we can shape the Ozone 
Manager API.

> Optimize recursive ozone filesystem apis
> 
>
> Key: HDDS-1301
> URL: https://issues.apache.org/jira/browse/HDDS-1301
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-1301.001.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This Jira aims to optimize the recursive APIs in Ozone FileSystem, i.e. the 
> APIs that take a recursive flag requiring an operation to be performed on 
> all the children of a directory. The Jira would add support for recursive 
> APIs in Ozone Manager in order to reduce the number of RPC calls to Ozone 
> Manager. These operations are also currently not atomic; this Jira would 
> make all Ozone FileSystem operations atomic.
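
As a rough illustration of the idea (the KeyStore interface and the 
deleteKeysRecursively method below are hypothetical, not actual Ozone Manager 
APIs): doing the recursion server-side turns N client round trips into one 
RPC, and holding the bucket lock across the whole walk is what makes the 
operation atomic.

{code:java}
import java.io.IOException;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public final class RecursiveDeleteSketch {
  /** Assumed abstraction over the OM key table; not real Ozone code. */
  interface KeyStore {
    List<String> listKeys(String prefix) throws IOException;
    void deleteKey(String key) throws IOException;
  }

  private final ReentrantReadWriteLock bucketLock =
      new ReentrantReadWriteLock();

  public void deleteKeysRecursively(KeyStore store, String dirPrefix)
      throws IOException {
    bucketLock.writeLock().lock();  // one lock span => atomic operation
    try {
      for (String key : store.listKeys(dirPrefix)) {
        store.deleteKey(key);       // server-side; no per-key client RPC
      }
    } finally {
      bucketLock.writeLock().unlock();
    }
  }
}
{code}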



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-999) Make the DNS resolution in OzoneManager more resilient

2019-04-22 Thread Siddharth Wagle (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823562#comment-16823562
 ] 

Siddharth Wagle commented on HDDS-999:
--

I am able to reproduce this easily on the latest trunk; working on a patch to 
make the HDDS-776 changes effective.

{code}
om_1| 2019-04-22 22:30:15 ERROR OzoneManager:888 - Failed to start the 
OzoneManager.
om_1| java.io.IOException: Invalid host name: local host is: (unknown); 
destination host is: "scm":9863; java.net.UnknownHostException; For more 
details see:  http://wiki.apache.org/hadoop/UnknownHost
om_1|   at 
org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolClientSideTranslatorPB.transformServiceException(ScmBlockLocationProtocolClientSideTranslatorPB.java:173)
om_1|   at 
org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolClientSideTranslatorPB.getScmInfo(ScmBlockLocationProtocolClientSideTranslatorPB.java:197)
om_1|   at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
om_1|   at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
om_1|   at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
om_1|   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
om_1|   at 
org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66)
om_1|   at com.sun.proxy.$Proxy32.getScmInfo(Unknown Source)
om_1|   at 
org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:305)
om_1|   at 
org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:964)
om_1|   at 
org.apache.hadoop.ozone.om.OzoneManager.main(OzoneManager.java:882)
om_1| Caused by: java.net.UnknownHostException
om_1|   at 
org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:450)
om_1|   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1552)
om_1|   at org.apache.hadoop.ipc.Client.call(Client.java:1403)
om_1|   at org.apache.hadoop.ipc.Client.call(Client.java:1367)
om_1|   at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
om_1|   at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
om_1|   at com.sun.proxy.$Proxy31.getScmInfo(Unknown Source)
om_1|   at 
org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolClientSideTranslatorPB.getScmInfo(ScmBlockLocationProtocolClientSideTranslatorPB.java:195)
om_1|   ... 9 more
om_1| 2019-04-22 22:30:15 INFO  ExitUtil:210 - Exiting with status 1: 
java.io.IOException: Invalid host name: local host is: (unknown); destination 
host is: "scm":9863; java.net.UnknownHostException; For more details see:  
http://wiki.apache.org/hadoop/UnknownHost
om_1| 2019-04-22 22:30:15 INFO  OzoneManager:51 - SHUTDOWN_MSG:
om_1| /************************************************************
om_1| SHUTDOWN_MSG: Shutting down OzoneManager at 
989273176ea2/172.21.0.2
om_1| ************************************************************/
{code}

> Make the DNS resolution in OzoneManager more resilient
> --
>
> Key: HDDS-999
> URL: https://issues.apache.org/jira/browse/HDDS-999
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Elek, Marton
>Assignee: Siddharth Wagle
>Priority: Major
>
> If the OzoneManager is started before SCM, the SCM DNS may not be 
> available. In this case the OM should retry and re-resolve the DNS, but as 
> of now it throws an exception:
> {code:java}
> 2019-01-23 17:14:25 ERROR OzoneManager:593 - Failed to start the OzoneManager.
> java.net.SocketException: Call From om-0.om to null:0 failed on socket 
> exception: java.net.SocketException: Unresolved address; For more details 
> see:  http://wiki.apache.org/hadoop/SocketException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:798)
>     at org.apache.hadoop.ipc.Server.bind(Server.java:566)
>     at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1042)
>     at org.apache.hadoop.ipc.Server.<init>(Server.java:2815)
>     at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:994)
>     at 
> 

[jira] [Work logged] (HDDS-1450) Fix nightly run failures after HDDS-976

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1450?focusedWorklogId=230928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230928
 ]

ASF GitHub Bot logged work on HDDS-1450:


Author: ASF GitHub Bot
Created on: 22/Apr/19 22:45
Start Date: 22/Apr/19 22:45
Worklog Time Spent: 10m 
  Work Description: cjjnjust commented on issue #757: HDDS-1450. Fix 
nightly run failures after HDDS-976. Contributed by Xi…
URL: https://github.com/apache/hadoop/pull/757#issuecomment-485578266
 
 
   Thanks @xiaoyuyao. Does this fix the nightly build? It does not seem to fix 
the "good.xml file not found" issue; how does it work?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 230928)
Time Spent: 40m  (was: 0.5h)

> Fix nightly run failures after HDDS-976
> ---
>
> Key: HDDS-1450
> URL: https://issues.apache.org/jira/browse/HDDS-1450
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [https://ci.anzix.net/job/ozone-nightly/72/testReport/]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14426) RBF: Add delegation token total count as one of the federation metrics

2019-04-22 Thread CR Hota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823492#comment-16823492
 ] 

CR Hota commented on HDFS-14426:


[~elgoiri] 

Strangely, I don't see the commit in the HDFS-13891 branch yet.

> RBF: Add delegation token total count as one of the federation metrics
> --
>
> Key: HDFS-14426
> URL: https://issues.apache.org/jira/browse/HDFS-14426
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Attachments: HDFS-14426-HDFS-13891.001.patch, HDFS-14426.001.patch
>
>
> Currently the Router doesn't report the total number of valid delegation 
> tokens it holds, but this piece of information is useful for monitoring 
> and understanding the real-time token situation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14435) ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs

2019-04-22 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823484#comment-16823484
 ] 

Hudson commented on HDFS-14435:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16448 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16448/])
HDFS-14435. [SBN Read] Enable ObserverReadProxyProvider to gracefully (xkrogen: 
rev 174b7d3126e215c519b1c4a74892c7020712f9df)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/HATestUtil.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDelegationTokensWithHA.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/ObserverReadProxyProvider.java


> ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs
> --
>
> Key: HDFS-14435
> URL: https://issues.apache.org/jira/browse/HDFS-14435
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, nn
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14435.000.patch, HDFS-14435.001.patch, 
> HDFS-14435.002.patch, HDFS-14435.003.patch, HDFS-14435.004.patch, 
> HDFS-14435.005.patch, dt_stack_trace.png
>
>
> We have been seeing issues during testing of the Consistent Read from Standby 
> feature that indicate that ORPP is unable to call {{getHAServiceState}} on 
> Standby NNs, as they are rejected with a {{StandbyException}}. Upon further 
> investigation, we realized that although the Standby allows the 
> {{getHAServiceState()}} call, reading a delegation token is not allowed in 
> Standby state, thus the call will fail when using DT-based authentication. 
> This hasn't caused issues in practice, since ORPP assumes that the state is 
> Standby if it is unable to fetch the state, but we should fix the logic to 
> properly handle this scenario.
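
One plausible shape of the fix, sketched here against the generic 
HAServiceProtocol rather than ORPP's internals (an assumption made for 
illustration; see the attached patches for the real change): map a rejected 
call to STANDBY explicitly instead of letting the failure propagate.

{code:java}
import java.io.IOException;
import org.apache.hadoop.ha.HAServiceProtocol;
import org.apache.hadoop.ha.HAServiceProtocol.HAServiceState;

public final class HaStateProbe {
  /**
   * Probe a NameNode's HA state. If the RPC is rejected, e.g. because
   * reading a delegation token is not allowed in Standby state, fall
   * back to STANDBY instead of failing the caller.
   */
  public static HAServiceState probe(HAServiceProtocol proxy) {
    try {
      return proxy.getServiceStatus().getState();
    } catch (IOException e) {
      // A node we cannot query safely counts as not-active.
      return HAServiceState.STANDBY;
    }
  }
}
{code}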



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14374) Expose total number of delegation tokens in AbstractDelegationTokenSecretManager

2019-04-22 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823485#comment-16823485
 ] 

Hudson commented on HDFS-14374:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16448 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16448/])
HDFS-14374. Expose total number of delegation tokens in (inigoiri: rev 
fb1c5491398bbdac181e867022881fe2ff73c884)
* (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/token/delegation/TestDelegationToken.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java


> Expose total number of delegation tokens in 
> AbstractDelegationTokenSecretManager
> 
>
> Key: HDFS-14374
> URL: https://issues.apache.org/jira/browse/HDFS-14374
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: CR Hota
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14374.001.patch, HDFS-14374.002.patch
>
>
> AbstractDelegationTokenSecretManager should expose the total number of 
> active delegation tokens so that specific implementations can track it 
> for observability.
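
A standalone sketch of the kind of accessor this asks for (the class below 
and the method name getCurrentTokensSize are assumptions for illustration, 
not necessarily the committed change):

{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Toy stand-in for AbstractDelegationTokenSecretManager: tracks active
 * tokens and exposes their count so metrics systems can poll it.
 */
public class TokenCountingManager<I> {
  private final Map<I, Long> currentTokens = new HashMap<>();

  public synchronized void addToken(I id, long expiryDate) {
    currentTokens.put(id, expiryDate);
  }

  public synchronized void removeToken(I id) {
    currentTokens.remove(id);
  }

  /** The metric in question: number of currently valid tokens. */
  public synchronized int getCurrentTokensSize() {
    return currentTokens.size();
  }
}
{code}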



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14426) RBF: Add delegation token total count as one of the federation metrics

2019-04-22 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823480#comment-16823480
 ] 

Íñigo Goiri commented on HDFS-14426:


It should already be rebased.

> RBF: Add delegation token total count as one of the federation metrics
> --
>
> Key: HDFS-14426
> URL: https://issues.apache.org/jira/browse/HDFS-14426
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Attachments: HDFS-14426-HDFS-13891.001.patch, HDFS-14426.001.patch
>
>
> Currently the Router doesn't report the total number of valid delegation 
> tokens it holds, but this piece of information is useful for monitoring 
> and understanding the real-time token situation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14426) RBF: Add delegation token total count as one of the federation metrics

2019-04-22 Thread CR Hota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823460#comment-16823460
 ] 

CR Hota commented on HDFS-14426:


[~elgoiri] Could you please help rebase HDFS-13891?

 

> RBF: Add delegation token total count as one of the federation metrics
> --
>
> Key: HDFS-14426
> URL: https://issues.apache.org/jira/browse/HDFS-14426
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
> Attachments: HDFS-14426-HDFS-13891.001.patch, HDFS-14426.001.patch
>
>
> Currently the Router doesn't report the total number of valid delegation 
> tokens it holds, but this piece of information is useful for monitoring 
> and understanding the real-time token situation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14374) Expose total number of delegation tokens in AbstractDelegationTokenSecretManager

2019-04-22 Thread CR Hota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823456#comment-16823456
 ] 

CR Hota commented on HDFS-14374:


[~elgoiri]  Thanks for the review and commit. 

> Expose total number of delegation tokens in 
> AbstractDelegationTokenSecretManager
> 
>
> Key: HDFS-14374
> URL: https://issues.apache.org/jira/browse/HDFS-14374
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: CR Hota
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14374.001.patch, HDFS-14374.002.patch
>
>
> AbstractDelegationTokenSecretManager should expose the total number of 
> active delegation tokens so that specific implementations can track it 
> for observability.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14445) TestTrySendErrorReportWhenNNThrowsIOException fails in trunk

2019-04-22 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823450#comment-16823450
 ] 

Hudson commented on HDFS-14445:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16447 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16447/])
HDFS-14445. TestTrySendErrorReportWhenNNThrowsIOException fails in (inigoiri: 
rev 5321235fe8d89f01fe2c141fdef5d8186a6b20dd)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java


> TestTrySendErrorReportWhenNNThrowsIOException fails in trunk
> 
>
> Key: HDFS-14445
> URL: https://issues.apache.org/jira/browse/HDFS-14445
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14445-01.patch, HDFS-14445-02.patch
>
>
> {noformat}
> Active namenode didn't add the report back to the queue when errorReport 
> threw IOException
> {noformat}
> Reference :::
> https://builds.apache.org/job/PreCommit-HDFS-Build/26676/testReport/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26662/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26661/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26644/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14374) Expose total number of delegation tokens in AbstractDelegationTokenSecretManager

2019-04-22 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823439#comment-16823439
 ] 

Íñigo Goiri commented on HDFS-14374:


Thanks [~crh] for the patch.
Committed to trunk.

> Expose total number of delegation tokens in 
> AbstractDelegationTokenSecretManager
> 
>
> Key: HDFS-14374
> URL: https://issues.apache.org/jira/browse/HDFS-14374
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: CR Hota
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14374.001.patch, HDFS-14374.002.patch
>
>
> AbstractDelegationTokenSecretManager should expose the total number of 
> active delegation tokens so that specific implementations can track it 
> for observability.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14374) Expose total number of delegation tokens in AbstractDelegationTokenSecretManager

2019-04-22 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/HDFS-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-14374:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

> Expose total number of delegation tokens in 
> AbstractDelegationTokenSecretManager
> 
>
> Key: HDFS-14374
> URL: https://issues.apache.org/jira/browse/HDFS-14374
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: CR Hota
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14374.001.patch, HDFS-14374.002.patch
>
>
> AbstractDelegationTokenSecretManager should expose the total number of 
> active delegation tokens so that specific implementations can track it 
> for observability.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14435) ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs

2019-04-22 Thread Erik Krogen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-14435:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

> ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs
> --
>
> Key: HDFS-14435
> URL: https://issues.apache.org/jira/browse/HDFS-14435
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, nn
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14435.000.patch, HDFS-14435.001.patch, 
> HDFS-14435.002.patch, HDFS-14435.003.patch, HDFS-14435.004.patch, 
> HDFS-14435.005.patch, dt_stack_trace.png
>
>
> We have been seeing issues during testing of the Consistent Read from Standby 
> feature that indicate that ORPP is unable to call {{getHAServiceState}} on 
> Standby NNs, as they are rejected with a {{StandbyException}}. Upon further 
> investigation, we realized that although the Standby allows the 
> {{getHAServiceState()}} call, reading a delegation token is not allowed in 
> Standby state, thus the call will fail when using DT-based authentication. 
> This hasn't caused issues in practice, since ORPP assumes that the state is 
> Standby if it is unable to fetch the state, but we should fix the logic to 
> properly handle this scenario.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14435) ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs

2019-04-22 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823438#comment-16823438
 ] 

Erik Krogen commented on HDFS-14435:


Thanks all! Committed to trunk.

> ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs
> --
>
> Key: HDFS-14435
> URL: https://issues.apache.org/jira/browse/HDFS-14435
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, nn
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14435.000.patch, HDFS-14435.001.patch, 
> HDFS-14435.002.patch, HDFS-14435.003.patch, HDFS-14435.004.patch, 
> HDFS-14435.005.patch, dt_stack_trace.png
>
>
> We have been seeing issues during testing of the Consistent Read from Standby 
> feature that indicate that ORPP is unable to call {{getHAServiceState}} on 
> Standby NNs, as they are rejected with a {{StandbyException}}. Upon further 
> investigation, we realized that although the Standby allows the 
> {{getHAServiceState()}} call, reading a delegation token is not allowed in 
> Standby state, thus the call will fail when using DT-based authentication. 
> This hasn't caused issues in practice, since ORPP assumes that the state is 
> Standby if it is unable to fetch the state, but we should fix the logic to 
> properly handle this scenario.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14374) Expose total number of delegation tokens in AbstractDelegationTokenSecretManager

2019-04-22 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823437#comment-16823437
 ] 

Íñigo Goiri commented on HDFS-14374:


+1 on [^HDFS-14374.002.patch].
[~hexiaoqiao] we will open a follow-up JIRA to expose this metric in the 
Router and the NameNode.
At that point, we should discuss what the implications are in terms of 
information being leaked by this.
I think it should be fine, but it is worth mentioning.

> Expose total number of delegation tokens in 
> AbstractDelegationTokenSecretManager
> 
>
> Key: HDFS-14374
> URL: https://issues.apache.org/jira/browse/HDFS-14374
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: CR Hota
>Priority: Major
> Attachments: HDFS-14374.001.patch, HDFS-14374.002.patch
>
>
> AbstractDelegationTokenSecretManager should expose the total number of 
> active delegation tokens so that specific implementations can track it 
> for observability.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14435) ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs

2019-04-22 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823432#comment-16823432
 ] 

Íñigo Goiri commented on HDFS-14435:


+1 on  [^HDFS-14435.004.patch].

> ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs
> --
>
> Key: HDFS-14435
> URL: https://issues.apache.org/jira/browse/HDFS-14435
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, nn
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14435.000.patch, HDFS-14435.001.patch, 
> HDFS-14435.002.patch, HDFS-14435.003.patch, HDFS-14435.004.patch, 
> HDFS-14435.005.patch, dt_stack_trace.png
>
>
> We have been seeing issues during testing of the Consistent Read from Standby 
> feature that indicate that ORPP is unable to call {{getHAServiceState}} on 
> Standby NNs, as they are rejected with a {{StandbyException}}. Upon further 
> investigation, we realized that although the Standby allows the 
> {{getHAServiceState()}} call, reading a delegation token is not allowed in 
> Standby state, thus the call will fail when using DT-based authentication. 
> This hasn't caused issues in practice, since ORPP assumes that the state is 
> Standby if it is unable to fetch the state, but we should fix the logic to 
> properly handle this scenario.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14445) TestTrySendErrorReportWhenNNThrowsIOException fails in trunk

2019-04-22 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/HDFS-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-14445:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

> TestTrySendErrorReportWhenNNThrowsIOException fails in trunk
> 
>
> Key: HDFS-14445
> URL: https://issues.apache.org/jira/browse/HDFS-14445
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14445-01.patch, HDFS-14445-02.patch
>
>
> {noformat}
> Active namenode didn't add the report back to the queue when errorReport 
> threw IOException
> {noformat}
> Reference :::
> https://builds.apache.org/job/PreCommit-HDFS-Build/26676/testReport/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26662/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26661/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26644/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14445) TestTrySendErrorReportWhenNNThrowsIOException fails in trunk

2019-04-22 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823430#comment-16823430
 ] 

Íñigo Goiri commented on HDFS-14445:


Thanks [~ayushtkn] for the patch.
Committed to trunk.

> TestTrySendErrorReportWhenNNThrowsIOException fails in trunk
> 
>
> Key: HDFS-14445
> URL: https://issues.apache.org/jira/browse/HDFS-14445
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14445-01.patch, HDFS-14445-02.patch
>
>
> {noformat}
> Active namenode didn't add the report back to the queue when errorReport 
> threw IOException
> {noformat}
> Reference :::
> https://builds.apache.org/job/PreCommit-HDFS-Build/26676/testReport/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26662/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26661/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26644/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14445) TestTrySendErrorReportWhenNNThrowsIOException fails in trunk

2019-04-22 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823428#comment-16823428
 ] 

Íñigo Goiri commented on HDFS-14445:


Yetus reported failed unit tests, but the report shows none.
In any case, this patch shouldn't have any impact on any other unit test.
+1 on [^HDFS-14445-02.patch].

> TestTrySendErrorReportWhenNNThrowsIOException fails in trunk
> 
>
> Key: HDFS-14445
> URL: https://issues.apache.org/jira/browse/HDFS-14445
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14445-01.patch, HDFS-14445-02.patch
>
>
> {noformat}
> Active namenode didn't add the report back to the queue when errorReport 
> threw IOException
> {noformat}
> Reference :::
> https://builds.apache.org/job/PreCommit-HDFS-Build/26676/testReport/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26662/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26661/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26644/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1452) All chunks should happen to a single file for a block in datanode

2019-04-22 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823407#comment-16823407
 ] 

Anu Engineer commented on HDDS-1452:


{quote}Why impose such a requirement? Doing EC of 1 KB files is probably a 
terrible idea even from the perspective of disk usage
{quote}
We will not be doing EC on the 1 KB files; we will be doing EC at the data 
file. That is, if you store lots of data into a data file, we can EC that 
file. The size is irrelevant from the Ozone point of view. HDDS can do the EC 
at the data-file level and be completely independent of the sizes in question.

 

Now, if we have 1 GB of data – some arbitrarily large amount – then EC makes 
sense. This is one of the advantages of Ozone over HDFS.

> All chunks should happen to a single file for a block in datanode
> -
>
> Key: HDDS-1452
> URL: https://issues.apache.org/jira/browse/HDDS-1452
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> Currently, all chunks of a block are written to individual chunk files on 
> the datanode. The idea here is to write all the individual chunks to a 
> single file on the datanode.
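
A minimal sketch of the proposed layout (the helper below is illustrative, 
not the eventual datanode code): each chunk lands in one per-block file at 
its logical offset, instead of getting its own file.

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class SingleFileChunkWriter {
  /** Positional write: chunk data lands at its offset in the block file. */
  public static void writeChunk(Path blockFile, long offset,
      ByteBuffer chunk) throws IOException {
    try (FileChannel ch = FileChannel.open(blockFile,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
      while (chunk.hasRemaining()) {
        offset += ch.write(chunk, offset);  // no per-chunk file needed
      }
    }
  }
}
{code}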



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1452) All chunks should happen to a single file for a block in datanode

2019-04-22 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823404#comment-16823404
 ] 

Arpit Agarwal commented on HDDS-1452:
-

bq. In the ozone world, it should not matter. Especially since we plan to EC at 
the level of data or containers. The actual EC would not work on RocksDB, but it 
should work on all containers, irrespective of the data size of the actual keys.
Why impose such a requirement? Doing EC of 1 KB files is probably a terrible 
idea even from the perspective of disk usage.

> All chunks should happen to a single file for a block in datanode
> -
>
> Key: HDDS-1452
> URL: https://issues.apache.org/jira/browse/HDDS-1452
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> Currently, all chunks of a block are written to individual chunk files on 
> the datanode. The idea here is to write all the individual chunks to a 
> single file on the datanode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14435) ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs

2019-04-22 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823381#comment-16823381
 ] 

Chao Sun commented on HDFS-14435:
-

+1 from me as well!

> ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs
> --
>
> Key: HDFS-14435
> URL: https://issues.apache.org/jira/browse/HDFS-14435
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, nn
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14435.000.patch, HDFS-14435.001.patch, 
> HDFS-14435.002.patch, HDFS-14435.003.patch, HDFS-14435.004.patch, 
> HDFS-14435.005.patch, dt_stack_trace.png
>
>
> We have been seeing issues during testing of the Consistent Read from Standby 
> feature that indicate that ORPP is unable to call {{getHAServiceState}} on 
> Standby NNs, as they are rejected with a {{StandbyException}}. Upon further 
> investigation, we realized that although the Standby allows the 
> {{getHAServiceState()}} call, reading a delegation token is not allowed in 
> Standby state, thus the call will fail when using DT-based authentication. 
> This hasn't caused issues in practice, since ORPP assumes that the state is 
> Standby if it is unable to fetch the state, but we should fix the logic to 
> properly handle this scenario.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-1452) All chunks should happen to a single file for a block in datanode

2019-04-22 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823378#comment-16823378
 ] 

Anu Engineer commented on HDDS-1452:


{quote}If a container is full of 1KB files it may not be a good candidate for 
Erasure Coding. If your entire 
{quote}
In the Ozone world, it should not matter, especially since we plan to EC at the 
level of data or containers. The actual EC would not work on RocksDB, but it 
should work on all containers, irrespective of the data size of the actual keys.
{quote}cluster is full of 1KB files then we have other serious problems, of 
course.
{quote}
Hopefully, Ozone will just be able to handle this scenario; we might need 
many Ozone Managers, but a single SCM and a few datanodes. I am not advising 
this model, but it is something that I am sure we will run into eventually, 
especially since we are an object store. The HDFS use case is different, but 
in the Ozone world I think we will have to be prepared for this eventuality.

> All chunks should happen to a single file for a block in datanode
> -
>
> Key: HDDS-1452
> URL: https://issues.apache.org/jira/browse/HDDS-1452
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> Currently, all chunks of a block are written to individual chunk files on 
> the datanode. The idea here is to write all the individual chunks to a 
> single file on the datanode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14435) ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs

2019-04-22 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823376#comment-16823376
 ] 

Chen Liang edited comment on HDFS-14435 at 4/22/19 7:38 PM:


[~xkrogen] sorry for the late response... +1 from me. Feel free to commit this 
yourself. :)


was (Author: vagarychen):
[~xkrogen] sorry for the late response... +1 from me. Feel free to commit 
yourself. :)

> ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs
> --
>
> Key: HDFS-14435
> URL: https://issues.apache.org/jira/browse/HDFS-14435
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, nn
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14435.000.patch, HDFS-14435.001.patch, 
> HDFS-14435.002.patch, HDFS-14435.003.patch, HDFS-14435.004.patch, 
> HDFS-14435.005.patch, dt_stack_trace.png
>
>
> We have been seeing issues during testing of the Consistent Read from Standby 
> feature that indicate that ORPP is unable to call {{getHAServiceState}} on 
> Standby NNs, as they are rejected with a {{StandbyException}}. Upon further 
> investigation, we realized that although the Standby allows the 
> {{getHAServiceState()}} call, reading a delegation token is not allowed in 
> Standby state, thus the call will fail when using DT-based authentication. 
> This hasn't caused issues in practice, since ORPP assumes that the state is 
> Standby if it is unable to fetch the state, but we should fix the logic to 
> properly handle this scenario.






[jira] [Commented] (HDFS-14435) ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs

2019-04-22 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823376#comment-16823376
 ] 

Chen Liang commented on HDFS-14435:
---

[~xkrogen] sorry for the late response... +1 from me. Feel free to commit 
yourself. :)

> ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs
> --
>
> Key: HDFS-14435
> URL: https://issues.apache.org/jira/browse/HDFS-14435
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, nn
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14435.000.patch, HDFS-14435.001.patch, 
> HDFS-14435.002.patch, HDFS-14435.003.patch, HDFS-14435.004.patch, 
> HDFS-14435.005.patch, dt_stack_trace.png
>
>
> We have been seeing issues during testing of the Consistent Read from Standby 
> feature that indicate that ORPP is unable to call {{getHAServiceState}} on 
> Standby NNs, as they are rejected with a {{StandbyException}}. Upon further 
> investigation, we realized that although the Standby allows the 
> {{getHAServiceState()}} call, reading a delegation token is not allowed in 
> Standby state, thus the call will fail when using DT-based authentication. 
> This hasn't caused issues in practice, since ORPP assumes that the state is 
> Standby if it is unable to fetch the state, but we should fix the logic to 
> properly handle this scenario.






[jira] [Commented] (HDDS-1452) All chunks should happen to a single file for a block in datanode

2019-04-22 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823375#comment-16823375
 ] 

Arpit Agarwal commented on HDDS-1452:
-

Yeah we will have to benchmark it.

If a container is full of 1KB files it may not be a good candidate for Erasure 
Coding. If your entire cluster is full of 1KB files then we have other serious 
problems, of course.

The one downside of putting multiple blocks in the same file (can we call it a 
superblock?) is that deletes become harder. We will need to do some kind of 
background GC/compaction of the superblocks.
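
For illustration, a rough sketch of what such a background pass could decide; 
every name here is hypothetical, not anything in the current code:

{code:java}
import java.util.List;

class SuperblockCompactor {
  static final double GARBAGE_THRESHOLD = 0.5;

  /** A live (not yet deleted) block inside a superblock file. */
  static class LiveBlock {
    final long offset;
    final long length;
    LiveBlock(long offset, long length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /**
   * A superblock is worth rewriting once at least half of its bytes belong
   * to deleted blocks; the compactor would then copy the live ranges into a
   * fresh file and delete the old one.
   */
  static boolean needsCompaction(long fileSize, List<LiveBlock> liveBlocks) {
    long liveBytes = 0;
    for (LiveBlock b : liveBlocks) {
      liveBytes += b.length;
    }
    return fileSize > 0
        && (fileSize - liveBytes) / (double) fileSize >= GARBAGE_THRESHOLD;
  }
}
{code}

The trade-off is the usual log-structured one: appends and deletes stay cheap, 
and space reclamation is deferred to the background rewrite.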

> All chunks should happen to a single file for a block in datanode
> -
>
> Key: HDDS-1452
> URL: https://issues.apache.org/jira/browse/HDDS-1452
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> Currently, all chunks of a block are written to individual chunk files in 
> the datanode. The idea here is to write all chunks of a block to a single 
> file in the datanode.






[jira] [Commented] (HDDS-1452) All chunks should happen to a single file for a block in datanode

2019-04-22 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823374#comment-16823374
 ] 

Anu Engineer commented on HDDS-1452:


Possibly; I don't know which would be the better option: one single large file 
or RocksDB. Either way, when we do this, we need to make sure that we do not 
end up with a single-block-to-single-file mapping. It is better to have the 
ability to control the data size of the files.

One downside of keeping the 1KB files in RocksDB is that erasure coding might 
become harder: we can take a closed container, erasure code all the data 
files, and leave the metadata in RocksDB, but data stored inside RocksDB 
itself would not be erasure coded. That is my only concern with leaving 1 KB 
keys inside RocksDB; and also we will have to benchmark how it will work out.

 

> All chunks should happen to a single file for a block in datanode
> -
>
> Key: HDDS-1452
> URL: https://issues.apache.org/jira/browse/HDDS-1452
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> Currently, all chunks of a block are written to individual chunk files in 
> the datanode. The idea here is to write all chunks of a block to a single 
> file in the datanode.






[jira] [Comment Edited] (HDFS-14435) ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs

2019-04-22 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823373#comment-16823373
 ] 

Erik Krogen edited comment on HDFS-14435 at 4/22/19 7:33 PM:
-

Hey [~vagarychen], [~elgoiri] -- can I get a binding +1 from either of you? 
Want to make sure it's ok for me (or you! :) ) to commit this.


was (Author: xkrogen):
Hey [~csun], [~elgoiri] -- can I get a binding +1 from either of you? Want to 
make sure it's ok for me (or you! :) ) to commit this.

> ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs
> --
>
> Key: HDFS-14435
> URL: https://issues.apache.org/jira/browse/HDFS-14435
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, nn
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14435.000.patch, HDFS-14435.001.patch, 
> HDFS-14435.002.patch, HDFS-14435.003.patch, HDFS-14435.004.patch, 
> HDFS-14435.005.patch, dt_stack_trace.png
>
>
> We have been seeing issues during testing of the Consistent Read from Standby 
> feature that indicate that ORPP is unable to call {{getHAServiceState}} on 
> Standby NNs, as they are rejected with a {{StandbyException}}. Upon further 
> investigation, we realized that although the Standby allows the 
> {{getHAServiceState()}} call, reading a delegation token is not allowed in 
> Standby state, thus the call will fail when using DT-based authentication. 
> This hasn't caused issues in practice, since ORPP assumes that the state is 
> Standby if it is unable to fetch the state, but we should fix the logic to 
> properly handle this scenario.






[jira] [Commented] (HDFS-14435) ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs

2019-04-22 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823373#comment-16823373
 ] 

Erik Krogen commented on HDFS-14435:


Hey [~csun], [~elgoiri] -- can I get a binding +1 from either of you? Want to 
make sure it's ok for me (or you! :) ) to commit this.

> ObserverReadProxyProvider is unable to properly fetch HAState from Standby NNs
> --
>
> Key: HDFS-14435
> URL: https://issues.apache.org/jira/browse/HDFS-14435
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, nn
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-14435.000.patch, HDFS-14435.001.patch, 
> HDFS-14435.002.patch, HDFS-14435.003.patch, HDFS-14435.004.patch, 
> HDFS-14435.005.patch, dt_stack_trace.png
>
>
> We have been seeing issues during testing of the Consistent Read from Standby 
> feature that indicate that ORPP is unable to call {{getHAServiceState}} on 
> Standby NNs, as they are rejected with a {{StandbyException}}. Upon further 
> investigation, we realized that although the Standby allows the 
> {{getHAServiceState()}} call, reading a delegation token is not allowed in 
> Standby state, thus the call will fail when using DT-based authentication. 
> This hasn't caused issues in practice, since ORPP assumes that the state is 
> Standby if it is unable to fetch the state, but we should fix the logic to 
> properly handle this scenario.






[jira] [Commented] (HDDS-1452) All chunks should happen to a single file for a block in datanode

2019-04-22 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823368#comment-16823368
 ] 

Arpit Agarwal commented on HDDS-1452:
-

Thanks for filing this [~shashikant].

1KB files can probably just go into RocksDB!
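
For illustration, a minimal sketch of that idea, storing a small block's 
payload directly in RocksDB keyed by its block ID; the key layout and path 
below are made up:

{code:java}
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class SmallBlockInRocksDb {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (Options options = new Options().setCreateIfMissing(true);
         RocksDB db = RocksDB.open(options, "/tmp/container-db")) {
      byte[] blockKey = "block-12345".getBytes();  // made-up key layout
      byte[] payload = new byte[1024];             // the 1 KB value itself
      db.put(blockKey, payload);    // data sits next to the block metadata
      byte[] readBack = db.get(blockKey);  // no chunk file is ever opened
      System.out.println(readBack.length); // 1024
    }
  }
}
{code}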

> All chunks should happen to a single file for a block in datanode
> -
>
> Key: HDDS-1452
> URL: https://issues.apache.org/jira/browse/HDDS-1452
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> Currently, all chunks of a block are written to individual chunk files in 
> the datanode. The idea here is to write all chunks of a block to a single 
> file in the datanode.






[jira] [Work logged] (HDDS-1403) KeyOutputStream writes fails after max retries while writing to a closed container

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1403?focusedWorklogId=230851&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230851
 ]

ASF GitHub Bot logged work on HDDS-1403:


Author: ASF GitHub Bot
Created on: 22/Apr/19 19:21
Start Date: 22/Apr/19 19:21
Worklog Time Spent: 10m 
  Work Description: arp7 commented on pull request #753: HDDS-1403. 
KeyOutputStream writes fails after max retries while writing to a closed 
container
URL: https://github.com/apache/hadoop/pull/753#discussion_r276863308
 
 

 ##
 File path: hadoop-hdds/common/src/main/resources/ozone-default.xml
 ##
 @@ -429,12 +429,20 @@
   </property>
   <property>
     <name>ozone.client.max.retries</name>
-    <value>5</value>
+    <value>100</value>
     <tag>OZONE, CLIENT</tag>
     <description>Maximum number of retries by Ozone Client on encountering
       exception while writing a key.
     </description>
   </property>
+  <property>
+    <name>ozone.client.retry.interval.ms</name>
 
 Review comment:
   Don't hardcode the unit (ms). We can specify the unit with the config key. 
See Configuration#getTimeDuration.
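
For illustration, a minimal sketch of the suggested pattern, assuming a 
unit-free key name (the key name below is only a stand-in):

{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

public class RetryIntervalSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The unit travels with the value ("500ms", "2s", "1m", ...),
    // so the key itself stays unit-free.
    conf.set("ozone.client.retry.interval", "2s");
    // getTimeDuration parses the suffix and converts it to the unit the
    // caller asks for; the default (1000) is read in that same unit.
    long intervalMs = conf.getTimeDuration(
        "ozone.client.retry.interval", 1000, TimeUnit.MILLISECONDS);
    System.out.println(intervalMs); // prints 2000
  }
}
{code}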
 



Issue Time Tracking
---

Worklog Id: (was: 230851)
Time Spent: 50m  (was: 40m)

> KeyOutputStream writes fails after max retries while writing to a closed 
> container
> --
>
> Key: HDDS-1403
> URL: https://issues.apache.org/jira/browse/HDDS-1403
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: MiniOzoneChaosCluster, pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently an Ozone client retries a write operation 5 times. It is possible 
> that the container being written to is already closed by the time the write 
> happens. The key write will fail after retrying multiple times with this 
> error. This needs to be fixed, as this is an internal error.






[jira] [Commented] (HDFS-14445) TestTrySendErrorReportWhenNNThrowsIOException fails in trunk

2019-04-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823360#comment-16823360
 ] 

Hadoop QA commented on HDFS-14445:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
3m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 28s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}102m 38s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}149m 49s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14445 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12966637/HDFS-14445-02.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 9d00c30db3b3 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 96e3027 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/26683/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/26683/testReport/ |
| Max. process+thread count | 3144 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/26683/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> TestTrySendErrorReportWhenNNThrowsIOException fails in trunk

[jira] [Commented] (HDDS-1452) All chunks should happen to a single file for a block in datanode

2019-04-22 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823359#comment-16823359
 ] 

Anu Engineer commented on HDDS-1452:


Just a thought: would it make sense to write to data files until they become, 
say, 1 GB? That way we can direct any chunk write to a file until it is large 
enough. This addresses the use case where we are writing, say, 1 KB Ozone 
keys. In the current proposal, if I write all 1 KB keys, would we end up with 
1 KB block files? Just a thought, since you are planning to address this issue.
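
For illustration, a rough sketch of that rollover idea; every name here is 
hypothetical:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class RollingChunkWriter {
  private static final long MAX_FILE_SIZE = 1L << 30; // roll at ~1 GB
  private final Path dir;
  private FileChannel current;
  private int fileIndex;

  RollingChunkWriter(Path dir) {
    this.dir = dir;
  }

  /**
   * Appends a chunk to the current shared data file, rolling to a new file
   * once the size threshold is reached. Returns {fileIndex, offset} so the
   * block metadata can record where its data lives.
   */
  synchronized long[] append(ByteBuffer chunk) throws IOException {
    if (current == null || current.size() >= MAX_FILE_SIZE) {
      if (current != null) {
        current.close();
      }
      current = FileChannel.open(dir.resolve("data-" + (++fileIndex)),
          StandardOpenOption.CREATE, StandardOpenOption.WRITE,
          StandardOpenOption.APPEND);
    }
    long offset = current.size();
    current.write(chunk);
    return new long[] {fileIndex, offset};
  }
}
{code}

A block's metadata would then record a (file, offset, length) triple instead 
of a per-block file path.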

> All chunks should happen to a single file for a block in datanode
> -
>
> Key: HDDS-1452
> URL: https://issues.apache.org/jira/browse/HDDS-1452
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> Currently, all chunks of a block are written to individual chunk files in 
> the datanode. The idea here is to write all chunks of a block to a single 
> file in the datanode.






[jira] [Work logged] (HDDS-1065) OM and DN should persist SCM certificate as the trust root.

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1065?focusedWorklogId=230849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230849
 ]

ASF GitHub Bot logged work on HDDS-1065:


Author: ASF GitHub Bot
Created on: 22/Apr/19 19:16
Start Date: 22/Apr/19 19:16
Worklog Time Spent: 10m 
  Work Description: xiaoyuyao commented on pull request #754: HDDS-1065. OM 
and DN should persist SCM certificate as the trust root. Contributed by Ajay 
Kumar.
URL: https://github.com/apache/hadoop/pull/754#discussion_r277401501
 
 

 ##
 File path: 
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/security/x509/certificate/client/DefaultCertificateClient.java
 ##
 @@ -80,6 +80,7 @@
 public abstract class DefaultCertificateClient implements CertificateClient {
 
   private static final String CERT_FILE_NAME_FORMAT = "%s.crt";
+  private static final String CA_CERT_PREFIX = "CA-";
 
 Review comment:
   Can you remind me where we actually use this root CA certificate in the 
code? I don't see a reference in this patch. Should we use it in block token 
verification?
 



Issue Time Tracking
---

Worklog Id: (was: 230849)
Time Spent: 50m  (was: 40m)

> OM and DN should persist SCM certificate as the trust root.
> ---
>
> Key: HDDS-1065
> URL: https://issues.apache.org/jira/browse/HDDS-1065
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> OM and DN should persist SCM certificate as the trust root.






[jira] [Work logged] (HDDS-1403) KeyOutputStream writes fails after max retries while writing to a closed container

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1403?focusedWorklogId=230846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230846
 ]

ASF GitHub Bot logged work on HDDS-1403:


Author: ASF GitHub Bot
Created on: 22/Apr/19 19:13
Start Date: 22/Apr/19 19:13
Worklog Time Spent: 10m 
  Work Description: bshashikant commented on issue #753: HDDS-1403. 
KeyOutputStream writes fails after max retries while writing to a closed 
container
URL: https://github.com/apache/hadoop/pull/753#issuecomment-485518604
 
 
   Thanks Hanisha for updating the patch. The patch adds a retry interval 
while doing a retry of a client write request. But this may not address the 
problem holistically, as the client can still get allocated blocks from a 
container, and by the time the actual write happens on the datanode, the 
container might get closed. The problem gets aggravated if we have a large 
number of preallocated blocks but the client write happens much later.
 



Issue Time Tracking
---

Worklog Id: (was: 230846)
Time Spent: 0.5h  (was: 20m)

> KeyOutputStream writes fails after max retries while writing to a closed 
> container
> --
>
> Key: HDDS-1403
> URL: https://issues.apache.org/jira/browse/HDDS-1403
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: MiniOzoneChaosCluster, pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently an Ozone client retries a write operation 5 times. It is possible 
> that the container being written to is already closed by the time the write 
> happens. The key write will fail after retrying multiple times with this 
> error. This needs to be fixed, as this is an internal error.






[jira] [Work logged] (HDDS-1065) OM and DN should persist SCM certificate as the trust root.

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1065?focusedWorklogId=230848&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230848
 ]

ASF GitHub Bot logged work on HDDS-1065:


Author: ASF GitHub Bot
Created on: 22/Apr/19 19:13
Start Date: 22/Apr/19 19:13
Worklog Time Spent: 10m 
  Work Description: xiaoyuyao commented on pull request #754: HDDS-1065. OM 
and DN should persist SCM certificate as the trust root. Contributed by Ajay 
Kumar.
URL: https://github.com/apache/hadoop/pull/754#discussion_r277400775
 
 

 ##
 File path: 
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeService.java
 ##
 @@ -268,10 +268,13 @@ private void getSCMSignedCert(OzoneConfiguration config) 
{
 
   String pemEncodedCert = secureScmClient.getDataNodeCertificate(
   datanodeDetails.getProtoBufMessage(), getEncodedString(csr));
-  dnCertClient.storeCertificate(pemEncodedCert, true);
+  dnCertClient.storeCertificate(pemEncodedCert, true, false);
   datanodeDetails.setCertSerialId(getX509Certificate(pemEncodedCert).
   getSerialNumber().toString());
   persistDatanodeDetails(datanodeDetails);
+  // Get SCM CA certificate and store it in filesystem.
+  String pemEncodedRootCert = secureScmClient.getCACertificate();
 
 Review comment:
   Should we get the CA certificate based on the DN certificate signed by the 
CA? Does that contain a signer certificate id?
 



Issue Time Tracking
---

Worklog Id: (was: 230848)
Time Spent: 40m  (was: 0.5h)

> OM and DN should persist SCM certificate as the trust root.
> ---
>
> Key: HDDS-1065
> URL: https://issues.apache.org/jira/browse/HDDS-1065
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> OM and DN should persist SCM certificate as the trust root.






[jira] [Work logged] (HDDS-1403) KeyOutputStream writes fails after max retries while writing to a closed container

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1403?focusedWorklogId=230847&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230847
 ]

ASF GitHub Bot logged work on HDDS-1403:


Author: ASF GitHub Bot
Created on: 22/Apr/19 19:13
Start Date: 22/Apr/19 19:13
Worklog Time Spent: 10m 
  Work Description: bshashikant commented on issue #753: HDDS-1403. 
KeyOutputStream writes fails after max retries while writing to a closed 
container
URL: https://github.com/apache/hadoop/pull/753#issuecomment-485518604
 
 
   Thanks Hanisha for updating the patch. The patch adds a retry interval 
while doing a retry of a client write request. But this may not address the 
problem holistically, as the client can still get allocated blocks from a 
container, and by the time the actual write happens on the datanode, the 
container might get closed. The problem gets aggravated if we have a large 
number of preallocated blocks but the client write happens much later.
 



Issue Time Tracking
---

Worklog Id: (was: 230847)
Time Spent: 40m  (was: 0.5h)

> KeyOutputStream writes fails after max retries while writing to a closed 
> container
> --
>
> Key: HDDS-1403
> URL: https://issues.apache.org/jira/browse/HDDS-1403
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Hanisha Koneru
>Priority: Major
>  Labels: MiniOzoneChaosCluster, pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently an Ozone client retries a write operation 5 times. It is possible 
> that the container being written to is already closed by the time the write 
> happens. The key write will fail after retrying multiple times with this 
> error. This needs to be fixed, as this is an internal error.






[jira] [Assigned] (HDDS-1449) JVM Exit in datanode while committing a key

2019-04-22 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-1449:
-

Assignee: Shashikant Banerjee

> JVM Exit in datanode while committing a key
> ---
>
> Key: HDDS-1449
> URL: https://issues.apache.org/jira/browse/HDDS-1449
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: MiniOzoneChaosCluster
> Attachments: 2019-04-22--20-23-56-IST.MiniOzoneChaosCluster.log, 
> hs_err_pid67466.log
>
>
> Saw the following trace in MiniOzoneChaosCluster run.
> {code}
> C  [librocksdbjni17271331491728127.jnilib+0x9755c]  
> Java_org_rocksdb_RocksDB_write0+0x1c
> J 13917  org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x0001102ff62e 
> [0x0001102ff580+0xae]
> J 17167 C2 
> org.apache.hadoop.utils.RocksDBStore.writeBatch(Lorg/apache/hadoop/utils/BatchOperation;)V
>  (260 bytes) @ 0x000111bbd01c [0x000111bbcde0+0x23c]
> J 20434 C1 
> org.apache.hadoop.ozone.container.keyvalue.impl.BlockManagerImpl.putBlock(Lorg/apache/hadoop/ozone/container/common/interfaces/Container;Lorg/apache/hadoop/ozone/container/common/helpers/BlockData;)J
>  (261 bytes) @ 0x000111c267ac [0x000111c25640+0x116c]
> J 19262 C2 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto;
>  (866 bytes) @ 0x0001125c5aa0 [0x0001125c1560+0x4540]
> J 15095 C2 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto;
>  (142 bytes) @ 0x000110ffc940 [0x000110ffc0c0+0x880]
> J 19301 C2 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto;
>  (146 bytes) @ 0x000111396144 [0x000111395e60+0x2e4]
> J 15997 C2 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$$Lambda$776.get()Ljava/lang/Object;
>  (16 bytes) @ 0x000110138e54 [0x000110138d80+0xd4]
> J 15970 C2 java.util.concurrent.CompletableFuture$AsyncSupply.run()V (61 
> bytes) @ 0x00010fc80094 [0x00010fc8+0x94]
> J 17368 C2 
> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
>  (225 bytes) @ 0x000110b0a7a0 [0x000110b0a5a0+0x200]
> J 7389 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ 
> 0x00011012a004 [0x000110129f00+0x104]
> J 6837 C1 java.lang.Thread.run()V (17 bytes) @ 0x00011002b144 
> [0x00011002b000+0x144]
> v  ~StubRoutines::call_stub
> V  [libjvm.dylib+0x2ef1f6]  JavaCalls::call_helper(JavaValue*, methodHandle*, 
> JavaCallArguments*, Thread*)+0x6ae
> V  [libjvm.dylib+0x2ef99a]  JavaCalls::call_virtual(JavaValue*, KlassHandle, 
> Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x164
> V  [libjvm.dylib+0x2efb46]  JavaCalls::call_virtual(JavaValue*, Handle, 
> KlassHandle, Symbol*, Symbol*, Thread*)+0x4a
> V  [libjvm.dylib+0x34a46d]  thread_entry(JavaThread*, Thread*)+0x7c
> V  [libjvm.dylib+0x56eb0f]  JavaThread::thread_main_inner()+0x9b
> V  [libjvm.dylib+0x57020a]  JavaThread::run()+0x1c2
> V  [libjvm.dylib+0x48d4a6]  java_start(Thread*)+0xf6
> C  [libsystem_pthread.dylib+0x3305]  _pthread_body+0x7e
> C  [libsystem_pthread.dylib+0x626f]  _pthread_start+0x46
> C  [libsystem_pthread.dylib+0x2415]  thread_start+0xd
> C  0x
> {code}






[jira] [Work logged] (HDDS-1368) Cleanup old ReplicationManager code from SCM

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1368?focusedWorklogId=230833&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230833
 ]

ASF GitHub Bot logged work on HDDS-1368:


Author: ASF GitHub Bot
Created on: 22/Apr/19 18:52
Start Date: 22/Apr/19 18:52
Worklog Time Spent: 10m 
  Work Description: nandakumar131 commented on issue #711: HDDS-1368. 
Cleanup old ReplicationManager code from SCM.
URL: https://github.com/apache/hadoop/pull/711#issuecomment-485511736
 
 
   /retest
 



Issue Time Tracking
---

Worklog Id: (was: 230833)
Time Spent: 2h  (was: 1h 50m)

> Cleanup old ReplicationManager code from SCM
> 
>
> Key: HDDS-1368
> URL: https://issues.apache.org/jira/browse/HDDS-1368
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> HDDS-1205 brings in new ReplicationManager and HDDS-1207 plugs in the new 
> code, this jira is for removing the old ReplicationManager and related code.






[jira] [Created] (HDDS-1452) All chunks should happen to a single file for a block in datanode

2019-04-22 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-1452:
-

 Summary: All chunks should happen to a single file for a block in 
datanode
 Key: HDDS-1452
 URL: https://issues.apache.org/jira/browse/HDDS-1452
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.5.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.5.0


Currently, all chunks of a block are written to individual chunk files in the 
datanode. The idea here is to write all chunks of a block to a single file in 
the datanode.






[jira] [Commented] (HDDS-1425) Ozone compose files are not compatible with the latest docker-compose

2019-04-22 Thread Arpit Agarwal (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823343#comment-16823343
 ] 

Arpit Agarwal commented on HDDS-1425:
-

Thanks [~anu]. Updating the fix version to 0.4.0 since it looks like we will be 
rolling 0.4.0 RC1.

> Ozone compose files are not compatible with the latest docker-compose
> -
>
> Key: HDDS-1425
> URL: https://issues.apache.org/jira/browse/HDDS-1425
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.4.0, 0.5.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I upgraded my docker-compose to the latest available version (1.24.0), but 
> after the upgrade I can't start the docker-compose based cluster any more:
> {code}
> ./test.sh 
> -
> Executing test(s): [basic]
>   Cluster type:  ozone
>   Compose file:  
> /home/elek/projects/hadoop-review/hadoop-ozone/dist/target/ozone-0.4.0-SNAPSHOT/smoketest/../compose/ozone/docker-compose.yaml
>   Output dir:
> /home/elek/projects/hadoop-review/hadoop-ozone/dist/target/ozone-0.4.0-SNAPSHOT/smoketest/result
>   Command to rerun:  ./test.sh --keep --env ozone basic
> -
> ERROR: In file 
> /home/elek/projects/hadoop-review/hadoop-ozone/dist/target/ozone-0.4.0-SNAPSHOT/compose/ozone/docker-config:
>  environment variable name 'LOG4J2.PROPERTIES_appender.rolling.file 
> {code}
> It turned out that the LOG4J2.PROPERTIES_appender.rolling.file line contains 
> an unnecessary space, which the latest docker-compose no longer accepts.






[jira] [Updated] (HDDS-1425) Ozone compose files are not compatible with the latest docker-compose

2019-04-22 Thread Arpit Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-1425:

Fix Version/s: (was: 0.4.1)
   0.4.0

> Ozone compose files are not compatible with the latest docker-compose
> -
>
> Key: HDDS-1425
> URL: https://issues.apache.org/jira/browse/HDDS-1425
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.4.0, 0.5.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I upgraded my docker-compose to the latest available version (1.24.0), but 
> after the upgrade I can't start the docker-compose based cluster any more:
> {code}
> ./test.sh 
> -
> Executing test(s): [basic]
>   Cluster type:  ozone
>   Compose file:  
> /home/elek/projects/hadoop-review/hadoop-ozone/dist/target/ozone-0.4.0-SNAPSHOT/smoketest/../compose/ozone/docker-compose.yaml
>   Output dir:
> /home/elek/projects/hadoop-review/hadoop-ozone/dist/target/ozone-0.4.0-SNAPSHOT/smoketest/result
>   Command to rerun:  ./test.sh --keep --env ozone basic
> -
> ERROR: In file 
> /home/elek/projects/hadoop-review/hadoop-ozone/dist/target/ozone-0.4.0-SNAPSHOT/compose/ozone/docker-config:
>  environment variable name 'LOG4J2.PROPERTIES_appender.rolling.file 
> {code}
> It turned out that the LOG4J2.PROPERTIES_appender.rolling.file line contains 
> an unnecessary space, which the latest docker-compose no longer accepts.






[jira] [Work logged] (HDDS-1450) Fix nightly run failures after HDDS-976

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1450?focusedWorklogId=230822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230822
 ]

ASF GitHub Bot logged work on HDDS-1450:


Author: ASF GitHub Bot
Created on: 22/Apr/19 18:28
Start Date: 22/Apr/19 18:28
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #757: HDDS-1450. Fix 
nightly run failures after HDDS-976. Contributed by Xi…
URL: https://github.com/apache/hadoop/pull/757#issuecomment-485504373
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | 0 | reexec | 26 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | -1 | test4tests | 0 | The patch doesn't appear to include any new or 
modified tests.  Please justify why no new tests are needed for this patch. 
Also please list what manual steps were performed to verify this patch. |
   ||| _ trunk Compile Tests _ |
   | +1 | mvninstall | 1023 | trunk passed |
   | +1 | compile | 44 | trunk passed |
   | +1 | checkstyle | 23 | trunk passed |
   | +1 | mvnsite | 41 | trunk passed |
   | +1 | shadedclient | 748 | branch has no errors when building and testing 
our client artifacts. |
   | +1 | findbugs | 69 | trunk passed |
   | +1 | javadoc | 44 | trunk passed |
   ||| _ Patch Compile Tests _ |
   | +1 | mvninstall | 40 | the patch passed |
   | +1 | compile | 33 | the patch passed |
   | +1 | javac | 33 | the patch passed |
   | +1 | checkstyle | 16 | the patch passed |
   | +1 | mvnsite | 34 | the patch passed |
   | +1 | whitespace | 1 | The patch has no whitespace issues. |
   | +1 | xml | 1 | The patch has no ill-formed XML file. |
   | +1 | shadedclient | 751 | patch has no errors when building and testing 
our client artifacts. |
   | +1 | findbugs | 77 | the patch passed |
   | +1 | javadoc | 37 | the patch passed |
   ||| _ Other Tests _ |
   | +1 | unit | 67 | common in the patch passed. |
   | +1 | asflicense | 29 | The patch does not generate ASF License warnings. |
   | | | 3187 | |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-757/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/757 |
   | Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall 
 mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
   | uname | Linux 4814bfbb092b 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 96e3027 |
   | maven | version: Apache Maven 3.3.9 |
   | Default Java | 1.8.0_191 |
   | findbugs | v3.1.0-RC1 |
   |  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-757/1/testReport/ |
   | Max. process+thread count | 444 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdds/common U: hadoop-hdds/common |
   | Console output | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-757/1/console |
   | Powered by | Apache Yetus 0.9.0 http://yetus.apache.org |
   
   
   This message was automatically generated.
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 230822)
Time Spent: 0.5h  (was: 20m)

> Fix nightly run failures after HDDS-976
> ---
>
> Key: HDDS-1450
> URL: https://issues.apache.org/jira/browse/HDDS-1450
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [https://ci.anzix.net/job/ozone-nightly/72/testReport/]






[jira] [Commented] (HDDS-1425) Ozone compose files are not compatible with the latest docker-compose

2019-04-22 Thread Anu Engineer (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823276#comment-16823276
 ] 

Anu Engineer commented on HDDS-1425:


We plan to release 0.4.1 and 0.4.2 soon, so this is helpful. Thanks for the 
patch and commit.

 

> Ozone compose files are not compatible with the latest docker-compose
> -
>
> Key: HDDS-1425
> URL: https://issues.apache.org/jira/browse/HDDS-1425
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.0, 0.4.1
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I upgraded my docker-compose to the latest available version (1.24.0), but 
> after the upgrade I can't start the docker-compose based cluster any more:
> {code}
> ./test.sh 
> -
> Executing test(s): [basic]
>   Cluster type:  ozone
>   Compose file:  
> /home/elek/projects/hadoop-review/hadoop-ozone/dist/target/ozone-0.4.0-SNAPSHOT/smoketest/../compose/ozone/docker-compose.yaml
>   Output dir:
> /home/elek/projects/hadoop-review/hadoop-ozone/dist/target/ozone-0.4.0-SNAPSHOT/smoketest/result
>   Command to rerun:  ./test.sh --keep --env ozone basic
> -
> ERROR: In file 
> /home/elek/projects/hadoop-review/hadoop-ozone/dist/target/ozone-0.4.0-SNAPSHOT/compose/ozone/docker-config:
>  environment variable name 'LOG4J2.PROPERTIES_appender.rolling.file 
> {code}
> It turned out that the LOG4J2.PROPERTIES_appender.rolling.file line contains 
> an unnecessary space, which the latest docker-compose no longer accepts.






[jira] [Commented] (HDFS-14374) Expose total number of delegation tokens in AbstractDelegationTokenSecretManager

2019-04-22 Thread CR Hota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823274#comment-16823274
 ] 

CR Hota commented on HDFS-14374:


[~elgoiri]  [~hexiaoqiao]

This is a fairly harmless but important change for specific apps to pick up 
and report metrics. Could we commit HDFS-14374.002.patch and rebase the router 
branch? We want to make sure we have this in time for the 3.3 release.

> Expose total number of delegation tokens in 
> AbstractDelegationTokenSecretManager
> 
>
> Key: HDFS-14374
> URL: https://issues.apache.org/jira/browse/HDFS-14374
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: CR Hota
>Priority: Major
> Attachments: HDFS-14374.001.patch, HDFS-14374.002.patch
>
>
> AbstractDelegationTokenSecretManager should expose total number of active 
> delegation tokens for specific implementations to track for observability.






[jira] [Work logged] (HDDS-1450) Fix nightly run failures after HDDS-976

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1450?focusedWorklogId=230799&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230799
 ]

ASF GitHub Bot logged work on HDDS-1450:


Author: ASF GitHub Bot
Created on: 22/Apr/19 17:36
Start Date: 22/Apr/19 17:36
Worklog Time Spent: 10m 
  Work Description: xiaoyuyao commented on issue #757: HDDS-1450. Fix 
nightly run failures after HDDS-976. Contributed by Xi…
URL: https://github.com/apache/hadoop/pull/757#issuecomment-485487701
 
 
   Remove the unnecessary configuration key 
"ozone.scm.network.topology.schema.file.type" and determine the schema type 
based on the file extension of the existing key 
"ozone.scm.network.topology.schema.file".
   
   cc: @cjjnjust 
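
For illustration, a sketch of that extension check, assuming the XML and YAML 
schema formats from HDDS-976 (the helper itself is hypothetical):

{code:java}
// Hypothetical helper: derive the topology schema type from the file
// suffix, assuming the XML and YAML formats introduced by HDDS-976.
static String schemaType(String schemaFile) {
  String f = schemaFile.toLowerCase();
  return (f.endsWith(".yaml") || f.endsWith(".yml")) ? "YAML" : "XML";
}
{code}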
 



Issue Time Tracking
---

Worklog Id: (was: 230799)
Time Spent: 20m  (was: 10m)

> Fix nightly run failures after HDDS-976
> ---
>
> Key: HDDS-1450
> URL: https://issues.apache.org/jira/browse/HDDS-1450
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://ci.anzix.net/job/ozone-nightly/72/testReport/]






[jira] [Updated] (HDDS-1450) Fix nightly run failures after HDDS-976

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-1450:
-
Labels: pull-request-available  (was: )

> Fix nightly run failures after HDDS-976
> ---
>
> Key: HDDS-1450
> URL: https://issues.apache.org/jira/browse/HDDS-1450
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
>  Labels: pull-request-available
>
> [https://ci.anzix.net/job/ozone-nightly/72/testReport/]






[jira] [Work logged] (HDDS-1450) Fix nightly run failures after HDDS-976

2019-04-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1450?focusedWorklogId=230798&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230798
 ]

ASF GitHub Bot logged work on HDDS-1450:


Author: ASF GitHub Bot
Created on: 22/Apr/19 17:33
Start Date: 22/Apr/19 17:33
Worklog Time Spent: 10m 
  Work Description: xiaoyuyao commented on pull request #757: HDDS-1450. 
Fix nightly run failures after HDDS-976. Contributed by Xi…
URL: https://github.com/apache/hadoop/pull/757
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 230798)
Time Spent: 10m
Remaining Estimate: 0h

> Fix nightly run failures after HDDS-976
> ---
>
> Key: HDDS-1450
> URL: https://issues.apache.org/jira/browse/HDDS-1450
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [https://ci.anzix.net/job/ozone-nightly/72/testReport/]






[jira] [Assigned] (HDDS-1451) SCMBlockManager allocates pipelines in cases when the pipeline has already been allocated

2019-04-22 Thread Aravindan Vijayan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan reassigned HDDS-1451:
---

Assignee: Aravindan Vijayan

> SCMBlockManager allocates pipelines in cases when the pipeline has already 
> been allocated
> -
>
> Key: HDDS-1451
> URL: https://issues.apache.org/jira/browse/HDDS-1451
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> SCM BlockManager may try to allocate pipelines in cases where it is not 
> needed. This happens because BlockManagerImpl#allocateBlock is not lock 
> protected, so multiple pipelines can be allocated from it concurrently. A 
> pipeline allocation can then fail even though an existing pipeline is 
> already available.
> {code}
> 2019-04-22 22:34:14,336 INFO  pipeline.RatisPipelineProvider 
> (RatisPipelineProvider.java:lambda$create$1(103)) -  pipeline Pipeline[ Id: 
> 6f4bb2d7-d660-4f9f-bc06-72b10f9a738e, Nodes: 76e1a493-fd55-4d67-9f5
> 5-c04fd6bd3a33{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: 
> null}2b9850b2-aed3-4a40-91b5-2447dc5246bf{ip: 192.168.0.104, host: 
> 192.168.0.104, certSerialId: null}12248721-ea6a-453f-8dad-fc7fbe692f
> d2{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, 
> Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,386 INFO  impl.RoleInfo 
> (RoleInfo.java:shutdownLeaderElection(134)) - 
> e17b7852-4691-40c7-8791-ad0b0da5201f: shutdown LeaderElection
> 2019-04-22 22:34:14,388 INFO  pipeline.RatisPipelineProvider 
> (RatisPipelineProvider.java:lambda$create$1(103)) -  pipeline Pipeline[ Id: 
> 552e28f3-98d9-41f3-86e0-c1b9494838a5, Nodes: e17b7852-4691-40c7-879
> 1-ad0b0da5201f{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: 
> null}fd365bac-e26e-4b11-afd8-9d08cd1b0521{ip: 192.168.0.104, host: 
> 192.168.0.104, certSerialId: null}9583a007-7f02-4074-9e26-19bc18e29e
> c5{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, 
> Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,388 INFO  impl.RoleInfo (RoleInfo.java:updateAndGet(143)) 
> - e17b7852-4691-40c7-8791-ad0b0da5201f: start FollowerState
> 2019-04-22 22:34:14,388 INFO  pipeline.RatisPipelineProvider 
> (RatisPipelineProvider.java:lambda$create$1(103)) -  pipeline Pipeline[ Id: 
> 5383151b-d625-4362-a7dd-c0d353acaf76, Nodes: 80f16ad6-3879-4a64-a3c
> 7-7719813cc139{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: 
> null}082ce481-7fb0-4f88-ac21-82609290a6a2{ip: 192.168.0.104, host: 
> 192.168.0.104, certSerialId: null}dd5f5a70-0217-4577-b7a2-c42aa139d1
> 8a{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, 
> Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,389 INFO  pipeline.RatisPipelineProvider 
> (RatisPipelineProvider.java:lambda$create$1(103)) -  pipeline Pipeline[ Id: 
> be4854e5-7933-4caa-b32e-f482cf500247, Nodes: 6e2356f1-479d-498b-876
> a-1c90623c498b{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: 
> null}8ac46d94-9975-4eea-9448-2618c69d7bf3{ip: 192.168.0.104, host: 
> 192.168.0.104, certSerialId: null}a3ed36a1-44ca-47b2-b9b3-5aeef04595
> 18{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, 
> Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,390 INFO  pipeline.RatisPipelineProvider 
> (RatisPipelineProvider.java:lambda$create$1(103)) -  pipeline Pipeline[ Id: 
> 21e368e2-f82a-4c61-9cc3-06e8de22ea6b, Nodes: 
> 82632040-5754-4122-b187-331879586842{ip: 192.168.0.104, host: 192.168.0.104, 
> certSerialId: null}923c8537-b869-4085-adcb-0a9accdcd089{ip: 192.168.0.104, 
> host: 192.168.0.104, certSerialId: 
> null}c6d790bf-e3a6-4064-acb5-f74796cd38a9{ip: 192.168.0.104, host: 
> 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,390 INFO  pipeline.RatisPipelineProvider 
> (RatisPipelineProvider.java:lambda$create$1(103)) -  pipeline Pipeline[ Id: 
> cccbc2ed-e0e2-4578-a8a2-94f4b645be52, Nodes: 
> 91ae6848-a778-43be-a4a1-5855f7adc0d8{ip: 192.168.0.104, host: 192.168.0.104, 
> certSerialId: null}8f330a03-40e2-4bd1-9b43-5e05b13d89f0{ip: 192.168.0.104, 
> host: 192.168.0.104, certSerialId: 
> null}4f3070dc-650b-48d7-87b5-d2076104e7b4{ip: 192.168.0.104, host: 
> 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,392 ERROR block.BlockManagerImpl 
> (BlockManagerImpl.java:allocateBlock(192)) - Pipeline creation failed for 
> type:RATIS factor:THREE
> org.apache.hadoop.hdds.scm.pipeline.InsufficientDatanodesException: Cannot 
> create pipeline of factor 3 using 2 nodes 20 healthy nodes 20 all nodes.
> at 
> org.apache.hadoop

[jira] [Created] (HDDS-1451) SCMBlockManager allocates pipelines in cases when the pipeline has already been allocated

2019-04-22 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HDDS-1451:
---

 Summary: SCMBlockManager allocates pipelines in cases when the 
pipeline has already been allocated
 Key: HDDS-1451
 URL: https://issues.apache.org/jira/browse/HDDS-1451
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Affects Versions: 0.3.0
Reporter: Mukul Kumar Singh


SCM BlockManager may try to allocate pipelines in cases where it is not 
needed. This happens because BlockManagerImpl#allocateBlock is not lock 
protected, so multiple pipelines can be allocated from it concurrently. A 
pipeline allocation can then fail even though a matching pipeline already 
exists.
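
A minimal sketch of a race-free variant, under stated assumptions (the class 
and method names below are hypothetical; the real BlockManagerImpl and 
SCMPipelineManager differ): if the create-or-reuse step is made atomic, 
concurrent allocateBlock() callers cannot all attempt pipeline creation at 
once.

{code}
// Hypothetical illustration only -- not the actual SCM code.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PipelineCache {
  private final Map<String, String> openPipelines = new ConcurrentHashMap<>();

  // computeIfAbsent runs the creation function at most once per key, even
  // when many threads ask for the same (type, factor) pair concurrently;
  // later callers reuse the already-created pipeline.
  String getOrCreate(String type, int factor) {
    return openPipelines.computeIfAbsent(type + ":" + factor,
        k -> createPipeline(type, factor));
  }

  private String createPipeline(String type, int factor) {
    // Placeholder for the expensive, failure-prone creation step.
    return "pipeline-" + type + "-" + factor;
  }
}
{code}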


{code}
2019-04-22 22:34:14,336 INFO  pipeline.RatisPipelineProvider 
(RatisPipelineProvider.java:lambda$create$1(103)) -  pipeline Pipeline[ Id: 
6f4bb2d7-d660-4f9f-bc06-72b10f9a738e, Nodes: 76e1a493-fd55-4d67-9f5
5-c04fd6bd3a33{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: 
null}2b9850b2-aed3-4a40-91b5-2447dc5246bf{ip: 192.168.0.104, host: 
192.168.0.104, certSerialId: null}12248721-ea6a-453f-8dad-fc7fbe692f
d2{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, 
Factor:THREE, State:OPEN]
2019-04-22 22:34:14,386 INFO  impl.RoleInfo 
(RoleInfo.java:shutdownLeaderElection(134)) - 
e17b7852-4691-40c7-8791-ad0b0da5201f: shutdown LeaderElection
2019-04-22 22:34:14,388 INFO  pipeline.RatisPipelineProvider 
(RatisPipelineProvider.java:lambda$create$1(103)) -  pipeline Pipeline[ Id: 
552e28f3-98d9-41f3-86e0-c1b9494838a5, Nodes: e17b7852-4691-40c7-879
1-ad0b0da5201f{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: 
null}fd365bac-e26e-4b11-afd8-9d08cd1b0521{ip: 192.168.0.104, host: 
192.168.0.104, certSerialId: null}9583a007-7f02-4074-9e26-19bc18e29e
c5{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, 
Factor:THREE, State:OPEN]
2019-04-22 22:34:14,388 INFO  impl.RoleInfo (RoleInfo.java:updateAndGet(143)) - 
e17b7852-4691-40c7-8791-ad0b0da5201f: start FollowerState
2019-04-22 22:34:14,388 INFO  pipeline.RatisPipelineProvider 
(RatisPipelineProvider.java:lambda$create$1(103)) -  pipeline Pipeline[ Id: 
5383151b-d625-4362-a7dd-c0d353acaf76, Nodes: 80f16ad6-3879-4a64-a3c
7-7719813cc139{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: 
null}082ce481-7fb0-4f88-ac21-82609290a6a2{ip: 192.168.0.104, host: 
192.168.0.104, certSerialId: null}dd5f5a70-0217-4577-b7a2-c42aa139d1
8a{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, 
Factor:THREE, State:OPEN]
2019-04-22 22:34:14,389 INFO  pipeline.RatisPipelineProvider 
(RatisPipelineProvider.java:lambda$create$1(103)) -  pipeline Pipeline[ Id: 
be4854e5-7933-4caa-b32e-f482cf500247, Nodes: 6e2356f1-479d-498b-876
a-1c90623c498b{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: 
null}8ac46d94-9975-4eea-9448-2618c69d7bf3{ip: 192.168.0.104, host: 
192.168.0.104, certSerialId: null}a3ed36a1-44ca-47b2-b9b3-5aeef04595
18{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, 
Factor:THREE, State:OPEN]
2019-04-22 22:34:14,390 INFO  pipeline.RatisPipelineProvider 
(RatisPipelineProvider.java:lambda$create$1(103)) -  pipeline Pipeline[ Id: 
21e368e2-f82a-4c61-9cc3-06e8de22ea6b, Nodes: 
82632040-5754-4122-b187-331879586842{ip: 192.168.0.104, host: 192.168.0.104, 
certSerialId: null}923c8537-b869-4085-adcb-0a9accdcd089{ip: 192.168.0.104, 
host: 192.168.0.104, certSerialId: 
null}c6d790bf-e3a6-4064-acb5-f74796cd38a9{ip: 192.168.0.104, host: 
192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN]
2019-04-22 22:34:14,390 INFO  pipeline.RatisPipelineProvider 
(RatisPipelineProvider.java:lambda$create$1(103)) -  pipeline Pipeline[ Id: 
cccbc2ed-e0e2-4578-a8a2-94f4b645be52, Nodes: 
91ae6848-a778-43be-a4a1-5855f7adc0d8{ip: 192.168.0.104, host: 192.168.0.104, 
certSerialId: null}8f330a03-40e2-4bd1-9b43-5e05b13d89f0{ip: 192.168.0.104, 
host: 192.168.0.104, certSerialId: 
null}4f3070dc-650b-48d7-87b5-d2076104e7b4{ip: 192.168.0.104, host: 
192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN]
2019-04-22 22:34:14,392 ERROR block.BlockManagerImpl 
(BlockManagerImpl.java:allocateBlock(192)) - Pipeline creation failed for 
type:RATIS factor:THREE
org.apache.hadoop.hdds.scm.pipeline.InsufficientDatanodesException: Cannot 
create pipeline of factor 3 using 2 nodes 20 healthy nodes 20 all nodes.
at 
org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider.create(RatisPipelineProvider.java:122)
at 
org.apache.hadoop.hdds.scm.pipeline.PipelineFactory.create(PipelineFactory.java:57)
at 
org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.createPipeline(SCMPipelineManager.java:148)
at 
org.apache.hadoop.hdds.scm.block.BlockManagerImpl.allocateBlock(BlockManagerImpl.java:190)
at 
org.apache.hadoop.hdds.scm.server.SCMBlockProtocolServer.allocateBlock(SCMBlockProtocolServ

[jira] [Updated] (HDDS-1451) SCMBlockManager allocates pipelines in cases when the pipeline has already been allocated

2019-04-22 Thread Mukul Kumar Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDDS-1451:

Labels: MiniOzoneChaosCluster  (was: )

> SCMBlockManager allocates pipelines in cases when the pipeline has already 
> been allocated
> -
>
> Key: HDDS-1451
> URL: https://issues.apache.org/jira/browse/HDDS-1451
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> SCM BlockManager may try to allocate pipelines in cases where it is not 
> needed. This happens because BlockManagerImpl#allocateBlock is not lock 
> protected, so multiple pipelines can be allocated from it concurrently. A 
> pipeline allocation can then fail even though a matching pipeline already 
> exists.
> {code}
> 2019-04-22 22:34:14,336 INFO  pipeline.RatisPipelineProvider 
> (RatisPipelineProvider.java:lambda$create$1(103)) -  pipeline Pipeline[ Id: 
> 6f4bb2d7-d660-4f9f-bc06-72b10f9a738e, Nodes: 76e1a493-fd55-4d67-9f5
> 5-c04fd6bd3a33{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: 
> null}2b9850b2-aed3-4a40-91b5-2447dc5246bf{ip: 192.168.0.104, host: 
> 192.168.0.104, certSerialId: null}12248721-ea6a-453f-8dad-fc7fbe692f
> d2{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, 
> Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,386 INFO  impl.RoleInfo 
> (RoleInfo.java:shutdownLeaderElection(134)) - 
> e17b7852-4691-40c7-8791-ad0b0da5201f: shutdown LeaderElection
> 2019-04-22 22:34:14,388 INFO  pipeline.RatisPipelineProvider 
> (RatisPipelineProvider.java:lambda$create$1(103)) -  pipeline Pipeline[ Id: 
> 552e28f3-98d9-41f3-86e0-c1b9494838a5, Nodes: e17b7852-4691-40c7-879
> 1-ad0b0da5201f{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: 
> null}fd365bac-e26e-4b11-afd8-9d08cd1b0521{ip: 192.168.0.104, host: 
> 192.168.0.104, certSerialId: null}9583a007-7f02-4074-9e26-19bc18e29e
> c5{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, 
> Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,388 INFO  impl.RoleInfo (RoleInfo.java:updateAndGet(143)) 
> - e17b7852-4691-40c7-8791-ad0b0da5201f: start FollowerState
> 2019-04-22 22:34:14,388 INFO  pipeline.RatisPipelineProvider 
> (RatisPipelineProvider.java:lambda$create$1(103)) -  pipeline Pipeline[ Id: 
> 5383151b-d625-4362-a7dd-c0d353acaf76, Nodes: 80f16ad6-3879-4a64-a3c
> 7-7719813cc139{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: 
> null}082ce481-7fb0-4f88-ac21-82609290a6a2{ip: 192.168.0.104, host: 
> 192.168.0.104, certSerialId: null}dd5f5a70-0217-4577-b7a2-c42aa139d1
> 8a{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, 
> Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,389 INFO  pipeline.RatisPipelineProvider 
> (RatisPipelineProvider.java:lambda$create$1(103)) -  pipeline Pipeline[ Id: 
> be4854e5-7933-4caa-b32e-f482cf500247, Nodes: 6e2356f1-479d-498b-876
> a-1c90623c498b{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: 
> null}8ac46d94-9975-4eea-9448-2618c69d7bf3{ip: 192.168.0.104, host: 
> 192.168.0.104, certSerialId: null}a3ed36a1-44ca-47b2-b9b3-5aeef04595
> 18{ip: 192.168.0.104, host: 192.168.0.104, certSerialId: null}, Type:RATIS, 
> Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,390 INFO  pipeline.RatisPipelineProvider 
> (RatisPipelineProvider.java:lambda$create$1(103)) -  pipeline Pipeline[ Id: 
> 21e368e2-f82a-4c61-9cc3-06e8de22ea6b, Nodes: 
> 82632040-5754-4122-b187-331879586842{ip: 192.168.0.104, host: 192.168.0.104, 
> certSerialId: null}923c8537-b869-4085-adcb-0a9accdcd089{ip: 192.168.0.104, 
> host: 192.168.0.104, certSerialId: 
> null}c6d790bf-e3a6-4064-acb5-f74796cd38a9{ip: 192.168.0.104, host: 
> 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,390 INFO  pipeline.RatisPipelineProvider 
> (RatisPipelineProvider.java:lambda$create$1(103)) -  pipeline Pipeline[ Id: 
> cccbc2ed-e0e2-4578-a8a2-94f4b645be52, Nodes: 
> 91ae6848-a778-43be-a4a1-5855f7adc0d8{ip: 192.168.0.104, host: 192.168.0.104, 
> certSerialId: null}8f330a03-40e2-4bd1-9b43-5e05b13d89f0{ip: 192.168.0.104, 
> host: 192.168.0.104, certSerialId: 
> null}4f3070dc-650b-48d7-87b5-d2076104e7b4{ip: 192.168.0.104, host: 
> 192.168.0.104, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN]
> 2019-04-22 22:34:14,392 ERROR block.BlockManagerImpl 
> (BlockManagerImpl.java:allocateBlock(192)) - Pipeline creation failed for 
> type:RATIS factor:THREE
> org.apache.hadoop.hdds.scm.pipeline.InsufficientDatanodesException: Cannot 
> create pipeline of factor 3 using 2 nodes 20 healthy nodes 20 all nodes.
> at 
> org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvi

[jira] [Assigned] (HDDS-1448) RatisPipelineProvider should only consider open pipeline while excluding dn for pipeline allocation

2019-04-22 Thread Aravindan Vijayan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan reassigned HDDS-1448:
---

Assignee: Aravindan Vijayan

> RatisPipelineProvider should only consider open pipeline while excluding dn 
> for pipeline allocation
> ---
>
> Key: HDDS-1448
> URL: https://issues.apache.org/jira/browse/HDDS-1448
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: MiniOzoneChaosCluster
>
> While allocating pipelines, the Ratis pipeline provider considers all 
> pipelines irrespective of their state. This can lead to a case where all 
> the datanodes are up but the pipelines are in the closing state in SCM.
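
The fix being suggested, sketched under assumptions (the types below are 
illustrative, not the real RatisPipelineProvider code): build the exclusion 
set only from datanodes that belong to OPEN pipelines, so nodes whose 
pipelines are closing or closed stay eligible for new allocations.

{code}
// Hypothetical sketch -- illustrative types, not the actual provider code.
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ExcludeSetCalculator {
  enum PipelineState { OPEN, CLOSING, CLOSED }

  static class Pipeline {
    PipelineState state;
    List<String> datanodeUuids = new ArrayList<>();
  }

  // Only datanodes used by OPEN pipelines are excluded; datanodes whose
  // pipelines are CLOSING or CLOSED remain candidates for new pipelines.
  Set<String> excludedDatanodes(List<Pipeline> pipelines) {
    Set<String> exclude = new HashSet<>();
    for (Pipeline p : pipelines) {
      if (p.state == PipelineState.OPEN) {
        exclude.addAll(p.datanodeUuids);
      }
    }
    return exclude;
  }
}
{code}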



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14445) TestTrySendErrorReportWhenNNThrowsIOException fails in trunk

2019-04-22 Thread Ayush Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823233#comment-16823233
 ] 

Ayush Saxena commented on HDFS-14445:
-

Updated v2 using a lambda.


> TestTrySendErrorReportWhenNNThrowsIOException fails in trunk
> 
>
> Key: HDFS-14445
> URL: https://issues.apache.org/jira/browse/HDFS-14445
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14445-01.patch, HDFS-14445-02.patch
>
>
> {noformat}
> Active namenode didn't add the report back to the queue when errorReport 
> threw IOException
> {noformat}
> Reference :::
> https://builds.apache.org/job/PreCommit-HDFS-Build/26676/testReport/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26662/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26661/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26644/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14445) TestTrySendErrorReportWhenNNThrowsIOException fails in trunk

2019-04-22 Thread Ayush Saxena (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-14445:

Attachment: HDFS-14445-02.patch

> TestTrySendErrorReportWhenNNThrowsIOException fails in trunk
> 
>
> Key: HDFS-14445
> URL: https://issues.apache.org/jira/browse/HDFS-14445
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14445-01.patch, HDFS-14445-02.patch
>
>
> {noformat}
> Active namenode didn't add the report back to the queue when errorReport 
> threw IOException
> {noformat}
> Reference :::
> https://builds.apache.org/job/PreCommit-HDFS-Build/26676/testReport/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26662/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26661/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26644/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14406) Add per user RPC Processing time

2019-04-22 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823227#comment-16823227
 ] 

Chao Sun commented on HDFS-14406:
-

[~xuel1]: I'm still in favor of reusing the existing {{RpcDetailedMetrics}}, as 
most of the logic is the same between this and the proposed 
{{RpcUserMetrics}}. It would be good if we could use a prefix to differentiate 
the user metrics from the RPC method metrics.
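
A rough sketch of the prefixing idea in plain Java (hypothetical class; this 
is not the actual {{RpcDetailedMetrics}} code): method metrics keep their bare 
names while user metrics carry a prefix, so the two namespaces never collide.

{code}
// Hypothetical illustration of the prefix idea; not Hadoop metrics code.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

class RpcTimings {
  private final Map<String, LongAdder> totals = new ConcurrentHashMap<>();

  // RPC method metrics are recorded under their bare method names.
  void addMethodTime(String method, long micros) {
    totals.computeIfAbsent(method, k -> new LongAdder()).add(micros);
  }

  // Per-user metrics share the same map but carry a "user." prefix, which
  // keeps them distinguishable when exported (e.g. via JMX).
  void addUserTime(String user, long micros) {
    totals.computeIfAbsent("user." + user, k -> new LongAdder()).add(micros);
  }
}
{code}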

> Add per user RPC Processing time
> 
>
> Key: HDFS-14406
> URL: https://issues.apache.org/jira/browse/HDFS-14406
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Xue Liu
>Assignee: Xue Liu
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HDFS-14406.001.patch, HDFS-14406.002.patch, 
> HDFS-14406.003.patch, HDFS-14406.004.patch, HDFS-14406.005.patch, 
> HDFS-14406.006.patch
>
>
> For a shared cluster we want to separate users' resources, as well as have 
> our metrics reflect the usage, latency, etc., for each user. 
> This JIRA aims to add per-user RPC processing time metrics and expose them 
> via JMX.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14440) RBF: Optimize the file write process in case of multiple destinations.

2019-04-22 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823223#comment-16823223
 ] 

Íñigo Goiri commented on HDFS-14440:


I think we have some metrics in JMX for the number of RPC queries.
It might be worth evaluating the number of calls and the latency of the calls 
under each approach.
It might also be good to have the Namenode metrics here.

> RBF: Optimize the file write process in case of multiple destinations.
> --
>
> Key: HDFS-14440
> URL: https://issues.apache.org/jira/browse/HDFS-14440
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14440-HDFS-13891-01.patch
>
>
> In case of multiple destinations, we need to check whether the file already 
> exists in one of the subclusters, for which we use the existing 
> getBlockLocation() API, which is by default a sequential call.
> In the usual scenario where the file needs to be created, each subcluster is 
> checked sequentially; these checks can be done concurrently to save time.
> In another case, where the file is found but its last block is null, we need 
> to call getFileInfo on all the locations to find the one where the file 
> exists. This too can be avoided by using a concurrent call, since we already 
> have the remoteLocation for which getBlockLocation returned a non-null 
> entry.
>  
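
A minimal sketch of the concurrent check described above (hypothetical helper 
names; the Router's real concurrent-invocation machinery differs): query every 
subcluster in parallel and keep the first one that reports the file.

{code}
// Hypothetical sketch -- not the actual RBF Router code.
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

class ConcurrentLocationCheck {

  // Fire one asynchronous check per subcluster, then wait for the answers.
  Optional<String> findExisting(List<String> subclusters) {
    List<CompletableFuture<String>> calls = subclusters.stream()
        .map(sc -> CompletableFuture.supplyAsync(() -> checkSubcluster(sc)))
        .collect(Collectors.toList());
    return calls.stream()
        .map(CompletableFuture::join)   // block until each call completes
        .filter(Objects::nonNull)
        .findFirst();
  }

  // Stand-in for a per-subcluster getBlockLocation RPC: returns the
  // subcluster name if the file exists there, otherwise null.
  private String checkSubcluster(String subcluster) {
    return null;
  }
}
{code}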



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1449) JVM Exit in datanode while committing a key

2019-04-22 Thread Mukul Kumar Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDDS-1449:

Attachment: 2019-04-22--20-23-56-IST.MiniOzoneChaosCluster.log

> JVM Exit in datanode while committing a key
> ---
>
> Key: HDDS-1449
> URL: https://issues.apache.org/jira/browse/HDDS-1449
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: MiniOzoneChaosCluster
> Attachments: 2019-04-22--20-23-56-IST.MiniOzoneChaosCluster.log, 
> hs_err_pid67466.log
>
>
> Saw the following trace in MiniOzoneChaosCluster run.
> {code}
> C  [librocksdbjni17271331491728127.jnilib+0x9755c]  
> Java_org_rocksdb_RocksDB_write0+0x1c
> J 13917  org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x0001102ff62e 
> [0x0001102ff580+0xae]
> J 17167 C2 
> org.apache.hadoop.utils.RocksDBStore.writeBatch(Lorg/apache/hadoop/utils/BatchOperation;)V
>  (260 bytes) @ 0x000111bbd01c [0x000111bbcde0+0x23c]
> J 20434 C1 
> org.apache.hadoop.ozone.container.keyvalue.impl.BlockManagerImpl.putBlock(Lorg/apache/hadoop/ozone/container/common/interfaces/Container;Lorg/apache/hadoop/ozone/container/common/helpers/BlockData;)J
>  (261 bytes) @ 0x000111c267ac [0x000111c25640+0x116c]
> J 19262 C2 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto;
>  (866 bytes) @ 0x0001125c5aa0 [0x0001125c1560+0x4540]
> J 15095 C2 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto;
>  (142 bytes) @ 0x000110ffc940 [0x000110ffc0c0+0x880]
> J 19301 C2 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto;
>  (146 bytes) @ 0x000111396144 [0x000111395e60+0x2e4]
> J 15997 C2 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$$Lambda$776.get()Ljava/lang/Object;
>  (16 bytes) @ 0x000110138e54 [0x000110138d80+0xd4]
> J 15970 C2 java.util.concurrent.CompletableFuture$AsyncSupply.run()V (61 
> bytes) @ 0x00010fc80094 [0x00010fc8+0x94]
> J 17368 C2 
> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
>  (225 bytes) @ 0x000110b0a7a0 [0x000110b0a5a0+0x200]
> J 7389 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ 
> 0x00011012a004 [0x000110129f00+0x104]
> J 6837 C1 java.lang.Thread.run()V (17 bytes) @ 0x00011002b144 
> [0x00011002b000+0x144]
> v  ~StubRoutines::call_stub
> V  [libjvm.dylib+0x2ef1f6]  JavaCalls::call_helper(JavaValue*, methodHandle*, 
> JavaCallArguments*, Thread*)+0x6ae
> V  [libjvm.dylib+0x2ef99a]  JavaCalls::call_virtual(JavaValue*, KlassHandle, 
> Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x164
> V  [libjvm.dylib+0x2efb46]  JavaCalls::call_virtual(JavaValue*, Handle, 
> KlassHandle, Symbol*, Symbol*, Thread*)+0x4a
> V  [libjvm.dylib+0x34a46d]  thread_entry(JavaThread*, Thread*)+0x7c
> V  [libjvm.dylib+0x56eb0f]  JavaThread::thread_main_inner()+0x9b
> V  [libjvm.dylib+0x57020a]  JavaThread::run()+0x1c2
> V  [libjvm.dylib+0x48d4a6]  java_start(Thread*)+0xf6
> C  [libsystem_pthread.dylib+0x3305]  _pthread_body+0x7e
> C  [libsystem_pthread.dylib+0x626f]  _pthread_start+0x46
> C  [libsystem_pthread.dylib+0x2415]  thread_start+0xd
> C  0x
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.

2019-04-22 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823205#comment-16823205
 ] 

Íñigo Goiri commented on HDFS-14353:


Do you mind fixing the checkstyle warnings?

> Erasure Coding: metrics xmitsInProgress become to negative.
> ---
>
> Key: HDFS-14353
> URL: https://issues.apache.org/jira/browse/HDFS-14353
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, erasure-coding
>Affects Versions: 3.3.0
>Reporter: maobaolong
>Assignee: maobaolong
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14353.001.patch, HDFS-14353.002.patch, 
> screenshot-1.png
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1449) JVM Exit in datanode while committing a key

2019-04-22 Thread Mukul Kumar Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDDS-1449:

Attachment: hs_err_pid67466.log

> JVM Exit in datanode while committing a key
> ---
>
> Key: HDDS-1449
> URL: https://issues.apache.org/jira/browse/HDDS-1449
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.3.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>  Labels: MiniOzoneChaosCluster
> Attachments: hs_err_pid67466.log
>
>
> Saw the following trace in MiniOzoneChaosCluster run.
> {code}
> C  [librocksdbjni17271331491728127.jnilib+0x9755c]  
> Java_org_rocksdb_RocksDB_write0+0x1c
> J 13917  org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x0001102ff62e 
> [0x0001102ff580+0xae]
> J 17167 C2 
> org.apache.hadoop.utils.RocksDBStore.writeBatch(Lorg/apache/hadoop/utils/BatchOperation;)V
>  (260 bytes) @ 0x000111bbd01c [0x000111bbcde0+0x23c]
> J 20434 C1 
> org.apache.hadoop.ozone.container.keyvalue.impl.BlockManagerImpl.putBlock(Lorg/apache/hadoop/ozone/container/common/interfaces/Container;Lorg/apache/hadoop/ozone/container/common/helpers/BlockData;)J
>  (261 bytes) @ 0x000111c267ac [0x000111c25640+0x116c]
> J 19262 C2 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto;
>  (866 bytes) @ 0x0001125c5aa0 [0x0001125c1560+0x4540]
> J 15095 C2 
> org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto;
>  (142 bytes) @ 0x000110ffc940 [0x000110ffc0c0+0x880]
> J 19301 C2 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandRequestProto;Lorg/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext;)Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$ContainerCommandResponseProto;
>  (146 bytes) @ 0x000111396144 [0x000111395e60+0x2e4]
> J 15997 C2 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$$Lambda$776.get()Ljava/lang/Object;
>  (16 bytes) @ 0x000110138e54 [0x000110138d80+0xd4]
> J 15970 C2 java.util.concurrent.CompletableFuture$AsyncSupply.run()V (61 
> bytes) @ 0x00010fc80094 [0x00010fc8+0x94]
> J 17368 C2 
> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
>  (225 bytes) @ 0x000110b0a7a0 [0x000110b0a5a0+0x200]
> J 7389 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ 
> 0x00011012a004 [0x000110129f00+0x104]
> J 6837 C1 java.lang.Thread.run()V (17 bytes) @ 0x00011002b144 
> [0x00011002b000+0x144]
> v  ~StubRoutines::call_stub
> V  [libjvm.dylib+0x2ef1f6]  JavaCalls::call_helper(JavaValue*, methodHandle*, 
> JavaCallArguments*, Thread*)+0x6ae
> V  [libjvm.dylib+0x2ef99a]  JavaCalls::call_virtual(JavaValue*, KlassHandle, 
> Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x164
> V  [libjvm.dylib+0x2efb46]  JavaCalls::call_virtual(JavaValue*, Handle, 
> KlassHandle, Symbol*, Symbol*, Thread*)+0x4a
> V  [libjvm.dylib+0x34a46d]  thread_entry(JavaThread*, Thread*)+0x7c
> V  [libjvm.dylib+0x56eb0f]  JavaThread::thread_main_inner()+0x9b
> V  [libjvm.dylib+0x57020a]  JavaThread::run()+0x1c2
> V  [libjvm.dylib+0x48d4a6]  java_start(Thread*)+0xf6
> C  [libsystem_pthread.dylib+0x3305]  _pthread_body+0x7e
> C  [libsystem_pthread.dylib+0x626f]  _pthread_start+0x46
> C  [libsystem_pthread.dylib+0x2415]  thread_start+0xd
> C  0x
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-1450) Fix nightly run failures after HDDS-976

2019-04-22 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDDS-1450:


 Summary: Fix nightly run failures after HDDS-976
 Key: HDDS-1450
 URL: https://issues.apache.org/jira/browse/HDDS-1450
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao


[https://ci.anzix.net/job/ozone-nightly/72/testReport/]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14445) TestTrySendErrorReportWhenNNThrowsIOException fails in trunk

2019-04-22 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823203#comment-16823203
 ] 

Íñigo Goiri commented on HDFS-14445:


Removing a sleep() is always good.
Can we use a lambda for the waitFor?
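
The kind of change being suggested, sketched under assumptions 
(reportQueuedBack() is a hypothetical stand-in for whatever condition the 
test asserts):

{code}
// Sketch of the suggestion only -- not the actual patch.
import java.util.concurrent.TimeoutException;
import org.apache.hadoop.test.GenericTestUtils;

public class WaitForSketch {
  static volatile boolean reportQueued;

  // Hypothetical condition helper for the test's assertion.
  static boolean reportQueuedBack() {
    return reportQueued;
  }

  static void awaitReport() throws TimeoutException, InterruptedException {
    // Poll the lambda every 100 ms for up to 10 s, instead of a fixed
    // Thread.sleep() followed by a one-shot assertion.
    GenericTestUtils.waitFor(() -> reportQueuedBack(), 100, 10000);
  }
}
{code}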

> TestTrySendErrorReportWhenNNThrowsIOException fails in trunk
> 
>
> Key: HDFS-14445
> URL: https://issues.apache.org/jira/browse/HDFS-14445
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14445-01.patch
>
>
> {noformat}
> Active namenode didn't add the report back to the queue when errorReport 
> threw IOException
> {noformat}
> Reference :::
> https://builds.apache.org/job/PreCommit-HDFS-Build/26676/testReport/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26662/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26661/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/
> https://builds.apache.org/job/PreCommit-HDFS-Build/26644/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestBPOfferService/testTrySendErrorReportWhenNNThrowsIOException/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


