[jira] [Resolved] (HDDS-2347) XCeiverClientGrpc's parallel use leads to NPE

2019-10-30 Thread Lokesh Jain (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain resolved HDDS-2347.
---
Fix Version/s: 0.5.0
   Resolution: Fixed

> XCeiverClientGrpc's parallel use leads to NPE
> -
>
> Key: HDDS-2347
> URL: https://issues.apache.org/jira/browse/HDDS-2347
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.5.0
>
> Attachments: changes.diff, logs.txt
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This issue came up when testing Hive with ORC tables on an Ozone storage 
> backend; so far I could not reproduce it locally within a JUnit test.
> I am attaching a diff file that shows the logging I added in 
> XceiverClientGrpc and in KeyInputStream to get the results that led me to the 
> following understanding of the scenario:
> - Hive starts a couple of threads to work on the table data during query 
> execution
> - There is one RPCClient that is being used by these threads
> - The threads are opening different streams to read from the same key in Ozone
> - The InputStreams internally are using the same XceiverClientGrpc
> - XceiverClientGrpc intermittently throws the following NPE:
> {code}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandAsync(XceiverClientGrpc.java:398)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:295)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:259)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:242)
> at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:118)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:169)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:118)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:224)
> at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:173)
> at 
> org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52)
> at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
> at 
> org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121)
> at 
> org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112)
> at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:555)
> at org.apache.orc.impl.ReaderImpl.(ReaderImpl.java:370)
> at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.(ReaderImpl.java:61)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:105)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1708)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1596)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2900(OrcInputFormat.java:1383)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1568)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1565)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1565)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1383)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> {code}
> I have two proposals to fix this issue: one is the easy answer of adding 
> synchronization to the XceiverClientGrpc code; the other one is a bit more 
> involved, let me explain below.
> Naively I would assume that when I get a client SPI instance from 
> XceiverClientManager, that instance is ready to use. In fact it is not: only 
> when the user of the SPI instance sends the first request does the client 
> essentially become ready. Now, putting synchronization into this code is the 
> easy solution, but my pragmatic half screams for a better solution, one that 
> ensures that the
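
A minimal sketch of the "easy" synchronized-initialization fix described above. All names here are hypothetical and simplified; this is not the actual XceiverClientGrpc code, only an illustration of guarding the lazy connection setup so a concurrent caller can never observe the half-initialized state that produces the NPE.

{code:java}
import java.util.concurrent.CompletableFuture;

// Sketch only: synchronized lazy initialization of a per-datanode connection.
public class LazyClientSketch {

  /** Stand-in for the per-datanode gRPC channel/stub. */
  static final class Connection {
    CompletableFuture<String> send(String request) {
      return CompletableFuture.completedFuture("reply:" + request);
    }
  }

  private volatile Connection connection;   // null until first use

  /** First caller pays the connection cost; later callers reuse it safely. */
  private Connection ensureConnected() {
    Connection c = connection;
    if (c == null) {
      synchronized (this) {
        if (connection == null) {
          connection = new Connection();    // expensive connect, done once
        }
        c = connection;
      }
    }
    return c;
  }

  public CompletableFuture<String> sendCommandAsync(String request) {
    // Without the guard above, a concurrent caller could still see a null
    // connection here and fail with a NullPointerException.
    return ensureConnected().send(request);
  }
}
{code}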

[jira] [Work logged] (HDDS-2347) XCeiverClientGrpc's parallel use leads to NPE

2019-10-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2347?focusedWorklogId=336017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336017
 ]

ASF GitHub Bot logged work on HDDS-2347:


Author: ASF GitHub Bot
Created on: 30/Oct/19 07:02
Start Date: 30/Oct/19 07:02
Worklog Time Spent: 10m 
  Work Description: lokeshj1703 commented on pull request #81: HDDS-2347. 
XCeiverClientGrpc's parallel use leads to NPE
URL: https://github.com/apache/hadoop-ozone/pull/81
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 336017)
Time Spent: 20m  (was: 10m)

> XCeiverClientGrpc's parallel use leads to NPE
> -
>
> Key: HDDS-2347
> URL: https://issues.apache.org/jira/browse/HDDS-2347
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.5.0
>
> Attachments: changes.diff, logs.txt
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This issue came up when testing Hive with ORC tables on an Ozone storage 
> backend; so far I could not reproduce it locally within a JUnit test.
> I am attaching a diff file that shows the logging I added in 
> XceiverClientGrpc and in KeyInputStream to get the results that led me to the 
> following understanding of the scenario:
> - Hive starts a couple of threads to work on the table data during query 
> execution
> - There is one RPCClient that is being used by these threads
> - The threads are opening different streams to read from the same key in Ozone
> - The InputStreams internally are using the same XceiverClientGrpc
> - XceiverClientGrpc intermittently throws the following NPE:
> {code}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandAsync(XceiverClientGrpc.java:398)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:295)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:259)
> at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:242)
> at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getBlock(ContainerProtocolCalls.java:118)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:169)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:118)
> at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:224)
> at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:173)
> at 
> org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52)
> at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
> at 
> org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121)
> at 
> org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:112)
> at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:555)
> at org.apache.orc.impl.ReaderImpl.(ReaderImpl.java:370)
> at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.(ReaderImpl.java:61)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:105)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1708)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1596)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2900(OrcInputFormat.java:1383)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1568)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1565)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1565)
> at 
> org.apa

[jira] [Commented] (HDFS-14882) Consider DataNode load when #getBlockLocation

2019-10-30 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962758#comment-16962758
 ] 

Xiaoqiao He commented on HDFS-14882:


Thanks [~pifta] for your suggestions,
{quote}I would suggest two more things to do, we might deprecate the old sorter 
methods, as we most likely won't need them on the long run, as their use 
effectively overrides the new setting, and an update would be nice to the 
APIDoc of these methods.{quote}
It makes sense to me. I agree that the following methods should be removed. 
BTW, this common interface is invoked by different classes; I would like to 
update that later.
{code:java}
  public void sortByDistance(Node reader, Node[] nodes, int activeLen)
  public void sortByDistanceUsingNetworkLocation(Node reader, Node[] nodes,
  int activeLen)
{code}
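For illustration only, a sketch of how the deprecation could look. The two old signatures are the ones above; the new load-aware entry point name is hypothetical, not the actual NetworkTopology API.

{code:java}
import org.apache.hadoop.net.Node;

// Sketch only: the old sorters stay for compatibility, marked deprecated, and
// delegate to a single entry point that can also take DataNode load into account.
public class SorterDeprecationSketch {

  /** Hypothetical new entry point: distance sort with a load-aware tie-break. */
  public void sortByDistanceConsideringLoad(Node reader, Node[] nodes,
      int activeLen) {
    // ... sort by network distance, then tie-break on active transfer count ...
  }

  @Deprecated
  public void sortByDistance(Node reader, Node[] nodes, int activeLen) {
    sortByDistanceConsideringLoad(reader, nodes, activeLen);
  }

  @Deprecated
  public void sortByDistanceUsingNetworkLocation(Node reader, Node[] nodes,
      int activeLen) {
    sortByDistanceConsideringLoad(reader, nodes, activeLen);
  }
}
{code}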
cc [~ayushtkn],[~elgoiri] any other comments?

> Consider DataNode load when #getBlockLocation
> -
>
> Key: HDFS-14882
> URL: https://issues.apache.org/jira/browse/HDFS-14882
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch, 
> HDFS-14882.003.patch, HDFS-14882.004.patch, HDFS-14882.005.patch, 
> HDFS-14882.006.patch, HDFS-14882.007.patch, HDFS-14882.008.patch, 
> HDFS-14882.009.patch, HDFS-14882.suggestion
>
>
> Currently, we consider the load of a DataNode in #chooseTarget for writers, 
> however we do not consider it for readers. Thus, the processing slots of a 
> DataNode can be occupied by #BlockSender for readers, the disk/network becomes 
> heavily loaded, and we then hit slow-node exceptions. IIRC the same case has 
> been reported several times. Based on this, I propose to consider load for 
> readers the same way #chooseTarget does for writers.






[jira] [Comment Edited] (HDFS-14882) Consider DataNode load when #getBlockLocation

2019-10-30 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962758#comment-16962758
 ] 

Xiaoqiao He edited comment on HDFS-14882 at 10/30/19 7:08 AM:
--

Thanks [~pifta] for your suggestions,
{quote}I would suggest two more things to do, we might deprecate the old sorter 
methods, as we most likely won't need them on the long run, as their use 
effectively overrides the new setting, and an update would be nice to the 
APIDoc of these methods.{quote}
It makes sense to me. I agree that the following methods should be removed. 
BTW, this common interface is invoked by different classes; I would like to 
update that later.
{code:java}
  public void sortByDistance(Node reader, Node[] nodes, int activeLen)
  public void sortByDistanceUsingNetworkLocation(Node reader, Node[] nodes,
  int activeLen)
{code}
Checked that the failed unit test {{TestNetworkTopology}} is related to these 
changes. Will fix that later.
cc [~ayushtkn],[~elgoiri] any other comments?


was (Author: hexiaoqiao):
Thanks [~pifta] for your suggestions,
{quote}I would suggest two more things to do, we might deprecate the old sorter 
methods, as we most likely won't need them on the long run, as their use 
effectively overrides the new setting, and an update would be nice to the 
APIDoc of these methods.{quote}
It makes sense to me. I agree that the following methods should be removed. 
BTW, this common interface is invoked by different classes; I would like to 
update that later.
{code:java}
  public void sortByDistance(Node reader, Node[] nodes, int activeLen)
  public void sortByDistanceUsingNetworkLocation(Node reader, Node[] nodes,
  int activeLen)
{code}
cc [~ayushtkn],[~elgoiri] any other comments?

> Consider DataNode load when #getBlockLocation
> -
>
> Key: HDFS-14882
> URL: https://issues.apache.org/jira/browse/HDFS-14882
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14882.001.patch, HDFS-14882.002.patch, 
> HDFS-14882.003.patch, HDFS-14882.004.patch, HDFS-14882.005.patch, 
> HDFS-14882.006.patch, HDFS-14882.007.patch, HDFS-14882.008.patch, 
> HDFS-14882.009.patch, HDFS-14882.suggestion
>
>
> Currently, we consider the load of a DataNode in #chooseTarget for writers, 
> however we do not consider it for readers. Thus, the processing slots of a 
> DataNode can be occupied by #BlockSender for readers, the disk/network becomes 
> heavily loaded, and we then hit slow-node exceptions. IIRC the same case has 
> been reported several times. Based on this, I propose to consider load for 
> readers the same way #chooseTarget does for writers.






[jira] [Created] (HDFS-14942) Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex

2019-10-30 Thread Lisheng Sun (Jira)
Lisheng Sun created HDFS-14942:
--

 Summary: Change Log Level to warn in 
JournalNodeSyncer#syncWithJournalAtIndex
 Key: HDFS-14942
 URL: https://issues.apache.org/jira/browse/HDFS-14942
 Project: Hadoop HDFS
  Issue Type: Improvement
 Environment: When Hadoop 2.x is upgraded to Hadoop 3.x, InterQJournalProtocol 
is newly added, so an "Unknown protocol" error is thrown.

The new InterQJournalProtocol is used to synchronize past log segments to JNs 
that missed them, and when this error occurs it does not affect normal service. 
I think it should not be an ERROR log; logging it at WARN level is more 
reasonable.
{code:java}
 private void syncWithJournalAtIndex(int index) {
  ...
GetEditLogManifestResponseProto editLogManifest;
try {
  editLogManifest = jnProxy.getEditLogManifestFromJournal(jid,
  nameServiceId, 0, false);
} catch (IOException e) {
  LOG.error("Could not sync with Journal at " +
  otherJNProxies.get(journalNodeIndexForSync), e);
  return;
}
{code}
{code:java}
2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unknown protocol: org.apache.hadoop.hdfs.qjournal.protocol.InterQJournalProtocol
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1565)
        at org.apache.hadoop.ipc.Client.call(Client.java:1511)
        at org.apache.hadoop.ipc.Client.call(Client.java:1421)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy16.getEditLogManifestFromJournal(Unknown Source)
        at org.apache.hadoop.hdfs.qjournal.protocolPB.InterQJournalProtocolTranslatorPB.getEditLogManifestFromJournal(InterQJournalProtocolTranslatorPB.java:75)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncWithJournalAtIndex(JournalNodeSyncer.java:250)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncJournals(JournalNodeSyncer.java:226)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.lambda$startSyncJournalsDaemon$0(JournalNodeSyncer.java:186)
        at java.lang.Thread.run(Thread.java:748)
{code}
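A sketch of the proposed change, i.e. the same catch block as above with only the log level lowered from ERROR to WARN:

{code:java}
private void syncWithJournalAtIndex(int index) {
  // ... unchanged ...
  GetEditLogManifestResponseProto editLogManifest;
  try {
    editLogManifest = jnProxy.getEditLogManifestFromJournal(jid,
        nameServiceId, 0, false);
  } catch (IOException e) {
    // Proposed: WARN instead of ERROR -- the failed sync does not affect
    // normal service and is retried on the next sync cycle.
    LOG.warn("Could not sync with Journal at " +
        otherJNProxies.get(journalNodeIndexForSync), e);
    return;
  }
  // ... unchanged ...
}
{code}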
Reporter: Lisheng Sun









[jira] [Updated] (HDFS-14942) Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex

2019-10-30 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14942:
---
Environment: 
When Hadoop 2.x is upgraded to Hadoop 3.x, InterQJournalProtocol is newly 
added, so an "Unknown protocol" error is thrown.

The new InterQJournalProtocol is used to synchronize past log segments to JNs 
that missed them, and when this error occurs it does not affect normal service. 
I think it should not be an ERROR log; logging it at WARN level is more 
reasonable.
{code:java}
 private void syncWithJournalAtIndex(int index) {
  ...
GetEditLogManifestResponseProto editLogManifest;
try {
  editLogManifest = jnProxy.getEditLogManifestFromJournal(jid,
  nameServiceId, 0, false);
} catch (IOException e) {
  LOG.error("Could not sync with Journal at " +
  otherJNProxies.get(journalNodeIndexForSync), e);
  return;
}
{code}
{code:java}
2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unknown protocol: org.apache.hadoop.hdfs.qjournal.protocol.InterQJournalProtocol
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1565)
        at org.apache.hadoop.ipc.Client.call(Client.java:1511)
        at org.apache.hadoop.ipc.Client.call(Client.java:1421)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy16.getEditLogManifestFromJournal(Unknown Source)
        at org.apache.hadoop.hdfs.qjournal.protocolPB.InterQJournalProtocolTranslatorPB.getEditLogManifestFromJournal(InterQJournalProtocolTranslatorPB.java:75)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncWithJournalAtIndex(JournalNodeSyncer.java:250)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncJournals(JournalNodeSyncer.java:226)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.lambda$startSyncJournalsDaemon$0(JournalNodeSyncer.java:186)
        at java.lang.Thread.run(Thread.java:748)
{code}

  was:
When Hadoop 2.x is upgraded to Hadoop 3.x, InterQJournalProtocol is newly 
added, so an "Unknown protocol" error is thrown.

The new InterQJournalProtocol is used to synchronize past log segments to JNs 
that missed them, and when this error occurs it does not affect normal service. 
I think it should not be an ERROR log; logging it at WARN level is more 
reasonable.
{code:java}
 private void syncWithJournalAtIndex(int index) {
  ...
GetEditLogManifestResponseProto editLogManifest;
try {
  editLogManifest = jnProxy.getEditLogManifestFromJournal(jid,
  nameServiceId, 0, false);
} catch (IOException e) {
  LOG.error("Could not sync with Journal at " +
  otherJNProxies.get(journalNodeIndexForSync), e);
  return;
}
{code}
{code:java}
2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unknown protocol: org.apache.hadoop.hdfs.qjournal.protocol.InterQJournalProtocol
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1565)
        at org.apache.hadoop.ipc.Client.call(Client.java:1511)
        at org.apache.hadoop.ipc.Client.call(Client.java:1421)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy16.getEditLogManifestFromJournal(Unknown Source)
        at org.apache.hadoop.hdfs.qjournal.protocolPB.InterQJournalProtocolTranslatorPB.getEditLogManifestFromJournal(InterQJournalProtocolTranslatorPB.java:75)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncWithJournalAtIndex(JournalNodeSyncer.java:250)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncJournals(JournalNodeSyncer.java:226)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.lambda$startSyncJournalsDaemon$0(JournalNodeSyncer.java:186)
        at java.lang.Thread.run(Thread.java:748)
{code}


> Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex
> 
>
> Key: HDFS-14942
> URL: https://issues.apache.org/jira/browse/HDFS-14942
> Project: Hadoop HDFS
>  Issu

[jira] [Updated] (HDFS-14942) Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex

2019-10-30 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14942:
---
Attachment: HDFS-14942.001.patch
Status: Patch Available  (was: Open)

> Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex
> 
>
> Key: HDFS-14942
> URL: https://issues.apache.org/jira/browse/HDFS-14942
> Project: Hadoop HDFS
>  Issue Type: Improvement
> Environment: When Hadoop 2.x is upgraded to Hadoop 3.x, InterQJournalProtocol 
> is newly added, so an "Unknown protocol" error is thrown.
> The new InterQJournalProtocol is used to synchronize past log segments to 
> JNs that missed them, and when this error occurs it does not affect normal 
> service. I think it should not be an ERROR log; logging it at WARN level is 
> more reasonable.
> {code:java}
>  private void syncWithJournalAtIndex(int index) {
>   ...
> GetEditLogManifestResponseProto editLogManifest;
> try {
>   editLogManifest = jnProxy.getEditLogManifestFromJournal(jid,
>   nameServiceId, 0, false);
> } catch (IOException e) {
>   LOG.error("Could not sync with Journal at " +
>   otherJNProxies.get(journalNodeIndexForSync), e);
>   return;
> }
> {code}
> {code:java}
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> 2019-10-30,15:11:17,388 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unknown protocol: org.apache.hadoop.hdfs.qjournal.protocol.InterQJournalProtocol
>         at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1565)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1511)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1421)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>         at com.sun.proxy.$Proxy16.getEditLogManifestFromJournal(Unknown Source)
>         at org.apache.hadoop.hdfs.qjournal.protocolPB.InterQJournalProtocolTranslatorPB.getEditLogManifestFromJournal(InterQJournalProtocolTranslatorPB.java:75)
>         at org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncWithJournalAtIndex(JournalNodeSyncer.java:250)
>         at org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncJournals(JournalNodeSyncer.java:226)
>         at org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.lambda$startSyncJournalsDaemon$0(JournalNodeSyncer.java:186)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
>Reporter: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14942.001.patch
>
>







[jira] [Work logged] (HDDS-2370) Remove classpath in RunningWithHDFS.md ozone-hdfs/docker-compose as dir 'ozoneplugin' is not exist anymore

2019-10-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2370?focusedWorklogId=336035&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336035
 ]

ASF GitHub Bot logged work on HDDS-2370:


Author: ASF GitHub Bot
Created on: 30/Oct/19 07:52
Start Date: 30/Oct/19 07:52
Worklog Time Spent: 10m 
  Work Description: chimney-lee commented on pull request #105: 
HDDS-2370.Support HddsDatanodeService run as DataNode Plugin
URL: https://github.com/apache/hadoop-ozone/pull/105
 
 
   ## What changes were proposed in this pull request?
   
   With the current version, HddsDatanodeService cannot run as an HDFS DataNode 
plugin, as there is no constructor without parameters.
   ` java.lang.NoSuchMethodException: 
org.apache.hadoop.ozone.HddsDatanodeService.()`
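   
   A minimal sketch of the kind of change this needs (illustrative only, not the actual patch): the DataNode instantiates plugin classes reflectively, so the service class must expose a public no-argument constructor.
   
```java
// Sketch only, not the actual patch. The DataNode loads plugin classes via
// reflection, which requires a public no-arg constructor on the plugin class.
public class HddsDatanodeService implements org.apache.hadoop.util.ServicePlugin {

  public HddsDatanodeService() {
    // required so the DataNode plugin loader can instantiate the service
  }

  @Override
  public void start(Object service) {
    // attach to the DataNode and start the HDDS datanode components
  }

  @Override
  public void stop() { }

  @Override
  public void close() { }
}
```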
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-2370
   
   ## How was this patch tested?
   
   Following RunningWithHDFS.md, configured and ran HddsDatanodeService as a 
DataNode plugin; it works well with Ozone.
   
 



Issue Time Tracking
---

Worklog Id: (was: 336035)
Remaining Estimate: 0h
Time Spent: 10m

> Remove classpath in RunningWithHDFS.md ozone-hdfs/docker-compose as dir 
> 'ozoneplugin' is not exist anymore
> --
>
> Key: HDDS-2370
> URL: https://issues.apache.org/jira/browse/HDDS-2370
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: documentation
>Reporter: luhuachao
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-2370.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In RunningWithHDFS.md 
> {code:java}
> export 
> HADOOP_CLASSPATH=/opt/ozone/share/hadoop/ozoneplugin/hadoop-ozone-datanode-plugin.jar{code}
> ozone-hdfs/docker-compose.yaml
>  
> {code:java}
>   environment:
>  HADOOP_CLASSPATH: /opt/ozone/share/hadoop/ozoneplugin/*.jar
> {code}
> When I run HddsDatanodeService as a plugin in the HDFS DataNode, it fails with 
> the error below: there is no constructor without parameters.
>  
>  
> {code:java}
> 2019-10-21 21:38:56,391 ERROR datanode.DataNode 
> (DataNode.java:startPlugins(972)) - Unable to load DataNode plugins. 
> Specified list of plugins: org.apache.hadoop.ozone.HddsDatanodeService
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.ozone.HddsDatanodeService.()
> {code}
> What I am wondering is: does ozone-0.5 no longer support running as a plugin 
> in the HDFS DataNode? If so, why don't we remove the RunningWithHDFS.md doc?






[jira] [Updated] (HDDS-2370) Remove classpath in RunningWithHDFS.md ozone-hdfs/docker-compose as dir 'ozoneplugin' is not exist anymore

2019-10-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2370:
-
Labels: pull-request-available  (was: )

> Remove classpath in RunningWithHDFS.md ozone-hdfs/docker-compose as dir 
> 'ozoneplugin' is not exist anymore
> --
>
> Key: HDDS-2370
> URL: https://issues.apache.org/jira/browse/HDDS-2370
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: documentation
>Reporter: luhuachao
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDDS-2370.1.patch
>
>
> In RunningWithHDFS.md 
> {code:java}
> export 
> HADOOP_CLASSPATH=/opt/ozone/share/hadoop/ozoneplugin/hadoop-ozone-datanode-plugin.jar{code}
> ozone-hdfs/docker-compose.yaml
>  
> {code:java}
>   environment:
>  HADOOP_CLASSPATH: /opt/ozone/share/hadoop/ozoneplugin/*.jar
> {code}
> When I run HddsDatanodeService as a plugin in the HDFS DataNode, it fails with 
> the error below: there is no constructor without parameters.
>  
>  
> {code:java}
> 2019-10-21 21:38:56,391 ERROR datanode.DataNode 
> (DataNode.java:startPlugins(972)) - Unable to load DataNode plugins. 
> Specified list of plugins: org.apache.hadoop.ozone.HddsDatanodeService
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.ozone.HddsDatanodeService.()
> {code}
> What I am wondering is: does ozone-0.5 no longer support running as a plugin 
> in the HDFS DataNode? If so, why don't we remove the RunningWithHDFS.md doc?






[jira] [Created] (HDDS-2382) Consider reducing number of file::exists() calls during write operation

2019-10-30 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HDDS-2382:
--

 Summary: Consider reducing number of file::exists() calls during 
write operation
 Key: HDDS-2382
 URL: https://issues.apache.org/jira/browse/HDDS-2382
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Rajesh Balamohan


When writing 100-200 MB files with multiple threads, we observed lots of 
{{file::exists()}} checks.

For every 16 MB chunk, it ends up checking whether {{chunksLoc}} directory 
exists or not. (ref: 
[https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/ChunkUtils.java#L239])

Also, this check ({{ChunkUtils.getChunkFile}}) happens in 2 places:

1. org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk

2. org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction

Note that these are directories, not actual chunk filenames. It would be helpful 
to reduce this check if we tracked the creation/deletion of these directories.
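A sketch of the suggested direction (a hypothetical helper, not existing Ozone code): remember directories that were already verified or created, and invalidate the entry on delete, so repeated chunk writes to the same container skip the exists() syscall.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, sketch only.
public final class VerifiedChunkDirs {
  private static final Set<Path> VERIFIED = ConcurrentHashMap.newKeySet();

  private VerifiedChunkDirs() { }

  /** Ensure the chunk directory exists, touching the filesystem at most once. */
  public static void ensureExists(Path dir) throws IOException {
    if (VERIFIED.contains(dir)) {
      return;                        // already verified, no syscall needed
    }
    Files.createDirectories(dir);    // idempotent: creates only if missing
    VERIFIED.add(dir);
  }

  /** Call when a container/chunk directory is removed. */
  public static void invalidate(Path dir) {
    VERIFIED.remove(dir);
  }
}
{code}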






[jira] [Updated] (HDDS-2382) Consider reducing number of file::exists() calls during write operation

2019-10-30 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HDDS-2382:
---
Labels: performance  (was: )

> Consider reducing number of file::exists() calls during write operation
> ---
>
> Key: HDDS-2382
> URL: https://issues.apache.org/jira/browse/HDDS-2382
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: performance
>
> When writing 100-200 MB files with multiple threads, we observed lots of 
> {{file::exists()}} checks.
> For every 16 MB chunk, it ends up checking whether {{chunksLoc}} directory 
> exists or not. (ref: 
> [https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/helpers/ChunkUtils.java#L239])
> Also, this check ({{ChunkUtils.getChunkFile}}) happens in 2 places:
> 1. org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk
> 2. org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction
> Note that these are directories, not actual chunk filenames. It would be 
> helpful to reduce this check if we tracked the creation/deletion of these 
> directories.






[jira] [Created] (HDDS-2383) Closing open container via SCMCli throws exception

2019-10-30 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HDDS-2383:
--

 Summary: Closing open container via SCMCli throws exception
 Key: HDDS-2383
 URL: https://issues.apache.org/jira/browse/HDDS-2383
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Reporter: Rajesh Balamohan


This was observed in apache master branch.

Closing the container via {{SCMCli}} throws the following exception, though the 
container ends up getting closed eventually.

{noformat}
2019-10-30 02:44:41,794 INFO 
org.apache.hadoop.hdds.scm.block.SCMBlockDeletingService: Block deletion txnID 
mismatch in datanode 79626ba3-1957-46e5-a8b0-32d7f47fb801 for containerID 6. 
Datanode delete txnID: 0, SCM txnID: 1004
2019-10-30 02:44:41,810 INFO 
org.apache.hadoop.hdds.scm.container.IncrementalContainerReportHandler: Moving 
container #4 to CLOSED state, datanode 8885d4ba-228a-4fd2-bf5a-831f01594c6c{ip: 
10.17.234.37, host: vd1327.halxg.cloudera.com, networkLocation: /default-rack, 
certSerialId: null} reported CLOSED replica.
2019-10-30 02:44:41,826 INFO 
org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer: Object type 
container id 4 op close new stage complete
2019-10-30 02:44:41,826 ERROR 
org.apache.hadoop.hdds.scm.container.ContainerStateManager: Failed to update 
container state #4, reason: invalid state transition from state: CLOSED upon 
event: CLOSE.
2019-10-30 02:44:41,826 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 
on 9860, call Call#3 Retry#0 
org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocol.submitRequest
 from 10.17.234.32:45926
org.apache.hadoop.hdds.scm.exceptions.SCMException: Failed to update container 
state #4, reason: invalid state transition from state: CLOSED upon event: CLOSE.
at 
org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerState(ContainerStateManager.java:338)
at 
org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerState(SCMContainerManager.java:326)
at 
org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.notifyObjectStageChange(SCMClientProtocolServer.java:388)
at 
org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.notifyObjectStageChange(StorageContainerLocationProtocolServerSideTranslatorPB.java:303)
at 
org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.processRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:158)
at 
org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB$$Lambda$152/2036820231.apply(Unknown
 Source)
at 
org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
at 
org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:112)
at 
org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:30454)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
{noformat}






[jira] [Commented] (HDFS-14942) Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex

2019-10-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962902#comment-16962902
 ] 

Hadoop QA commented on HDFS-14942:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
56s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 26s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}103m 17s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}165m  0s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestFileCreation |
|   | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
|   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
|   | hadoop.hdfs.TestMultipleNNPortQOP |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14942 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984343/HDFS-14942.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux b6fe14fcf6dd 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 012756a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28210/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28210/testReport/ |
| Ma

[jira] [Updated] (HDDS-2384) Large chunks during write can have memory pressure on DN with multiple clients

2019-10-30 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HDDS-2384:
---
Labels: performance  (was: )

> Large chunks during write can have memory pressure on DN with multiple clients
> --
>
> Key: HDDS-2384
> URL: https://issues.apache.org/jira/browse/HDDS-2384
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: performance
>
> During large file writes, it ends up writing {{16 MB}} chunks.  
> https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java#L691
> In large clusters, hundreds of clients may connect to a DN. In such cases, 
> depending on the incoming write workload, memory load on the DN can increase 
> significantly.






[jira] [Created] (HDDS-2384) Large chunks during write can have memory pressure on DN with multiple clients

2019-10-30 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HDDS-2384:
--

 Summary: Large chunks during write can have memory pressure on DN 
with multiple clients
 Key: HDDS-2384
 URL: https://issues.apache.org/jira/browse/HDDS-2384
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Rajesh Balamohan


During large file writes, it ends up writing {{16 MB}} chunks.  

https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java#L691

In large clusters, hundreds of clients may connect to a DN. In such cases, depending 
on the incoming write workload, memory load on the DN can increase significantly.
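For a rough sense of scale (assumed numbers, illustration only): 200 clients each with a single 16 MB chunk in flight against the same DN would already amount to roughly 3 GB of buffered chunk data.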








[jira] [Commented] (HDDS-2376) Fail to read data through XceiverClientGrpc

2019-10-30 Thread Mukul Kumar Singh (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962928#comment-16962928
 ] 

Mukul Kumar Singh commented on HDDS-2376:
-

Hi [~Sammi], are there any errors on the datanode? I mean, is the chunk file 
present on the datanode? Also, we can verify checksums locally on the datanode 
as well.
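Purely as an illustration (a standalone sketch, not the Ozone Checksum API), a quick way to fingerprint a chunk file on the datanode host so it can be compared across replicas or against a local copy:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.CRC32;

// Standalone sketch: CRC32 over a chunk file, for a quick local comparison.
public class LocalChunkCrc {
  public static void main(String[] args) throws IOException {
    CRC32 crc = new CRC32();
    byte[] buf = new byte[64 * 1024];
    try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
      int n;
      while ((n = in.read(buf)) > 0) {
        crc.update(buf, 0, n);
      }
    }
    System.out.printf("%s crc32=%d%n", args[0], crc.getValue());
  }
}
{code}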

> Fail to read data through XceiverClientGrpc
> ---
>
> Key: HDDS-2376
> URL: https://issues.apache.org/jira/browse/HDDS-2376
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Sammi Chen
>Assignee: Hanisha Koneru
>Priority: Blocker
>
> Ran teragen; the application failed with the following stack trace:
> 19/10/29 14:35:42 INFO mapreduce.Job: Running job: job_1567133159094_0048
> 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 running in 
> uber mode : false
> 19/10/29 14:35:59 INFO mapreduce.Job:  map 0% reduce 0%
> 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 failed with 
> state FAILED due to: Application application_1567133159094_0048 failed 2 
> times due to AM Container for appattempt_1567133159094_0048_02 exited 
> with  exitCode: -1000
> For more detailed output, check application tracking 
> page: http://host183:8088/cluster/app/application_1567133159094_0048 Then, 
> click on links to logs of each attempt.
> Diagnostics: Unexpected OzoneException: 
> org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at 
> index 0
> java.io.IOException: Unexpected OzoneException: 
> org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at 
> index 0
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239)
>   at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
>   at 
> org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52)
>   at java.io.DataInputStream.read(DataInputStream.java:100)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:86)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:60)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:120)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
>   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267)
>   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
>   at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
>   at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum 
> mismatch at index 0
>   at 
> org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148)
>   at 
> org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275)
>   at 
> org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.java:375)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:287)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:250)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:233)
>   at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:245)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk

[jira] [Work logged] (HDDS-2378) Change "OZONE" as string used in the code where OzoneConsts.OZONE is suitable

2019-10-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2378?focusedWorklogId=336219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336219
 ]

ASF GitHub Bot logged work on HDDS-2378:


Author: ASF GitHub Bot
Created on: 30/Oct/19 14:58
Start Date: 30/Oct/19 14:58
Worklog Time Spent: 10m 
  Work Description: dineshchitlangia commented on pull request #103: 
HDDS-2378 - Change OZONE as string used in the code where OzoneConsts.OZONE is 
suitable
URL: https://github.com/apache/hadoop-ozone/pull/103
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 336219)
Time Spent: 20m  (was: 10m)

> Change "OZONE" as string used in the code where OzoneConsts.OZONE is suitable
> -
>
> Key: HDDS-2378
> URL: https://issues.apache.org/jira/browse/HDDS-2378
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 0.4.1
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Based on a review I did a quick check, and there are quite a few places 
> where we have hardcoded "ozone", or a capitalized version of it, as a String 
> literal in the code.
> Let's check them one by one and, where possible, replace it with 
> OzoneConsts.OZONE; if the lowercase version is not acceptable in all places, 
> create another constant with the uppercase version and use that.
> This is the search, and the results:
> {code:bash}
> find . -name *.java | while read FILE; do NUM=`grep -c -i "\"OZONE\"" $FILE`; 
> if [ $NUM -gt 0 ]; then echo $FILE; fi; done | sort | uniq
> ./hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/utils/RocksDBStore.java
> ./hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStore.java
> ./hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConsts.java
> ./hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/impl/TestContainerDataYaml.java
> ./hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestBlockManagerImpl.java
> ./hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueContainer.java
> ./hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/server/ServerUtils.java
> ./hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/metrics/SCMContainerManagerMetrics.java
> ./hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementMetrics.java
> ./hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/SCMNodeMetrics.java
> ./hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/SCMPipelineMetrics.java
> ./hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMContainerMetrics.java
> ./hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
> ./hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/block/TestBlockManager.java
> ./hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/TestCloseContainerEventHandler.java
> ./hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/TestSCMContainerManager.java
> ./hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/TestContainerPlacement.java
> ./hadoop-hdds/tools/src/main/java/org/apache/hadoop/hdds/scm/cli/container/CreateSubcommand.java
> ./hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/util/OzoneVersionInfo.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/scm/container/TestContainerStateManagerIntegration.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/scm/container/metrics/TestSCMContainerManagerMetrics.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestContainerOperations.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestContainerStateMachineIdempotency.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestStorageContainerManager.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/Test2WayCommitInRatis.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestCommitWatch

[jira] [Resolved] (HDDS-2378) Change "OZONE" as string used in the code where OzoneConsts.OZONE is suitable

2019-10-30 Thread Dinesh Chitlangia (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Chitlangia resolved HDDS-2378.
-
Fix Version/s: 0.5.0
   Resolution: Fixed

[~pifta] Thank you for the contribution. This has been merged with master.

> Change "OZONE" as string used in the code where OzoneConsts.OZONE is suitable
> -
>
> Key: HDDS-2378
> URL: https://issues.apache.org/jira/browse/HDDS-2378
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 0.4.1
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Based on a review I did a quick check, and there are quite a few places 
> where we have hardcoded "ozone", or a capitalized version of it, as a String 
> literal in the code.
> Let's check them one by one and, where possible, replace it with 
> OzoneConsts.OZONE; if the lowercase version is not acceptable in all places, 
> create another constant with the uppercase version and use that.
> This is the search, and the results:
> {code:bash}
> find . -name *.java | while read FILE; do NUM=`grep -c -i "\"OZONE\"" $FILE`; 
> if [ $NUM -gt 0 ]; then echo $FILE; fi; done | sort | uniq
> ./hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/utils/RocksDBStore.java
> ./hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStore.java
> ./hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConsts.java
> ./hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/impl/TestContainerDataYaml.java
> ./hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestBlockManagerImpl.java
> ./hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueContainer.java
> ./hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/server/ServerUtils.java
> ./hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/metrics/SCMContainerManagerMetrics.java
> ./hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/placement/algorithms/SCMContainerPlacementMetrics.java
> ./hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/SCMNodeMetrics.java
> ./hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/SCMPipelineMetrics.java
> ./hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMContainerMetrics.java
> ./hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
> ./hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/block/TestBlockManager.java
> ./hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/TestCloseContainerEventHandler.java
> ./hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/TestSCMContainerManager.java
> ./hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/TestContainerPlacement.java
> ./hadoop-hdds/tools/src/main/java/org/apache/hadoop/hdds/scm/cli/container/CreateSubcommand.java
> ./hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/util/OzoneVersionInfo.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/scm/container/TestContainerStateManagerIntegration.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/scm/container/metrics/TestSCMContainerManagerMetrics.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestContainerOperations.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestContainerStateMachineIdempotency.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestStorageContainerManager.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/Test2WayCommitInRatis.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestCommitWatcher.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestOzoneRpcClientAbstract.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestWatchForCommit.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/ozShell/TestS3Shell.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/TestAllocateContainer.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/TestContainerSmallFile.java
> ./hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/TestGetCommittedBlockLengthAndPutKey.java
> ./hadoop-ozone/integration-test/src/test
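
To make the described swap concrete, here is a minimal, hypothetical sketch; the class below is illustrative only and is not taken from the actual patch.
{code:java}
import org.apache.hadoop.ozone.OzoneConsts;

// Illustrative sketch only -- this class and its fields are made up for the example.
class OzoneLiteralExample {
  // Before: a hardcoded literal repeated across classes.
  static final String SERVICE_NAME_BEFORE = "ozone";

  // After: reference the shared constant so every usage stays in sync; where the
  // uppercase form is really needed, a companion uppercase constant could be added.
  static final String SERVICE_NAME_AFTER = OzoneConsts.OZONE;
}
{code}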

[jira] [Commented] (HDFS-14824) [Dynamometer] Dynamometer in org.apache.hadoop.tools does not output the benchmark results.

2019-10-30 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963152#comment-16963152
 ] 

Erik Krogen commented on HDFS-14824:


Hi [~tasanuma], the changes LGTM, but I think we should also pull in the 
follow-ons in [PR 92|https://github.com/linkedin/dynamometer/pull/92] and [PR 
96|https://github.com/linkedin/dynamometer/pull/96] which make this feature 
much more usable.

> [Dynamometer] Dynamometer in org.apache.hadoop.tools does not output the 
> benchmark results.
> ---
>
> Key: HDFS-14824
> URL: https://issues.apache.org/jira/browse/HDFS-14824
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Soya Miyoshi
>Assignee: Takanobu Asanuma
>Priority: Major
>
> According to the latest 
> [document|https://aajisaka.github.io/hadoop-document/hadoop-project/hadoop-dynamometer/Dynamometer.html
>  ], the benchmark results will be written in `Dauditreplay.output-path`. 
> However, the current org.apache.hadoop.tools hasn't merged [this pull 
> request|https://github.com/linkedin/dynamometer/pull/76 ], so it does not 
> output the benchmark results.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2366) Remove ozone.enabled flag

2019-10-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2366?focusedWorklogId=336265&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336265
 ]

ASF GitHub Bot logged work on HDDS-2366:


Author: ASF GitHub Bot
Created on: 30/Oct/19 16:34
Start Date: 30/Oct/19 16:34
Worklog Time Spent: 10m 
  Work Description: anuengineer commented on pull request #90: HDDS-2366. 
Remove ozone.enabled as a flag and config item.
URL: https://github.com/apache/hadoop-ozone/pull/90
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 336265)
Time Spent: 20m  (was: 10m)

> Remove ozone.enabled flag
> -
>
> Key: HDDS-2366
> URL: https://issues.apache.org/jira/browse/HDDS-2366
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: newbie, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, when Ozone is started, the start-ozone.sh/stop-ozone.sh scripts check 
> whether this property is enabled before starting the Ozone services. This 
> property and this check can now be removed.
>  
> The check was needed when Ozone was part of Hadoop and we did not want to start 
> Ozone services by default. There is no such requirement anymore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2366) Remove ozone.enabled flag

2019-10-30 Thread Anu Engineer (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer resolved HDDS-2366.

Fix Version/s: 0.5.0
   Resolution: Fixed

Committed to the master branch. [~swagle] Thank you for the contribution.

> Remove ozone.enabled flag
> -
>
> Key: HDDS-2366
> URL: https://issues.apache.org/jira/browse/HDDS-2366
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, when Ozone is started, the start-ozone.sh/stop-ozone.sh scripts check 
> whether this property is enabled before starting the Ozone services. This 
> property and this check can now be removed.
>  
> The check was needed when Ozone was part of Hadoop and we did not want to start 
> Ozone services by default. There is no such requirement anymore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14778) BlockManager findAndMarkBlockAsCorrupt adds block to the map if the Storage state is failed

2019-10-30 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963209#comment-16963209
 ] 

hemanthboyina commented on HDFS-14778:
--

Thanks for your time [~weichiu].
Resolving as Cannot Reproduce, since I am not able to reproduce the issue.

> BlockManager findAndMarkBlockAsCorrupt adds block to the map if the Storage 
> state is failed
> ---
>
> Key: HDFS-14778
> URL: https://issues.apache.org/jira/browse/HDFS-14778
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Priority: Major
> Attachments: HDFS-14778.001.patch, HDFS-14778.002.patch, 
> HDFS-14778.003.patch
>
>
> Should not mark the block as corrupt if the storage state is failed



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14284) RBF: Log Router identifier when reporting exceptions

2019-10-30 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963211#comment-16963211
 ] 

hemanthboyina commented on HDFS-14284:
--

Updated the patch with the review comments addressed. Please review, [~ayushtkn].

> RBF: Log Router identifier when reporting exceptions
> 
>
> Key: HDFS-14284
> URL: https://issues.apache.org/jira/browse/HDFS-14284
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14284.001.patch, HDFS-14284.002.patch, 
> HDFS-14284.003.patch, HDFS-14284.004.patch, HDFS-14284.005.patch, 
> HDFS-14284.006.patch, HDFS-14284.007.patch, HDFS-14284.008.patch
>
>
> The typical setup is to use multiple Routers through 
> ConfiguredFailoverProxyProvider.
> In a regular HA Namenode setup, it is easy to know which NN was used.
> However, in RBF, any Router can be the one reporting the exception and it is 
> hard to know which was the one.
> We should have a way to identify which Router/Namenode was the one triggering 
> the exception.
> This would also apply with Observer Namenodes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2366) Remove ozone.enabled flag

2019-10-30 Thread Anu Engineer (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963204#comment-16963204
 ] 

Anu Engineer commented on HDDS-2366:


[~cxorm] I have updated the wiki page after committing this. Thank you for the 
reminder. We will scrub the documentation and wiki for remaining references to the enabled flag.

> Remove ozone.enabled flag
> -
>
> Key: HDDS-2366
> URL: https://issues.apache.org/jira/browse/HDDS-2366
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Bharat Viswanadham
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, when Ozone is started, the start-ozone.sh/stop-ozone.sh scripts check 
> whether this property is enabled before starting the Ozone services. This 
> property and this check can now be removed.
>  
> The check was needed when Ozone was part of Hadoop and we did not want to start 
> Ozone services by default. There is no such requirement anymore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-14803) Truncate return value was wrong

2019-10-30 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina reassigned HDFS-14803:


Assignee: (was: hemanthboyina)

> Truncate return value was wrong
> ---
>
> Key: HDFS-14803
> URL: https://issues.apache.org/jira/browse/HDFS-14803
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Priority: Major
> Attachments: HDFS-14803.patch
>
>
> Even though the truncated block is updated as Under Construction and set with a 
> new timestamp, truncate returns false as the result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14778) BlockManager findAndMarkBlockAsCorrupt adds block to the map if the Storage state is failed

2019-10-30 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14778:
-
Resolution: Cannot Reproduce
Status: Resolved  (was: Patch Available)

> BlockManager findAndMarkBlockAsCorrupt adds block to the map if the Storage 
> state is failed
> ---
>
> Key: HDFS-14778
> URL: https://issues.apache.org/jira/browse/HDFS-14778
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Priority: Major
> Attachments: HDFS-14778.001.patch, HDFS-14778.002.patch, 
> HDFS-14778.003.patch
>
>
> Should not mark the block as corrupt if the storage state is failed



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-14778) BlockManager findAndMarkBlockAsCorrupt adds block to the map if the Storage state is failed

2019-10-30 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina reassigned HDFS-14778:


Assignee: (was: hemanthboyina)

> BlockManager findAndMarkBlockAsCorrupt adds block to the map if the Storage 
> state is failed
> ---
>
> Key: HDFS-14778
> URL: https://issues.apache.org/jira/browse/HDFS-14778
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Priority: Major
> Attachments: HDFS-14778.001.patch, HDFS-14778.002.patch, 
> HDFS-14778.003.patch
>
>
> Should not mark the block as corrupt if the storage state is failed



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14803) Truncate return value was wrong

2019-10-30 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14803:
-
Resolution: Not A Problem
Status: Resolved  (was: Patch Available)

> Truncate return value was wrong
> ---
>
> Key: HDFS-14803
> URL: https://issues.apache.org/jira/browse/HDFS-14803
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Priority: Major
> Attachments: HDFS-14803.patch
>
>
> Even though the truncated block is updated as Under Construction and set with a 
> new timestamp, truncate returns false as the result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2383) Closing open container via SCMCli throws exception

2019-10-30 Thread Anu Engineer (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963214#comment-16963214
 ] 

Anu Engineer commented on HDDS-2383:


Looks like SCMCLI was trying to close a container that SCM is also trying 
to close. Would it be possible to close a healthy, working container and see if you 
see this error again? If so, it is a bug. Thanks.

> Closing open container via SCMCli throws exception
> --
>
> Key: HDDS-2383
> URL: https://issues.apache.org/jira/browse/HDDS-2383
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Rajesh Balamohan
>Priority: Major
>
> This was observed in apache master branch.
> Closing the container via {{SCMCli}} throws the following exception, though 
> the container ends up getting closed eventually.
> {noformat}
> 2019-10-30 02:44:41,794 INFO 
> org.apache.hadoop.hdds.scm.block.SCMBlockDeletingService: Block deletion 
> txnID mismatch in datanode 79626ba3-1957-46e5-a8b0-32d7f47fb801 for 
> containerID 6. Datanode delete txnID: 0, SCM txnID: 1004
> 2019-10-30 02:44:41,810 INFO 
> org.apache.hadoop.hdds.scm.container.IncrementalContainerReportHandler: 
> Moving container #4 to CLOSED state, datanode 
> 8885d4ba-228a-4fd2-bf5a-831f01594c6c{ip: 10.17.234.37, host: 
> vd1327.halxg.cloudera.com, networkLocation: /default-rack, certSerialId: 
> null} reported CLOSED replica.
> 2019-10-30 02:44:41,826 INFO 
> org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer: Object type 
> container id 4 op close new stage complete
> 2019-10-30 02:44:41,826 ERROR 
> org.apache.hadoop.hdds.scm.container.ContainerStateManager: Failed to update 
> container state #4, reason: invalid state transition from state: CLOSED upon 
> event: CLOSE.
> 2019-10-30 02:44:41,826 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 6 on 9860, call Call#3 Retry#0 
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocol.submitRequest
>  from 10.17.234.32:45926
> org.apache.hadoop.hdds.scm.exceptions.SCMException: Failed to update 
> container state #4, reason: invalid state transition from state: CLOSED upon 
> event: CLOSE.
> at 
> org.apache.hadoop.hdds.scm.container.ContainerStateManager.updateContainerState(ContainerStateManager.java:338)
> at 
> org.apache.hadoop.hdds.scm.container.SCMContainerManager.updateContainerState(SCMContainerManager.java:326)
> at 
> org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.notifyObjectStageChange(SCMClientProtocolServer.java:388)
> at 
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.notifyObjectStageChange(StorageContainerLocationProtocolServerSideTranslatorPB.java:303)
> at 
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.processRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:158)
> at 
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB$$Lambda$152/2036820231.apply(Unknown
>  Source)
> at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
> at 
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:112)
> at 
> org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:30454)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11639) [PROVIDED Phase 2] Encode the BlockAlias in the client protocol

2019-10-30 Thread Virajith Jalaparti (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-11639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963227#comment-16963227
 ] 

Virajith Jalaparti commented on HDFS-11639:
---

[~ehiggs] – it looks like this client protocol change is no longer needed if 
this work is going to use the final SPS implementation. Thoughts?

> [PROVIDED Phase 2] Encode the BlockAlias in the client protocol
> ---
>
> Key: HDFS-11639
> URL: https://issues.apache.org/jira/browse/HDFS-11639
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Ewan Higgs
>Assignee: Ewan Higgs
>Priority: Major
> Attachments: HDFS-11639-HDFS-9806.001.patch, 
> HDFS-11639-HDFS-9806.002.patch, HDFS-11639-HDFS-9806.003.patch, 
> HDFS-11639-HDFS-9806.004.patch, HDFS-11639-HDFS-9806.005.patch
>
>
> As part of the {{PROVIDED}} storage type, we have a {{BlockAlias}} type which 
> encodes information about where the data comes from. i.e. URI, offset, 
> length, and nonce value. This data should be encoded in the protocol 
> ({{LocatedBlockProto}} and the {{BlockTokenIdentifier}}) when a block is 
> available using the PROVIDED storage type.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14824) [Dynamometer] Dynamometer in org.apache.hadoop.tools does not output the benchmark results.

2019-10-30 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963243#comment-16963243
 ] 

Takanobu Asanuma commented on HDFS-14824:
-

[~xkrogen] Thanks for your review and for letting me know about the PRs! I'll also 
merge them in this jira.

> [Dynamometer] Dynamometer in org.apache.hadoop.tools does not output the 
> benchmark results.
> ---
>
> Key: HDFS-14824
> URL: https://issues.apache.org/jira/browse/HDFS-14824
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Soya Miyoshi
>Assignee: Takanobu Asanuma
>Priority: Major
>
> According to the latest 
> [document|https://aajisaka.github.io/hadoop-document/hadoop-project/hadoop-dynamometer/Dynamometer.html
>  ], the benchmark results will be written in `Dauditreplay.output-path`. 
> However, the current org.apache.hadoop.tools hasn't merged [this pull 
> request|https://github.com/linkedin/dynamometer/pull/76 ], so it does not 
> output the benchmark results.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2301) Write path: Reduce read contention in rocksDB

2019-10-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-2301:
-
Labels: performance pull-request-available  (was: performance)

> Write path: Reduce read contention in rocksDB
> -
>
> Key: HDDS-2301
> URL: https://issues.apache.org/jira/browse/HDDS-2301
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Rajesh Balamohan
>Assignee: Supratim Deka
>Priority: Major
>  Labels: performance, pull-request-available
> Attachments: om_write_profile.png
>
>
> Benchmark: 
>  
>  A simple benchmark that creates hundreds to thousands of keys (empty 
> directories) in OM. This is done in a tight loop with multiple client-side 
> threads to put enough load on the CPU. Note that the intention is to understand 
> the bottlenecks in OM (interactions with SCM & DN are intentionally avoided).
> Observation:
>  -
>  During the write path, Ozone checks {{OMFileRequest.verifyFilesInPath}}. This 
> internally calls {{omMetadataManager.getKeyTable().get(dbKeyName)}} for every 
> write operation. This turns out to be expensive and chokes the write path.
> [https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMDirectoryCreateRequest.java#L155]
> [https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMFileRequest.java#L63]
> In most cases, a directory creation would be a fresh entry. In such cases, 
> it would be good to try {{RocksDB::keyMayExist}}.
>  
>  
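
Below is a minimal, editorial sketch of the keyMayExist idea suggested above. It is not from the actual patch, and the exact keyMayExist overload differs between RocksJava releases, so treat the signature as an assumption.
{code:java}
// Minimal sketch only -- assumes the RocksJava keyMayExist(byte[], Holder) overload.
import org.rocksdb.Holder;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

class KeyMayExistSketch {
  static boolean keyProbablyExists(RocksDB db, byte[] dbKeyName) throws RocksDBException {
    Holder<byte[]> value = new Holder<>();
    // keyMayExist consults the memtable and bloom filters; a false return
    // guarantees the key is absent, so the expensive get() is skipped for the
    // common "fresh directory entry" case.
    if (!db.keyMayExist(dbKeyName, value)) {
      return false;
    }
    // A true result is only a hint (possible bloom-filter false positive),
    // so fall back to a real lookup before deciding the key exists.
    return value.getValue() != null || db.get(dbKeyName) != null;
  }
}
{code}
The win comes from the negative case, which is the common one when most directory creations are fresh entries.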



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2301) Write path: Reduce read contention in rocksDB

2019-10-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2301?focusedWorklogId=336309&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336309
 ]

ASF GitHub Bot logged work on HDDS-2301:


Author: ASF GitHub Bot
Created on: 30/Oct/19 17:16
Start Date: 30/Oct/19 17:16
Worklog Time Spent: 10m 
  Work Description: supratimdeka commented on pull request #107: HDDS-2301. 
Write path: Reduce read contention in rocksDB.
URL: https://github.com/apache/hadoop-ozone/pull/107
 
 
   https://issues.apache.org/jira/browse/HDDS-2301
   
   This change introduces 'mkdir -p' behaviour in these 2 OM Requests:
   1. create directory and 
   2. create (file)
   
   The idea is to avoid iterating the key table when creating new files with 
a path. Without the patch, checking the existence of each parent directory during 
file create requires a slow DB iterator. With the patch, this check becomes a 
point lookup, which can be accelerated by the bloom filters in RocksDB.
   
   This patch is a work-in-progress and the following functionality is broken:
   1. interoperability with S3
   2. ACLs (setting correct ACLs for the parent directories created in the path)
   3. rename
   
   Soliciting early feedback.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 336309)
Remaining Estimate: 0h
Time Spent: 10m

> Write path: Reduce read contention in rocksDB
> -
>
> Key: HDDS-2301
> URL: https://issues.apache.org/jira/browse/HDDS-2301
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Rajesh Balamohan
>Assignee: Supratim Deka
>Priority: Major
>  Labels: performance, pull-request-available
> Attachments: om_write_profile.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Benchmark: 
>  
>  A simple benchmark that creates hundreds to thousands of keys (empty 
> directories) in OM. This is done in a tight loop with multiple client-side 
> threads to put enough load on the CPU. Note that the intention is to understand 
> the bottlenecks in OM (interactions with SCM & DN are intentionally avoided).
> Observation:
>  -
>  During the write path, Ozone checks {{OMFileRequest.verifyFilesInPath}}. This 
> internally calls {{omMetadataManager.getKeyTable().get(dbKeyName)}} for every 
> write operation. This turns out to be expensive and chokes the write path.
> [https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMDirectoryCreateRequest.java#L155]
> [https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/file/OMFileRequest.java#L63]
> In most cases, a directory creation would be a fresh entry. In such cases, 
> it would be good to try {{RocksDB::keyMayExist}}.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14891) RBF: namenode links in NameFederation Health page (federationhealth.html) cannot use https scheme

2019-10-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963252#comment-16963252
 ] 

Íñigo Goiri commented on HDFS-14891:


The unit test looks good.
It takes around 6 seconds to run; not ideal, but I think we cannot do much 
better than this.
Maybe fix the javadoc in TestRouterNamenodeWebScheme with a proper high-level 
comment that mentions the scheme.

> RBF: namenode links in NameFederation Health page (federationhealth.html)  
> cannot use https scheme
> --
>
> Key: HDFS-14891
> URL: https://issues.apache.org/jira/browse/HDFS-14891
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf, ui
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Major
> Attachments: HDFS-14891.001.patch, HDFS-14891.002.patch, 
> HDFS-14891.003.patch, HDFS-14891.004.patch, HDFS-14891.005.patch, 
> HDFS-14891.006.patch, HDFS-14891.patch
>
>
> The scheme of the links in federationhealth.html is hard-coded as 'http'.
> It should be set to 'https' when dfs.http.policy is HTTPS_ONLY 
> (and maybe for HTTP_AND_HTTPS as well).
>  
> [https://github.com/apache/hadoop/blob/c99a12167ff9566012ef32104a3964887d62c899/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/webapps/router/federationhealth.html#L168-L169]
> [https://github.com/apache/hadoop/blob/c99a12167ff9566012ef32104a3964887d62c899/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/webapps/router/federationhealth.html#L236]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14940) HDFS Balancer : getBalancerBandwidth displaying wrong values for the maximum network bandwidth used by the datanode while network bandwidth set with values as 104857600

2019-10-30 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963258#comment-16963258
 ] 

hemanthboyina commented on HDFS-14940:
--

{code:java}
  // For example,
  //   "-1230k" will be converted to -1230 * 1024 = -1259520;
  //   "891g" will be converted to 891 * 1024^3 = 956703965184;

public static long string2long(String s) {
  final char lastchar = s.charAt(lastpos);
  
  prefix = TraditionalBinaryPrefix.valueOf(lastchar).value; {code}
[~SouryakantaDwivedy] I think this is the expected behaviour.
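
For context, an editorial worked example of the suffix expansion above; the numbers are computed by hand, not captured from a running cluster, so treat the interpretation as an assumption.
{code:java}
// Editorial worked example of the suffix expansion described above.
class BandwidthSuffixExample {
  static final long FROM_G = 1048576000L * (1L << 30); // "1048576000g" -> 1125899906842624000 bytes
  static final long FROM_P = 1048L * (1L << 50);       // "1048p"       -> 1179943102371069952 bytes
  static final long FROM_E = 1L << 60;                 // "1e"          -> 1152921504606846976 bytes
  // All three fit in a signed long, so the "different" numbers shown by
  // -getBalancerBandwidth would presumably just be these byte expansions.
}
{code}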

> HDFS Balancer : getBalancerBandwidth displaying wrong values for the maximum 
> network bandwidth used by the datanode while network bandwidth set with 
> values as 1048576000g/1048p/1e
> ---
>
> Key: HDFS-14940
> URL: https://issues.apache.org/jira/browse/HDFS-14940
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 3.1.1
> Environment: 3 Node HA Setup
>Reporter: Souryakanta Dwivedy
>Priority: Minor
> Attachments: BalancerBW.PNG
>
>
> HDFS Balancer : getBalancerBandwidth displaying wrong values for the maximum 
> network bandwidth used by the datanode
>  while network bandwidth set with values as 1048576000g/1048p/1e
> Steps:
>  * Set the balancer bandwidth with the setBalancerBandwidth command and values 
> such as 1048576000g/1048p/1e.
>  * Check the bandwidth used by the datanode during HDFS block balancing with the 
> command "hdfs dfsadmin -getBalancerBandwidth"; it will display different 
> values, not the same value as set.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2379) OM terminates with RocksDB error while continuously writing keys.

2019-10-30 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham resolved HDDS-2379.
--
Resolution: Fixed

> OM terminates with RocksDB error while continuously writing keys.
> -
>
> Key: HDDS-2379
> URL: https://issues.apache.org/jira/browse/HDDS-2379
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Aravindan Vijayan
>Assignee: Bharat Viswanadham
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Exception trace after writing around 800,000 keys.
> {code}
> 2019-10-29 11:15:15,131 ERROR 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer: Terminating with 
> exit status 1: During flush to DB encountered err
> or in OMDoubleBuffer flush thread OMDoubleBufferFlushThread
> java.io.IOException: Unable to write the batch.
> at 
> org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48)
> at 
> org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240)
> at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.rocksdb.RocksDBException: unknown WriteBatch tag
> at org.rocksdb.RocksDB.write0(Native Method)
> at org.rocksdb.RocksDB.write(RocksDB.java:1421)
> at 
> org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46)
> ... 3 more
> {code}
> Assigning to [~bharat] since he has already started work on this. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2355) Om double buffer flush termination with rocksdb error

2019-10-30 Thread Bharat Viswanadham (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Viswanadham resolved HDDS-2355.
--
Resolution: Fixed

> Om double buffer flush termination with rocksdb error
> -
>
> Key: HDDS-2355
> URL: https://issues.apache.org/jira/browse/HDDS-2355
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Aravindan Vijayan
>Priority: Blocker
> Fix For: 0.5.0
>
>
> om_1    |java.io.IOException: Unable to write the batch.
> om_1    | at 
> [org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48|http://org.apache.hadoop.hdds.utils.db.rdbbatchoperation.commit%28rdbbatchoperation.java:48/])
> om_1    | at 
> [org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240|http://org.apache.hadoop.hdds.utils.db.rdbstore.commitbatchoperation%28rdbstore.java:240/])
> om_1    |at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146)
> om_1    |at java.base/java.lang.Thread.run(Thread.java:834)
> om_1    |Caused by: org.rocksdb.RocksDBException: 
> WritePrepared/WriteUnprepared txn tag when write_after_commit_ is enabled (in 
> default WriteCommitted mode). If it is not due to corruption, the WAL must be 
> emptied before changing the WritePolicy.
> om_1    |at org.rocksdb.RocksDB.write0(Native Method)
> om_1    |at org.rocksdb.RocksDB.write(RocksDB.java:1421)
> om_1    | at 
> [org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46|http://org.apache.hadoop.hdds.utils.db.rdbbatchoperation.commit%28rdbbatchoperation.java:46/])
>  
> In a few of my test runs I see this error and OM is terminated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2355) Om double buffer flush termination with rocksdb error

2019-10-30 Thread Bharat Viswanadham (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963270#comment-16963270
 ] 

Bharat Viswanadham commented on HDDS-2355:
--

Thank you [~avijayan] for the confirmation.

I will close this out; if it is seen again, we can reopen it.

> Om double buffer flush termination with rocksdb error
> -
>
> Key: HDDS-2355
> URL: https://issues.apache.org/jira/browse/HDDS-2355
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Aravindan Vijayan
>Priority: Blocker
> Fix For: 0.5.0
>
>
> om_1    |java.io.IOException: Unable to write the batch.
> om_1    | at 
> [org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48|http://org.apache.hadoop.hdds.utils.db.rdbbatchoperation.commit%28rdbbatchoperation.java:48/])
> om_1    | at 
> [org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240|http://org.apache.hadoop.hdds.utils.db.rdbstore.commitbatchoperation%28rdbstore.java:240/])
> om_1    |at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146)
> om_1    |at java.base/java.lang.Thread.run(Thread.java:834)
> om_1    |Caused by: org.rocksdb.RocksDBException: 
> WritePrepared/WriteUnprepared txn tag when write_after_commit_ is enabled (in 
> default WriteCommitted mode). If it is not due to corruption, the WAL must be 
> emptied before changing the WritePolicy.
> om_1    |at org.rocksdb.RocksDB.write0(Native Method)
> om_1    |at org.rocksdb.RocksDB.write(RocksDB.java:1421)
> om_1    | at 
> [org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46|http://org.apache.hadoop.hdds.utils.db.rdbbatchoperation.commit%28rdbbatchoperation.java:46/])
>  
> In a few of my test runs I see this error and OM is terminated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14922) On StartUp , Snapshot modification time got changed

2019-10-30 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14922:
-
Attachment: HDFS-14922.002.patch

> On StartUp , Snapshot modification time got changed
> ---
>
> Key: HDFS-14922
> URL: https://issues.apache.org/jira/browse/HDFS-14922
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14922.001.patch, HDFS-14922.002.patch
>
>
> Snapshot modification time got changed on namenode restart



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2385) Ozone shell list volume command lists only user owned volumes and not all the volumes

2019-10-30 Thread Vivek Ratnavel Subramanian (Jira)
Vivek Ratnavel Subramanian created HDDS-2385:


 Summary: Ozone shell list volume command lists only user owned 
volumes and not all the volumes
 Key: HDDS-2385
 URL: https://issues.apache.org/jira/browse/HDDS-2385
 Project: Hadoop Distributed Data Store
  Issue Type: Task
  Components: Ozone CLI
Affects Versions: 0.4.1
Reporter: Vivek Ratnavel Subramanian


The command `ozone sh volume ls` lists only the volumes that are owned by the 
user.

 

Expected behavior: The command should list all the volumes in the system if the 
user is an ozone administrator. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13507) RBF: Remove update functionality from routeradmin's add cmd

2019-10-30 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963286#comment-16963286
 ] 

hemanthboyina commented on HDFS-13507:
--

Thanks [~ayushtkn] for the comment.
{quote} we should allow different targets.
{quote}
If we give multiple namespaces and multiple destinations, how can we map a namespace 
to a destination? Should we follow ordering?

> RBF: Remove update functionality from routeradmin's add cmd
> ---
>
> Key: HDFS-13507
> URL: https://issues.apache.org/jira/browse/HDFS-13507
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Gang Li
>Priority: Minor
>  Labels: incompatible
> Attachments: HDFS-13507-HDFS-13891.003.patch, 
> HDFS-13507-HDFS-13891.004.patch, HDFS-13507.000.patch, HDFS-13507.001.patch, 
> HDFS-13507.002.patch, HDFS-13507.003.patch
>
>
> Follow up the discussion in HDFS-13326. We should remove the "update" 
> functionality from routeradmin's add cmd, to make it consistent with RPC 
> calls.
> Note that: this is an incompatible change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2385) As admin, volume list command should list all volumes not just admin user owned volumes

2019-10-30 Thread Dinesh Chitlangia (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Chitlangia updated HDDS-2385:

Summary: As admin, volume list command should list all volumes not just 
admin user owned volumes  (was: Ozone shell list volume command lists only user 
owned volumes and not all the volumes)

> As admin, volume list command should list all volumes not just admin user 
> owned volumes
> ---
>
> Key: HDDS-2385
> URL: https://issues.apache.org/jira/browse/HDDS-2385
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone CLI
>Affects Versions: 0.4.1
>Reporter: Vivek Ratnavel Subramanian
>Priority: Major
>
> The command `ozone sh volume ls` lists only the volumes that are owned by the 
> user.
>  
> Expected behavior: The command should list all the volumes in the system if 
> the user is an ozone administrator. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14943) RBF: Add Namespace to the Overview page in the Web UI

2019-10-30 Thread Jira
Íñigo Goiri created HDFS-14943:
--

 Summary: RBF: Add Namespace to the Overview page in the Web UI
 Key: HDFS-14943
 URL: https://issues.apache.org/jira/browse/HDFS-14943
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Íñigo Goiri


The Namenode shows the Namespace field which can be used to access it through 
HDFS.
The Router should also report its namespace.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14943) RBF: Add Namespace to the Overview page in the Web UI

2019-10-30 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963338#comment-16963338
 ] 

Íñigo Goiri commented on HDFS-14943:


The Namenode in {{dfshealth.js}} uses {{dfs.nameservice.id}} to populate HAInfo.
We should do something similar.

> RBF: Add Namespace to the Overview page in the Web UI
> -
>
> Key: HDFS-14943
> URL: https://issues.apache.org/jira/browse/HDFS-14943
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Priority: Minor
>
> The Namenode shows the Namespace field which can be used to access it through 
> HDFS.
> The Router should also report its namespace.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-10-30 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963371#comment-16963371
 ] 

Chen Liang commented on HDFS-14941:
---

Summarizing, for the record, the possible fixes at a high level that I thought of:
1. Make {{OP_SET_GENSTAMP_V2}} and {{OP_ADD_BLOCK}} a single edit, so it is 
guaranteed they are tailed together. The issue with this is that we cannot add a 
new op, as that would be incompatible. We would probably have to, say, reuse 
{{OP_ADD_BLOCK}} to bump the gen stamp AND add the block. But compatibility may 
still be tricky, e.g. an old ANN sends these two commands while a new SbN expects 
a single one.
2. Swap the order of {{OP_SET_GENSTAMP_V2}} and {{OP_ADD_BLOCK}} (without changing 
the block-adding logic itself, only the edit log order). This ensures that when 
the gen stamp bumps, the block belonging to this gen has already been tailed. The 
problem with this approach is that the SbN could then be tailing a block from a 
future genstamp, and I'm not sure what the implication of that is.
3. Instead of messing with edits, we may also change the guarding logic. What I'm 
thinking is: the SbN keeps track of the most recently tailed gen stamp, say X, AND 
the highest gen stamp of the blocks in its own block map, say Y (here Y <= X). 
Then, if a DN-reported block has a gen stamp *between* Y and X, it is possibly the 
scenario where the block still needs to be tailed, so the SbN requeues the report 
to process later (see the sketch after this comment).

It would be great if we could get more eyes on this; open to comments on the 
options (and more options!).
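
A rough, editorial sketch of what option 3's guard could look like; all names are hypothetical and none of this is from an actual patch.
{code:java}
// Hypothetical sketch of option 3 above -- names are made up for illustration.
class GenStampGuardSketch {
  /**
   * @param reportedGenStamp     genstamp of the block in the DN report
   * @param lastTailedGenStamp   X: most recently tailed genstamp on the SbN
   * @param highestGenStampInMap Y: highest genstamp present in the SbN block map (Y <= X)
   * @return true if the report should be requeued instead of processed now
   */
  static boolean shouldRequeue(long reportedGenStamp,
                               long lastTailedGenStamp,
                               long highestGenStampInMap) {
    // Blocks strictly beyond X are "future" blocks and are already requeued today.
    if (reportedGenStamp > lastTailedGenStamp) {
      return true;
    }
    // Blocks between Y and X may belong to an OP_ADD_BLOCK edit that has not been
    // tailed yet, so requeue them too rather than treating them as invalid replicas.
    return reportedGenStamp > highestGenStampInMap;
  }
}
{code}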

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>
> Recently we encountered an issue that, after a failover, NameNode complains 
> corrupted file/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on SbN, it is possible that it receives block reports before 
> corresponding edit tailing happened. In which case SbN postpones processing 
> the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically if reported block has a future generation stamp, the DN report gets 
> requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  Would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump generation stamp on 
> Standby
>  while the following line
>  {{persistNewBlock(src, pendingFile);}}
>  would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on 
> Standby.
> Then the race condition is that, imagine Standby has just processed 
> {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to 
> be in different segments). Now a block report with the new generation stamp comes 
> in.
> Since the genstamp bump has already been processed, the reported block may 
> not be considered a future block, so the guarding logic passes. But 
> actually, the block hasn't been added to the block map, because the second edit is 
> yet to be tailed. So, the block then gets added to the invalidate block list and 
> we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block, until the next full block report. So after a 
> failover, the NN marks it as corrupt.
> This issue won't happen though, if both of the edit entries get tailed all 
> together, so no IBR processing can happen in between. But in our case, we set 
> edit tailing interval to super low (to allow Standby read), so when under 
> high workload, there is a much much higher chance that the two entries are 
> tailed separately, causing the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-

[jira] [Updated] (HDDS-1987) Fix listStatus API

2019-10-30 Thread Siyao Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDDS-1987:
-
Description: 
This Jira is to fix listStatus API in HA code path.

In HA, we have an in-memory cache, where we put the result into the in-memory cache 
and return the response. It will be picked up by the double buffer thread and flushed 
to disk later. So when a user calls listStatus, it should use both the in-memory cache 
and the RocksDB key table to return the correct result.

  was:
This Jira is to fix listStatus API in HA code path.

In HA, we have an in-memory cache, where we put the result to in-memory cache 
and return the response, later it will be picked by double buffer thread and it 
will flush to disk. So, now when do listStatus, it should use both in-memory 
cache and rocksdb key table to listStatus in a bucket.


> Fix listStatus API
> --
>
> Key: HDDS-1987
> URL: https://issues.apache.org/jira/browse/HDDS-1987
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Siyao Meng
>Priority: Major
>
> This Jira is to fix listStatus API in HA code path.
> In HA, we have an in-memory cache, where we put the result into the in-memory cache 
> and return the response. It will be picked up by the double buffer thread and 
> flushed to disk later. So when a user calls listStatus, it should use both the 
> in-memory cache and the RocksDB key table to return the correct result.
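
To make the cache-plus-table merge concrete, here is a hedged editorial sketch of the idea; the cache and table access used here are simplified stand-ins, not the actual OM interfaces.
{code:java}
// Illustrative sketch only -- not the actual OM API.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ListStatusMergeSketch {
  static <V> List<V> listStatus(String prefix,
                                Map<String, V> inMemoryCache,
                                Iterator<Map.Entry<String, V>> keyTableIterator) {
    // A sorted map gives the merged listing in key order.
    TreeMap<String, V> merged = new TreeMap<>();

    // 1. Entries already flushed to the RocksDB key table.
    while (keyTableIterator.hasNext()) {
      Map.Entry<String, V> e = keyTableIterator.next();
      if (e.getKey().startsWith(prefix)) {
        merged.put(e.getKey(), e.getValue());
      }
    }
    // 2. Entries still in the in-memory cache (waiting on the double buffer flush);
    //    these override what is on disk, and a null value acts as a delete marker.
    inMemoryCache.forEach((key, value) -> {
      if (key.startsWith(prefix)) {
        if (value == null) {
          merged.remove(key);
        } else {
          merged.put(key, value);
        }
      }
    });
    return new ArrayList<>(merged.values());
  }
}
{code}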



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-1987) Fix listStatus API

2019-10-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-1987:
-
Labels: pull-request-available  (was: )

> Fix listStatus API
> --
>
> Key: HDDS-1987
> URL: https://issues.apache.org/jira/browse/HDDS-1987
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>
> This Jira is to fix listStatus API in HA code path.
> In HA, we have an in-memory cache, where we put the result into the in-memory cache 
> and return the response. It will be picked up by the double buffer thread and 
> flushed to disk later. So when a user calls listStatus, it should use both the 
> in-memory cache and the RocksDB key table to return the correct result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-1987) Fix listStatus API

2019-10-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1987?focusedWorklogId=336439&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336439
 ]

ASF GitHub Bot logged work on HDDS-1987:


Author: ASF GitHub Bot
Created on: 30/Oct/19 20:07
Start Date: 30/Oct/19 20:07
Worklog Time Spent: 10m 
  Work Description: smengcl commented on pull request #108: HDDS-1987. Fix 
listStatus API
URL: https://github.com/apache/hadoop-ozone/pull/108
 
 
   ## What changes were proposed in this pull request?
   
   Fix listStatus API in HA code path.
   
   In HA, we have an in-memory cache, where we put the result into the in-memory 
cache and return the response. It will be picked up by the double buffer thread and 
flushed to disk later. So when a user calls listStatus, it should use both the 
in-memory cache and the RocksDB key table to return the correct result.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-1987
   
   ## How was this patch tested?
   
   1. `TestOzoneFileSystem#testListStatus` passed (for correctness under no key 
cache table usage).
   2. New unit test will be posted soon.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 336439)
Remaining Estimate: 0h
Time Spent: 10m

> Fix listStatus API
> --
>
> Key: HDDS-1987
> URL: https://issues.apache.org/jira/browse/HDDS-1987
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This Jira is to fix listStatus API in HA code path.
> In HA, we have an in-memory cache, where we put the result into the in-memory cache 
> and return the response. It will be picked up by the double buffer thread and 
> flushed to disk later. So when a user calls listStatus, it should use both the 
> in-memory cache and the RocksDB key table to return the correct result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14922) On StartUp , Snapshot modification time got changed

2019-10-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963418#comment-16963418
 ] 

Hadoop QA commented on HDFS-14922:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
48s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 37s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 47s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 623 unchanged - 1 fixed = 624 total (was 624) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}104m 19s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}166m 25s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14922 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984396/HDFS-14922.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 16d7f12143ca 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e3e7daa |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28211/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28211/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28211/testReport/ |
| Max. process+thread count | 2769 (vs

[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-10-30 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963432#comment-16963432
 ] 

Wei-Chiu Chuang commented on HDFS-14941:


Quick question -- is this a problem with HA, multiple SbNN, or with Consistent 
Read from Standby?
Looks to me like an existing problem that only manifests itself with CRFS.

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>
> Recently we encountered an issue that, after a failover, NameNode complains 
> corrupted file/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on SbN, it is possible that it receives block reports before 
> corresponding edit tailing happened. In which case SbN postpones processing 
> the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically if reported block has a future generation stamp, the DN report gets 
> requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  Would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump generation stamp on 
> Standby
>  while the following line
>  {{persistNewBlock(src, pendingFile);}}
>  would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on 
> Standby.
> Then the race condition is that, imagine Standby has just processed 
> {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to 
> be in different segments). Now a block report with a new generation stamp comes 
> in.
> Since the genstamp bump has already been processed, the reported block may 
> not be considered as future block. So the guarding logic passes. But 
> actually, the block hasn't been added to blockmap, because the second edit is 
> yet to be tailed. So, the block then gets added to invalidate block list and 
> we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block, until the next full block report. So after a 
> failover, the NN marks it as corrupt.
> This issue won't happen though, if both of the edit entries get tailed all 
> together, so no IBR processing can happen in between. But in our case, we set 
> edit tailing interval to super low (to allow Standby read), so when under 
> high workload, there is a much much higher chance that the two entries are 
> tailed separately, causing the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2386) Implement incremental ChunkBuffer

2019-10-30 Thread Tsz-wo Sze (Jira)
Tsz-wo Sze created HDDS-2386:


 Summary: Implement incremental ChunkBuffer
 Key: HDDS-2386
 URL: https://issues.apache.org/jira/browse/HDDS-2386
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Client
Reporter: Tsz-wo Sze
Assignee: Tsz-wo Sze


HDDS-2375 introduces a ChunkBuffer for flexible buffering. In this JIRA, we 
implement ChunkBuffer with incremental buffering so that memory is allocated 
incrementally.
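To illustrate the idea only -- this is a minimal sketch, not the actual ChunkBuffer API, and the class and method names below are hypothetical -- incremental buffering allocates small fixed-size pieces lazily as data is written, instead of one large buffer up front:

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of incremental buffering; not the actual ChunkBuffer code.
class IncrementalBufferSketch {
  private final int increment;                       // size of each allocation step
  private final List<ByteBuffer> buffers = new ArrayList<>();

  IncrementalBufferSketch(int increment) {
    this.increment = increment;
  }

  /** Write one byte, allocating a new increment only when the last one is full. */
  void put(byte b) {
    ByteBuffer last = buffers.isEmpty() ? null : buffers.get(buffers.size() - 1);
    if (last == null || !last.hasRemaining()) {
      last = ByteBuffer.allocate(increment);         // memory grows step by step
      buffers.add(last);
    }
    last.put(b);
  }
}
{code}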



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2331) Client OOME due to buffer retention

2019-10-30 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963458#comment-16963458
 ] 

Tsz-wo Sze commented on HDDS-2331:
--

Filed HDDS-2386 for incremental buffering.

> Shall we resolve it, ...

+1, let's resolve this.


> Client OOME due to buffer retention
> ---
>
> Key: HDDS-2331
> URL: https://issues.apache.org/jira/browse/HDDS-2331
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Shashikant Banerjee
>Priority: Critical
> Attachments: profiler.png
>
>
> Freon random key generator exhausts the default heap after just a few hundred 1MB 
> keys. A heap dump on OOME reveals 150+ instances of 
> {{ContainerCommandRequestMessage}}, each with a 16MB {{byte[]}}.
> Steps to reproduce:
> # Start Ozone cluster with 1 datanode
> # Start Freon (5K keys of size 1MB)
> Result: OOME after a few hundred keys
> {noformat}
> $ cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone
> $ docker-compose up -d
> $ docker-compose exec scm bash
> $ export HADOOP_OPTS='-XX:+HeapDumpOnOutOfMemoryError'
> $ ozone freon rk --numOfThreads 1 --numOfVolumes 1 --numOfBuckets 1 
> --replicationType RATIS --factor ONE --keySize 1048576 --numOfKeys 5120 
> --bufferSize 65536
> ...
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to java_pid289.hprof ...
> Heap dump file created [1456141975 bytes in 7.760 secs]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2386) Implement incremental ChunkBuffer

2019-10-30 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963460#comment-16963460
 ] 

Tsz-wo Sze commented on HDDS-2386:
--

o2386_20191030.patch: requires HDDS-2375. Also, the test failures need to be fixed.

> Implement incremental ChunkBuffer
> -
>
> Key: HDDS-2386
> URL: https://issues.apache.org/jira/browse/HDDS-2386
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
> Attachments: o2386_20191030.patch
>
>
> HDDS-2375 introduces a ChunkBuffer for flexible buffering. In this JIRA, we 
> implement ChunkBuffer with an incremental buffering so that the memory spaces 
> are allocated incrementally.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2386) Implement incremental ChunkBuffer

2019-10-30 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze updated HDDS-2386:
-
Attachment: o2386_20191030.patch

> Implement incremental ChunkBuffer
> -
>
> Key: HDDS-2386
> URL: https://issues.apache.org/jira/browse/HDDS-2386
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Client
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
> Attachments: o2386_20191030.patch
>
>
> HDDS-2375 introduces a ChunkBuffer for flexible buffering. In this JIRA, we 
> implement ChunkBuffer with an incremental buffering so that the memory spaces 
> are allocated incrementally.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-10-30 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963468#comment-16963468
 ] 

Chen Liang commented on HDFS-14941:
---

[~weichiu], assuming the current understanding is correct, yes, this is a 
general problem that can happen in HA. 

It will manifest often only when both of the following conditions are met:
1. the system is under high load, so there are a lot of addBlock calls
2. {{dfs.ha.tail-edits.period}} is set to a very low value, which greatly 
increases the chance that the two edits mentioned are tailed in different 
segments.
These two conditions are not specific to CRS, but CRS does require 
{{dfs.ha.tail-edits.period}} to be set low.

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>
> Recently we encountered an issue that, after a failover, NameNode complains 
> corrupted file/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on SbN, it is possible that it receives block reports before 
> corresponding edit tailing happened. In which case SbN postpones processing 
> the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically if reported block has a future generation stamp, the DN report gets 
> requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  Would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump generation stamp on 
> Standby
>  while the following line
>  {{persistNewBlock(src, pendingFile);}}
>  would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on 
> Standby.
> Then the race condition is that, imagine Standby has just processed 
> {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to 
> be in different segments). Now a block report with a new generation stamp comes 
> in.
> Since the genstamp bump has already been processed, the reported block may 
> not be considered as future block. So the guarding logic passes. But 
> actually, the block hasn't been added to blockmap, because the second edit is 
> yet to be tailed. So, the block then gets added to invalidate block list and 
> we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block, until the next full block report. So after a 
> failover, the NN marks it as corrupt.
> This issue won't happen though, if both of the edit entries get tailed all 
> together, so no IBR processing can happen in between. But in our case, we set 
> edit tailing interval to super low (to allow Standby read), so when under 
> high workload, there is a much much higher chance that the two entries are 
> tailed separately, causing the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2387) Build is broken - needs fixing

2019-10-30 Thread Chris Teoh (Jira)
Chris Teoh created HDDS-2387:


 Summary: Build is broken - needs fixing
 Key: HDDS-2387
 URL: https://issues.apache.org/jira/browse/HDDS-2387
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Chris Teoh
Assignee: Chris Teoh


The Maven build is looking for a file that doesn't exist. When attempting to run:

mvn clean package -e -X -DskipShade -DskipRecon -DskipTests

The following output is observed:

{{[DEBUG] Executing command line: [bash, 
/Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/dist/dev-support/bin/dist-layout-stitching,
 /Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/dist/target, 0.5.0-SNAPSHOT]
cp: /Users/cteoh/GitHub/hadoop-ozone/README.txt: No such file or directory

Current directory /Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/dist/target

$ rm -rf ozone-0.5.0-SNAPSHOT
$ mkdir ozone-0.5.0-SNAPSHOT
$ cd ozone-0.5.0-SNAPSHOT
$ cp -p 
/Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/dist/src/main/license/bin/NOTICE.txt
 NOTICE.txt
$ cp -p 
/Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/dist/src/main/license/bin/LICENSE.txt
 LICENSE.txt
$ cp -pr 
/Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/dist/src/main/license/bin/licenses
 licenses
$ cp -p 
/Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/recon/src/main/resources/webapps/recon/ozone-recon-web/LICENSE
 licenses/LICENSE-ozone-recon.txt
$ cp -p /Users/cteoh/GitHub/hadoop-ozone/README.txt .

Failed!

[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Ozone Main ... SUCCESS [  0.343 s]
[INFO] Apache Hadoop HDDS . SUCCESS [  2.404 s]
[INFO] Apache Hadoop HDDS Config .. SUCCESS [  1.370 s]
[INFO] Apache Hadoop HDDS Common .. SUCCESS [  7.500 s]
[INFO] Apache Hadoop HDDS Client .. SUCCESS [  1.085 s]
[INFO] Apache Hadoop HDDS Server Framework  SUCCESS [  1.342 s]
[INFO] Apache Hadoop HDDS Container Service ... SUCCESS [  3.146 s]
[INFO] Apache Hadoop HDDS/Ozone Documentation . SUCCESS [  0.702 s]
[INFO] Apache Hadoop HDDS SCM Server .. SUCCESS [  2.338 s]
[INFO] Apache Hadoop HDDS Tools ... SUCCESS [  0.864 s]
[INFO] Apache Hadoop Ozone  SUCCESS [  0.228 s]
[INFO] Apache Hadoop Ozone Common . SUCCESS [  4.275 s]
[INFO] Apache Hadoop Ozone Client . SUCCESS [  0.979 s]
[INFO] Apache Hadoop Ozone Manager Server . SUCCESS [  2.568 s]
[INFO] Apache Hadoop Ozone S3 Gateway . SUCCESS [  2.278 s]
[INFO] Apache Hadoop Ozone CSI service  SUCCESS [  3.622 s]
[INFO] Apache Hadoop Ozone Integration Tests .. SUCCESS [  2.546 s]
[INFO] Apache Hadoop Ozone Tools .. SUCCESS [  2.052 s]
[INFO] Apache Hadoop Ozone Datanode ... SUCCESS [  0.872 s]
[INFO] Apache Hadoop Ozone In-Place Upgrade ... SUCCESS [  0.509 s]
[INFO] Apache Hadoop Ozone Insight Tool ... SUCCESS [  0.893 s]
[INFO] Apache Hadoop Ozone Distribution ... FAILURE [  1.300 s]
[INFO] Apache Hadoop Ozone Fault Injection Tests .. SKIPPED
[INFO] Apache Hadoop Ozone Network Tests .. SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 43.771 s
[INFO] Finished at: 2019-10-30T10:48:41+11:00
[INFO] Final Memory: 200M/3066M
[INFO] 
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.3.1:exec 
(dist) on project hadoop-ozone-dist: Command execution failed. Process exited 
with an error: 1 (Exit value: 1) -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal 
org.codehaus.mojo:exec-maven-plugin:1.3.1:exec (dist) on project 
hadoop-ozone-dist: Command execution failed.
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
at 
org.apache.maven.lifecycle.internal.LifecycleStarter.execute(Lifecycl

[jira] [Commented] (HDDS-2387) Build is broken - needs fixing

2019-10-30 Thread Dinesh Chitlangia (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963511#comment-16963511
 ] 

Dinesh Chitlangia commented on HDDS-2387:
-

[~chris.t...@gmail.com] This seems to be a duplicate of HDDS-2292. Resolving 
this, as a PR is already in place for that one.

> Build is broken - needs fixing
> --
>
> Key: HDDS-2387
> URL: https://issues.apache.org/jira/browse/HDDS-2387
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Chris Teoh
>Assignee: Chris Teoh
>Priority: Major
>
> Maven build is looking for a file that doesn't exist. When attempting to run:-
> mvn clean package -e -X -DskipShade -DskipRecon -DskipTests
> Following output is observed:-
> {{[DEBUG] Executing command line: [bash, 
> /Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/dist/dev-support/bin/dist-layout-stitching,
>  /Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/dist/target, 0.5.0-SNAPSHOT]
> cp: /Users/cteoh/GitHub/hadoop-ozone/README.txt: No such file or directory
> Current directory /Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/dist/target
> $ rm -rf ozone-0.5.0-SNAPSHOT
> $ mkdir ozone-0.5.0-SNAPSHOT
> $ cd ozone-0.5.0-SNAPSHOT
> $ cp -p 
> /Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/dist/src/main/license/bin/NOTICE.txt
>  NOTICE.txt
> $ cp -p 
> /Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/dist/src/main/license/bin/LICENSE.txt
>  LICENSE.txt
> $ cp -pr 
> /Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/dist/src/main/license/bin/licenses
>  licenses
> $ cp -p 
> /Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/recon/src/main/resources/webapps/recon/ozone-recon-web/LICENSE
>  licenses/LICENSE-ozone-recon.txt
> $ cp -p /Users/cteoh/GitHub/hadoop-ozone/README.txt .
> Failed!
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Apache Hadoop Ozone Main ... SUCCESS [  0.343 
> s]
> [INFO] Apache Hadoop HDDS . SUCCESS [  2.404 
> s]
> [INFO] Apache Hadoop HDDS Config .. SUCCESS [  1.370 
> s]
> [INFO] Apache Hadoop HDDS Common .. SUCCESS [  7.500 
> s]
> [INFO] Apache Hadoop HDDS Client .. SUCCESS [  1.085 
> s]
> [INFO] Apache Hadoop HDDS Server Framework  SUCCESS [  1.342 
> s]
> [INFO] Apache Hadoop HDDS Container Service ... SUCCESS [  3.146 
> s]
> [INFO] Apache Hadoop HDDS/Ozone Documentation . SUCCESS [  0.702 
> s]
> [INFO] Apache Hadoop HDDS SCM Server .. SUCCESS [  2.338 
> s]
> [INFO] Apache Hadoop HDDS Tools ... SUCCESS [  0.864 
> s]
> [INFO] Apache Hadoop Ozone  SUCCESS [  0.228 
> s]
> [INFO] Apache Hadoop Ozone Common . SUCCESS [  4.275 
> s]
> [INFO] Apache Hadoop Ozone Client . SUCCESS [  0.979 
> s]
> [INFO] Apache Hadoop Ozone Manager Server . SUCCESS [  2.568 
> s]
> [INFO] Apache Hadoop Ozone S3 Gateway . SUCCESS [  2.278 
> s]
> [INFO] Apache Hadoop Ozone CSI service  SUCCESS [  3.622 
> s]
> [INFO] Apache Hadoop Ozone Integration Tests .. SUCCESS [  2.546 
> s]
> [INFO] Apache Hadoop Ozone Tools .. SUCCESS [  2.052 
> s]
> [INFO] Apache Hadoop Ozone Datanode ... SUCCESS [  0.872 
> s]
> [INFO] Apache Hadoop Ozone In-Place Upgrade ... SUCCESS [  0.509 
> s]
> [INFO] Apache Hadoop Ozone Insight Tool ... SUCCESS [  0.893 
> s]
> [INFO] Apache Hadoop Ozone Distribution ... FAILURE [  1.300 
> s]
> [INFO] Apache Hadoop Ozone Fault Injection Tests .. SKIPPED
> [INFO] Apache Hadoop Ozone Network Tests .. SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 43.771 s
> [INFO] Finished at: 2019-10-30T10:48:41+11:00
> [INFO] Final Memory: 200M/3066M
> [INFO] 
> 
> [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.3.1:exec 
> (dist) on project hadoop-ozone-dist: Command execution failed. Process exited 
> with an error: 1 (Exit value: 1) -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.codehaus.mojo:exec-maven-plugin:1.3.1:exec (dist) on project 
> hadoop-ozone-dist: Command execution failed.
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecu

[jira] [Resolved] (HDDS-2387) Build is broken - needs fixing

2019-10-30 Thread Dinesh Chitlangia (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Chitlangia resolved HDDS-2387.
-
Resolution: Duplicate

> Build is broken - needs fixing
> --
>
> Key: HDDS-2387
> URL: https://issues.apache.org/jira/browse/HDDS-2387
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Chris Teoh
>Assignee: Chris Teoh
>Priority: Major
>
> Maven build is looking for a file that doesn't exist. When attempting to run:-
> mvn clean package -e -X -DskipShade -DskipRecon -DskipTests
> Following output is observed:-
> {{[DEBUG] Executing command line: [bash, 
> /Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/dist/dev-support/bin/dist-layout-stitching,
>  /Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/dist/target, 0.5.0-SNAPSHOT]
> cp: /Users/cteoh/GitHub/hadoop-ozone/README.txt: No such file or directory
> Current directory /Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/dist/target
> $ rm -rf ozone-0.5.0-SNAPSHOT
> $ mkdir ozone-0.5.0-SNAPSHOT
> $ cd ozone-0.5.0-SNAPSHOT
> $ cp -p 
> /Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/dist/src/main/license/bin/NOTICE.txt
>  NOTICE.txt
> $ cp -p 
> /Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/dist/src/main/license/bin/LICENSE.txt
>  LICENSE.txt
> $ cp -pr 
> /Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/dist/src/main/license/bin/licenses
>  licenses
> $ cp -p 
> /Users/cteoh/GitHub/hadoop-ozone/hadoop-ozone/recon/src/main/resources/webapps/recon/ozone-recon-web/LICENSE
>  licenses/LICENSE-ozone-recon.txt
> $ cp -p /Users/cteoh/GitHub/hadoop-ozone/README.txt .
> Failed!
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Apache Hadoop Ozone Main ... SUCCESS [  0.343 
> s]
> [INFO] Apache Hadoop HDDS . SUCCESS [  2.404 
> s]
> [INFO] Apache Hadoop HDDS Config .. SUCCESS [  1.370 
> s]
> [INFO] Apache Hadoop HDDS Common .. SUCCESS [  7.500 
> s]
> [INFO] Apache Hadoop HDDS Client .. SUCCESS [  1.085 
> s]
> [INFO] Apache Hadoop HDDS Server Framework  SUCCESS [  1.342 
> s]
> [INFO] Apache Hadoop HDDS Container Service ... SUCCESS [  3.146 
> s]
> [INFO] Apache Hadoop HDDS/Ozone Documentation . SUCCESS [  0.702 
> s]
> [INFO] Apache Hadoop HDDS SCM Server .. SUCCESS [  2.338 
> s]
> [INFO] Apache Hadoop HDDS Tools ... SUCCESS [  0.864 
> s]
> [INFO] Apache Hadoop Ozone  SUCCESS [  0.228 
> s]
> [INFO] Apache Hadoop Ozone Common . SUCCESS [  4.275 
> s]
> [INFO] Apache Hadoop Ozone Client . SUCCESS [  0.979 
> s]
> [INFO] Apache Hadoop Ozone Manager Server . SUCCESS [  2.568 
> s]
> [INFO] Apache Hadoop Ozone S3 Gateway . SUCCESS [  2.278 
> s]
> [INFO] Apache Hadoop Ozone CSI service  SUCCESS [  3.622 
> s]
> [INFO] Apache Hadoop Ozone Integration Tests .. SUCCESS [  2.546 
> s]
> [INFO] Apache Hadoop Ozone Tools .. SUCCESS [  2.052 
> s]
> [INFO] Apache Hadoop Ozone Datanode ... SUCCESS [  0.872 
> s]
> [INFO] Apache Hadoop Ozone In-Place Upgrade ... SUCCESS [  0.509 
> s]
> [INFO] Apache Hadoop Ozone Insight Tool ... SUCCESS [  0.893 
> s]
> [INFO] Apache Hadoop Ozone Distribution ... FAILURE [  1.300 
> s]
> [INFO] Apache Hadoop Ozone Fault Injection Tests .. SKIPPED
> [INFO] Apache Hadoop Ozone Network Tests .. SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 43.771 s
> [INFO] Finished at: 2019-10-30T10:48:41+11:00
> [INFO] Final Memory: 200M/3066M
> [INFO] 
> 
> [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.3.1:exec 
> (dist) on project hadoop-ozone-dist: Command execution failed. Process exited 
> with an error: 1 (Exit value: 1) -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.codehaus.mojo:exec-maven-plugin:1.3.1:exec (dist) on project 
> hadoop-ozone-dist: Command execution failed.
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
>   at 
> org.ap

[jira] [Work logged] (HDDS-2292) Create Ozone specific README.md to the new hadoop-ozone repository

2019-10-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2292?focusedWorklogId=336528&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336528
 ]

ASF GitHub Bot logged work on HDDS-2292:


Author: ASF GitHub Bot
Created on: 30/Oct/19 23:22
Start Date: 30/Oct/19 23:22
Worklog Time Spent: 10m 
  Work Description: dineshchitlangia commented on pull request #106: 
HDDS-2292. Create Ozone specific README.md to the new hadoop-ozone repository
URL: https://github.com/apache/hadoop-ozone/pull/106
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 336528)
Time Spent: 0.5h  (was: 20m)

> Create Ozone specific README.md to the new hadoop-ozone repository
> --
>
> Key: HDDS-2292
> URL: https://issues.apache.org/jira/browse/HDDS-2292
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current README is specific to the main Hadoop project. We can create an Ozone-specific one.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2292) Create Ozone specific README.md to the new hadoop-ozone repository

2019-10-30 Thread Dinesh Chitlangia (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Chitlangia updated HDDS-2292:

Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

[~elek] Thank you for the contribution.

[~arp] Thank you for flagging the issue.

[~aengineer] Thank you for the reviews.

 

Committed this to master.

> Create Ozone specific README.md to the new hadoop-ozone repository
> --
>
> Key: HDDS-2292
> URL: https://issues.apache.org/jira/browse/HDDS-2292
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current README is specific to the main Hadoop project. We can create an Ozone-specific one.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-10-30 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963518#comment-16963518
 ] 

Konstantin Shvachko commented on HDFS-14941:


Great find [~vagarychen].
 Yes, this is an _*existing problem in the current code*_, but it is exacerbated by 
fast journal tailing, since a small tail-edits.period increases the probability 
of the race condition. In fact we occasionally see missing blocks after failing 
over to SBN on our clusters with pre-CRS code.

I want to suggest another _*possible solution 4*_.
 4. Delay incrementing the global {{generationStamp}} on standby until it 
actually sees the block with that stamp. That is, when {{OP_SET_GENSTAMP_V2}} 
comes to the SBN, it records the new value in a new variable 
{{lastReportedGenStamp}}. When an {{OP_ADD_BLOCK}} with a genStamp that equals 
{{lastReportedGenStamp}} comes, the global {{generationStamp}} is set to 
{{lastReportedGenStamp}}. This should also solve the race condition.

We were looking at {{updateBlockForPipeline()}} and found that it could be 
_*another source of missing blocks on SBN*_, because it only increments the 
global {{generationStamp}}, but does not update the generation stamp of that 
block. The new gen stamp will eventually be updated by a subsequent {{OP_ADD}}, 
{{OP_CLOSE}}, or {{OP_UPDATE_BLOCKS}}. But the race condition with IBRs will 
still be present. If an IBR comes after incrementing the global genStamp, the 
replica will not be in the future, but since the block genStamp has not yet been 
updated by the subsequent {{OP_ADD}} or such, this replica will be invalidated 
on SBN.
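For illustration only, a minimal sketch of how solution 4 might behave on the Standby -- the class and method names (other than the edit-log op names and {{lastReportedGenStamp}}) are hypothetical, not the actual FSNamesystem code:

{code:java}
// Hypothetical sketch of solution 4; not the actual NameNode implementation.
class GenStampTrackerSketch {
  private long generationStamp;        // value used by the "future genstamp" guard
  private long lastReportedGenStamp;   // value carried by OP_SET_GENSTAMP_V2

  // Tailing OP_SET_GENSTAMP_V2: remember the new value but do not publish it yet.
  synchronized void onSetGenStampV2(long newStamp) {
    lastReportedGenStamp = newStamp;
  }

  // Tailing OP_ADD_BLOCK: publish the stamp only once the block is actually added.
  synchronized void onAddBlock(long blockGenStamp) {
    if (blockGenStamp == lastReportedGenStamp) {
      generationStamp = lastReportedGenStamp;
    }
  }

  // Block reports with stamps beyond the published value stay queued as "future".
  synchronized boolean isGenStampInFuture(long stamp) {
    return stamp > generationStamp;
  }
}
{code}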

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>
> Recently we encountered an issue that, after a failover, NameNode complains 
> corrupted file/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on SbN, it is possible that it receives block reports before 
> corresponding edit tailing happened. In which case SbN postpones processing 
> the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically if reported block has a future generation stamp, the DN report gets 
> requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  Would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump generation stamp on 
> Standby
>  while the following line
>  {{persistNewBlock(src, pendingFile);}}
>  would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on 
> Standby.
> Then the race condition is that, imagine Standby has just processed 
> {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to 
> be in different segments). Now a block report with a new generation stamp comes 
> in.
> Since the genstamp bump has already been processed, the reported block may 
> not be considered as future block. So the guarding logic passes. But 
> actually, the block hasn't been added to blockmap, because the second edit is 
> yet to be tailed. So, the block then gets added to invalidate block list and 
> we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block, until the next full block report. So after a 
> failover, the NN marks it as corrupt.
> This issue won't happen though, if both of the edit entries get tailed all 
> together, so no IBR processing can happen in between. But in our case, we set 
> edit tailing interval to super low (to allow Standby read), so when under 
> high workload, there is a much much higher chance that the two entries are 
> tailed separately, causing the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-

[jira] [Work logged] (HDDS-2381) In ExcludeList, add if not exist only

2019-10-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2381?focusedWorklogId=336539&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336539
 ]

ASF GitHub Bot logged work on HDDS-2381:


Author: ASF GitHub Bot
Created on: 30/Oct/19 23:44
Start Date: 30/Oct/19 23:44
Worklog Time Spent: 10m 
  Work Description: dineshchitlangia commented on pull request #104: 
HDDS-2381. In ExcludeList, add if not exist only.
URL: https://github.com/apache/hadoop-ozone/pull/104
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 336539)
Time Spent: 20m  (was: 10m)

> In ExcludeList, add if not exist only
> -
>
> Key: HDDS-2381
> URL: https://issues.apache.org/jira/browse/HDDS-2381
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Created based on comment from [~chinseone] in HDDS-2356
> https://issues.apache.org/jira/browse/HDDS-2356?focusedCommentId=16960796&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16960796
>  
>  
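As an illustration of the change named in the title -- adding an entry only when it is not already present -- a minimal sketch follows; the class, field, and method names are hypothetical and not the actual ExcludeList API:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical add-if-absent guard; not the real ExcludeList implementation.
class ExcludeListSketch {
  private final List<String> excludedDatanodes = new ArrayList<>();

  void addDatanode(String datanodeId) {
    // Add only if not already present, so repeated failures on the same
    // datanode do not keep growing the list with duplicate entries.
    if (!excludedDatanodes.contains(datanodeId)) {
      excludedDatanodes.add(datanodeId);
    }
  }
}
{code}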



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-10-30 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963518#comment-16963518
 ] 

Konstantin Shvachko edited comment on HDFS-14941 at 10/30/19 11:44 PM:
---

Great find [~vagarychen].
 Yes this is _*existing problem in current code*_, but it is exacerbated by 
fast journal tailing since  small tail-edits.period increases the probability 
of the race condition. In fact we occasionally see missing blocks after failing 
over to SBN on our clusters with pre-CRS code.

Want to suggest another _*possible solution 4*_.
 4. Delay incrementing the global {{generationStamp}} on standby until it 
actually sees the block with that stamp. That is, when {{OP_SET_GENSTAMP_V2}} 
comes to SBN it records the new value in a new variable 
{{lastReportedGenStamp}}. When {{OP_ADD_BLOCK}} with the genStamp that equals 
{{lastReportedGenStamp}} comes the global {{generationStamp}} is set to 
{{lastReportedGenStamp}}. This should also solve the race condition.

We were looking at {{updateBlockForPipeline()}} and found out that it could be 
_*another source of missing blocks on SBN*_, because it only increments the 
global {{generationStamp}}, but does not update the generation stamp of that 
block. The new gen stamp will be eventually updated by subsequent {{OP_ADD}}, 
or {{OP_CLOSE}}, or {{OP_UPDATE_BLOCKS}}. But the race condition with IBRs will 
be still present. If an IBR comes after incrementing the global genStamp the 
replica will not be queued as the future one. Instead since the block genStamp 
has not been yet updated by the subsequent {{OP_ADD}} or such, this replica 
will be invalidated on SBN.


was (Author: shv):
Great find [~vagarychen].
 Yes this is _*existing problem in current code*_, but it is exacerbated by 
fast journal tailing since  small tail-edits.period increases the probability 
of the race condition. In fact we occasionally see missing blocks after failing 
over to SBN on our clusters with pre-CRS code.

Want to suggest another _*possible solution 4*_.
 4. Delay incrementing the global {{generationStamp}} on standby until it 
actually sees the block with that stamp. That is, when {{OP_SET_GENSTAMP_V2}} 
comes to SBN it records the new value in a new variable 
{{lastReportedGenStamp}}. When {{OP_ADD_BLOCK}} with the genStamp that equals 
{{lastReportedGenStamp}} comes the global {{generationStamp}} is set to 
{{lastReportedGenStamp}}. This should also solve the race condition.

We were looking at {{updateBlockForPipeline()}} and found out that it could be 
_*another source of missing blocks on SBN*_, because it only increments the 
global {{generationStamp}}, but does not update the generation stamp of that 
block. The new gen stamp will be eventually updated by subsequent {{OP_ADD}}, 
or {{OP_CLOSE}}, or {{OP_UPDATE_BLOCKS}}. But the race condition with IBRs will 
be still present. If an IBR comes after incrementing the global genStamp the 
replica will not be in the future, but since the block genStamp has not be yet 
updated by the subsequent {{OP_ADD}} or such, this replica will be invalidated 
on SBN.

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>
> Recently we encountered an issue that, after a failover, NameNode complains 
> corrupted file/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on SbN, it is possible that it receives block reports before 
> corresponding edit tailing happened. In which case SbN postpones processing 
> the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically if reported block has a future generation stamp, the DN report gets 
> requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  Would log an edit entry {{OP_

[jira] [Resolved] (HDDS-2381) In ExcludeList, add if not exist only

2019-10-30 Thread Dinesh Chitlangia (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Chitlangia resolved HDDS-2381.
-
   Fix Version/s: 0.5.0
Target Version/s: 0.5.0
  Resolution: Fixed

[~bharat] Thank you for the contribution.

This has been committed to master.

> In ExcludeList, add if not exist only
> -
>
> Key: HDDS-2381
> URL: https://issues.apache.org/jira/browse/HDDS-2381
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Created based on comment from [~chinseone] in HDDS-2356
> https://issues.apache.org/jira/browse/HDDS-2356?focusedCommentId=16960796&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16960796
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-2381) In ExcludeList, add if not exist only

2019-10-30 Thread Dinesh Chitlangia (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963520#comment-16963520
 ] 

Dinesh Chitlangia edited comment on HDDS-2381 at 10/30/19 11:46 PM:


[~bharat] Thank you for the contribution.

[~chinseone] Thank you for flagging this.

This has been committed to master.


was (Author: dineshchitlangia):
[~bharat] Thank you for the contribution.

This has been committed to master.

> In ExcludeList, add if not exist only
> -
>
> Key: HDDS-2381
> URL: https://issues.apache.org/jira/browse/HDDS-2381
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Created based on comment from [~chinseone] in HDDS-2356
> https://issues.apache.org/jira/browse/HDDS-2356?focusedCommentId=16960796&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16960796
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-10-30 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963518#comment-16963518
 ] 

Konstantin Shvachko edited comment on HDFS-14941 at 10/30/19 11:49 PM:
---

Great find [~vagarychen].
 Yes this is _*existing problem in current code*_, but it is exacerbated by 
fast journal tailing since  small tail-edits.period increases the probability 
of the race condition. In fact we occasionally see missing blocks after failing 
over to SBN on our clusters with pre-CRS code.

Want to suggest another _*possible solution 4*_.
 4. Delay incrementing the global {{generationStamp}} on standby until it 
actually sees the block with that stamp. That is, when {{OP_SET_GENSTAMP_V2}} 
comes to SBN it records the new value in a new variable 
{{lastReportedGenStamp}}. When {{OP_ADD_BLOCK}} with the genStamp that equals 
{{lastReportedGenStamp}} comes the global {{generationStamp}} is set to 
{{lastReportedGenStamp}}. This should also solve the race condition.

We were looking at {{updateBlockForPipeline()}} and found out that it could be 
_*another source of missing blocks on SBN*_, because it only increments the 
global {{generationStamp}}, but does not update the generation stamp of that 
block. The new gen stamp will be eventually updated by subsequent 
{{OP_ADD_BLOCK}}, {{OP_ADD}}, {{OP_CLOSE}}, or {{OP_UPDATE_BLOCKS}}. But the 
race condition with IBRs will be still present. If an IBR comes after 
incrementing the global genStamp the replica will not be queued as the future 
one. Instead since the block genStamp has not been yet updated by the 
subsequent {{OP_ADD_BLOCK}} or such, this replica will be invalidated on SBN.


was (Author: shv):
Great find [~vagarychen].
 Yes this is _*existing problem in current code*_, but it is exacerbated by 
fast journal tailing since  small tail-edits.period increases the probability 
of the race condition. In fact we occasionally see missing blocks after failing 
over to SBN on our clusters with pre-CRS code.

Want to suggest another _*possible solution 4*_.
 4. Delay incrementing the global {{generationStamp}} on standby until it 
actually sees the block with that stamp. That is, when {{OP_SET_GENSTAMP_V2}} 
comes to SBN it records the new value in a new variable 
{{lastReportedGenStamp}}. When {{OP_ADD_BLOCK}} with the genStamp that equals 
{{lastReportedGenStamp}} comes the global {{generationStamp}} is set to 
{{lastReportedGenStamp}}. This should also solve the race condition.

We were looking at {{updateBlockForPipeline()}} and found out that it could be 
_*another source of missing blocks on SBN*_, because it only increments the 
global {{generationStamp}}, but does not update the generation stamp of that 
block. The new gen stamp will be eventually updated by subsequent {{OP_ADD}}, 
or {{OP_CLOSE}}, or {{OP_UPDATE_BLOCKS}}. But the race condition with IBRs will 
be still present. If an IBR comes after incrementing the global genStamp the 
replica will not be queued as the future one. Instead since the block genStamp 
has not been yet updated by the subsequent {{OP_ADD}} or such, this replica 
will be invalidated on SBN.

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>
> Recently we encountered an issue that, after a failover, NameNode complains 
> corrupted file/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on SbN, it is possible that it receives block reports before 
> corresponding edit tailing happened. In which case SbN postpones processing 
> the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically if reported block has a future generation stamp, the DN report gets 
> requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock

[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-10-30 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963529#comment-16963529
 ] 

Konstantin Shvachko commented on HDFS-14941:


Or yet another way of fixing it, _*solution 5*_:
5. Eliminate the journal transaction {{OP_SET_GENSTAMP_V2}}. That is, Active should 
never send this transaction to Standby, while Standby treats it as a no-op. 
Instead, Standby should update the global genStamp whenever it receives a block with 
a larger genStamp than the global one. This is analogous to what we do with 
incremental inodeIDs and blockIDs, see {{resetLastInodeId()}} and 
{{setGenerationStamp()}}.

I am in favor of (5).
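As a rough sketch of solution 5 only -- these are illustrative names, not actual NameNode code -- the Standby would ignore {{OP_SET_GENSTAMP_V2}} and instead catch up the global genStamp whenever it sees a block with a larger one:

{code:java}
// Hypothetical sketch of solution 5; not the actual NameNode implementation.
class StandbyGenStampSketch {
  private long generationStamp;

  // OP_SET_GENSTAMP_V2 tailed on the Standby: treated as a no-op.
  void onSetGenStampV2(long ignored) {
    // intentionally empty
  }

  // Any tailed edit that carries a block (OP_ADD_BLOCK, OP_ADD, OP_CLOSE, ...):
  // advance the global stamp if the block's stamp is larger, analogous to how
  // last inode IDs and block IDs are caught up.
  void onBlockSeen(long blockGenStamp) {
    if (blockGenStamp > generationStamp) {
      generationStamp = blockGenStamp;
    }
  }
}
{code}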

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>
> Recently we encountered an issue that, after a failover, NameNode complains 
> corrupted file/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on SbN, it is possible that it receives block reports before 
> corresponding edit tailing happened. In which case SbN postpones processing 
> the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically if reported block has a future generation stamp, the DN report gets 
> requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  Would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump generation stamp on 
> Standby
>  while the following line
>  {{persistNewBlock(src, pendingFile);}}
>  would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on 
> Standby.
> Then the race condition is that, imagine Standby has just processed 
> {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to 
> be in different segments). Now a block report with a new generation stamp comes 
> in.
> Since the genstamp bump has already been processed, the reported block may 
> not be considered as future block. So the guarding logic passes. But 
> actually, the block hasn't been added to blockmap, because the second edit is 
> yet to be tailed. So, the block then gets added to invalidate block list and 
> we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block, until the next full block report. So after a 
> failover, the NN marks it as corrupt.
> This issue won't happen though, if both of the edit entries get tailed all 
> together, so no IBR processing can happen in between. But in our case, we set 
> edit tailing interval to super low (to allow Standby read), so when under 
> high workload, there is a much much higher chance that the two entries are 
> tailed separately, causing the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14941) Potential editlog race condition can cause corrupted file

2019-10-30 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14941:
---
Labels: ha  (was: )

> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>  Labels: ha
>
> Recently we encountered an issue that, after a failover, NameNode complains 
> corrupted file/missing blocks. The blocks did recover after full block 
> reports, so the blocks are not actually missing. After further investigation, 
> we believe this is what happened:
> First of all, on SbN, it is possible that it receives block reports before 
> corresponding edit tailing happened. In which case SbN postpones processing 
> the DN block report, handled by the guarding logic below:
> {code:java}
>   if (shouldPostponeBlocksFromFuture &&
>   namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
>   }
> {code}
> Basically if reported block has a future generation stamp, the DN report gets 
> requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
>   // allocate new block, record block locations in INode.
>   newBlock = createNewBlock();
>   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
>   saveAllocatedBlock(src, inodesInPath, newBlock, targets);
>   persistNewBlock(src, pendingFile);
>   offset = pendingFile.computeFileSize();
> {code}
> The line
>  {{newBlock = createNewBlock();}}
>  Would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump generation stamp on 
> Standby
>  while the following line
>  {{persistNewBlock(src, pendingFile);}}
>  would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on 
> Standby.
> Then the race condition is that, imagine Standby has just processed 
> {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to 
> be in different segments). Now a block report with a new generation stamp comes 
> in.
> Since the genstamp bump has already been processed, the reported block may 
> not be considered as future block. So the guarding logic passes. But 
> actually, the block hasn't been added to blockmap, because the second edit is 
> yet to be tailed. So, the block then gets added to invalidate block list and 
> we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no 
> information about this block, until the next full block report. So after a 
> failover, the NN marks it as corrupt.
> This issue won't happen though, if both of the edit entries get tailed all 
> together, so no IBR processing can happen in between. But in our case, we set 
> edit tailing interval to super low (to allow Standby read), so when under 
> high workload, there is a much much higher chance that the two entries are 
> tailed separately, causing the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14936) Add getNumOfChildren() for interface InnerNode

2019-10-30 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963567#comment-16963567
 ] 

Lisheng Sun commented on HDFS-14936:


Hi [~elgoiri] [~ayushtkn], should we commit this patch to trunk? Thank you.

> Add getNumOfChildren() for interface InnerNode
> --
>
> Key: HDFS-14936
> URL: https://issues.apache.org/jira/browse/HDFS-14936
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14936.001.patch, HDFS-14936.002.patch, 
> HDFS-14936.003.patch
>
>
> In the current code, the InnerNode subclasses InnerNodeImpl and DFSTopologyNodeImpl both 
> have getNumOfChildren(). 
> So add getNumOfChildren() to the InnerNode interface and remove the unnecessary 
> getNumOfChildren() in DFSTopologyNodeImpl.
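A rough sketch of the proposed shape -- the signature is assumed for illustration and may differ from the actual patch:

{code:java}
// Sketch only: getNumOfChildren() declared once on the interface, implemented in
// InnerNodeImpl, and DFSTopologyNodeImpl simply inherits it.
import java.util.ArrayList;
import java.util.List;

interface InnerNodeSketch {
  /** @return the number of children of this inner node */
  int getNumOfChildren();
}

class InnerNodeImplSketch implements InnerNodeSketch {
  private final List<Object> children = new ArrayList<>();

  @Override
  public int getNumOfChildren() {
    return children.size();
  }
}

// No duplicate getNumOfChildren() needed here any more.
class DFSTopologyNodeImplSketch extends InnerNodeImplSketch {
}
{code}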



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14938) Add check if excludedNodes contain scope in DFSNetworkTopology#chooseRandomWithStorageType()

2019-10-30 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963568#comment-16963568
 ] 

Lisheng Sun commented on HDFS-14938:


Hi [~ayushtkn] [~xkrogen] [~elgoiri], could you help review this patch?
Thank you.

 

> Add check if excludedNodes contain scope in 
> DFSNetworkTopology#chooseRandomWithStorageType() 
> -
>
> Key: HDFS-14938
> URL: https://issues.apache.org/jira/browse/HDFS-14938
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14938.001.patch, HDFS-14938.002.patch, 
> HDFS-14938.003.patch
>
>
> Add check if excludedNodes contain scope in 
> DFSNetworkTopology#chooseRandomWithStorageType().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2385) As admin, volume list command should list all volumes not just admin user owned volumes

2019-10-30 Thread YiSheng Lien (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YiSheng Lien reassigned HDDS-2385:
--

Assignee: YiSheng Lien

> As admin, volume list command should list all volumes not just admin user 
> owned volumes
> ---
>
> Key: HDDS-2385
> URL: https://issues.apache.org/jira/browse/HDDS-2385
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone CLI
>Affects Versions: 0.4.1
>Reporter: Vivek Ratnavel Subramanian
>Assignee: YiSheng Lien
>Priority: Major
>
> The command `ozone sh volume ls` lists only the volumes that are owned by the 
> user.
>  
> Expected behavior: The command should list all the volumes in the system if 
> the user is an ozone administrator. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2376) Fail to read data through XceiverClientGrpc

2019-10-30 Thread Hanisha Koneru (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963573#comment-16963573
 ] 

Hanisha Koneru commented on HDDS-2376:
--

[~Sammi], I ran teragen on my cluster and it ran successfully. Is this error 
reproducible?

> Fail to read data through XceiverClientGrpc
> ---
>
> Key: HDDS-2376
> URL: https://issues.apache.org/jira/browse/HDDS-2376
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Sammi Chen
>Assignee: Hanisha Koneru
>Priority: Blocker
>
> Ran teragen; the application failed with the following stack trace:
> 19/10/29 14:35:42 INFO mapreduce.Job: Running job: job_1567133159094_0048
> 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 running in 
> uber mode : false
> 19/10/29 14:35:59 INFO mapreduce.Job:  map 0% reduce 0%
> 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 failed with 
> state FAILED due to: Application application_1567133159094_0048 failed 2 
> times due to AM Container for appattempt_1567133159094_0048_02 exited 
> with  exitCode: -1000
> For more detailed output, check application tracking 
> page:http://host183:8088/cluster/app/application_1567133159094_0048Then, 
> click on links to logs of each attempt.
> Diagnostics: Unexpected OzoneException: 
> org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at 
> index 0
> java.io.IOException: Unexpected OzoneException: 
> org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at 
> index 0
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239)
>   at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
>   at 
> org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52)
>   at java.io.DataInputStream.read(DataInputStream.java:100)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:86)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:60)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:120)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
>   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267)
>   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
>   at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
>   at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum 
> mismatch at index 0
>   at 
> org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148)
>   at 
> org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275)
>   at 
> org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.java:375)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:287)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:250)
>   at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:233)
>   at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:245)
>   at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:335)
>   ... 26 more
> Caused by: Checksum mismatch at

[jira] [Commented] (HDFS-14943) RBF: Add Namespace to the Overview page in the Web UI

2019-10-30 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963587#comment-16963587
 ] 

hemanthboyina commented on HDFS-14943:
--

Thanks for putting this up [~elgoiri], will work on this.

> RBF: Add Namespace to the Overview page in the Web UI
> -
>
> Key: HDFS-14943
> URL: https://issues.apache.org/jira/browse/HDFS-14943
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: hemanthboyina
>Priority: Minor
>
> The Namenode shows the Namespace field which can be used to access it through 
> HDFS.
> The Router should also report its namespace.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-14943) RBF: Add Namespace to the Overview page in the Web UI

2019-10-30 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina reassigned HDFS-14943:


Assignee: hemanthboyina

> RBF: Add Namespace to the Overview page in the Web UI
> -
>
> Key: HDFS-14943
> URL: https://issues.apache.org/jira/browse/HDFS-14943
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: hemanthboyina
>Priority: Minor
>
> The Namenode shows the Namespace field which can be used to access it through 
> HDFS.
> The Router should also report its namespace.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDDS-426) Add field modificationTime for Volume and Bucket

2019-10-30 Thread YiSheng Lien (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-426 started by YiSheng Lien.
-
> Add field modificationTime for Volume and Bucket
> 
>
> Key: HDDS-426
> URL: https://issues.apache.org/jira/browse/HDDS-426
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Manager
>Reporter: Dinesh Chitlangia
>Assignee: YiSheng Lien
>Priority: Major
>  Labels: newbie
>
> There are update operations that can be performed on Volume, Bucket and Key.
> While Key records the modification time, Volume and Bucket do not capture 
> this.
>  
> This Jira proposes to add the required field to Volume and Bucket in order to 
> capture the modificationTime.
>  
> Current Status:
> {noformat}
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoVolume /dummyvol
> 2018-09-10 17:16:12 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "owner" : {
> "name" : "bilbo"
> },
> "quota" : {
> "unit" : "TB",
> "size" : 1048576
> },
> "volumeName" : "dummyvol",
> "createdOn" : "Mon, 10 Sep 2018 17:11:32 GMT",
> "createdBy" : "bilbo"
> }
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoBucket /dummyvol/mybuck
> 2018-09-10 17:15:25 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "volumeName" : "dummyvol",
> "bucketName" : "mybuck",
> "createdOn" : "Mon, 10 Sep 2018 17:12:09 GMT",
> "acls" : [ {
> "type" : "USER",
> "name" : "hadoop",
> "rights" : "READ_WRITE"
> }, {
> "type" : "GROUP",
> "name" : "users",
> "rights" : "READ_WRITE"
> }, {
> "type" : "USER",
> "name" : "spark",
> "rights" : "READ_WRITE"
> } ],
> "versioning" : "DISABLED",
> "storageType" : "DISK"
> }
> hadoop@1987b5de4203:~$ ./bin/ozone oz -infoKey /dummyvol/mybuck/myk1
> 2018-09-10 17:19:43 WARN NativeCodeLoader:60 - Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> {
> "version" : 0,
> "md5hash" : null,
> "createdOn" : "Mon, 10 Sep 2018 17:19:04 GMT",
> "modifiedOn" : "Mon, 10 Sep 2018 17:19:04 GMT",
> "size" : 0,
> "keyName" : "myk1",
> "keyLocations" : [ ]
> }{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14936) Add getNumOfChildren() for interface InnerNode

2019-10-30 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963641#comment-16963641
 ] 

Ayush Saxena commented on HDFS-14936:
-

v003 LGTM +1
Will commit later today, if no further comments.

> Add getNumOfChildren() for interface InnerNode
> --
>
> Key: HDFS-14936
> URL: https://issues.apache.org/jira/browse/HDFS-14936
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Attachments: HDFS-14936.001.patch, HDFS-14936.002.patch, 
> HDFS-14936.003.patch
>
>
> In the current code, the InnerNode implementations InnerNodeImpl and 
> DFSTopologyNodeImpl both have getNumOfChildren().
> So add getNumOfChildren() to the InnerNode interface and remove the unnecessary 
> getNumOfChildren() in DFSTopologyNodeImpl.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2355) Om double buffer flush termination with rocksdb error

2019-10-30 Thread Aravindan Vijayan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan updated HDDS-2355:

Status: Reopened  (was: Reopened)

> Om double buffer flush termination with rocksdb error
> -
>
> Key: HDDS-2355
> URL: https://issues.apache.org/jira/browse/HDDS-2355
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Aravindan Vijayan
>Priority: Blocker
> Fix For: 0.5.0
>
>
> om_1    | java.io.IOException: Unable to write the batch.
> om_1    |     at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48)
> om_1    |     at org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240)
> om_1    |     at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146)
> om_1    |     at java.base/java.lang.Thread.run(Thread.java:834)
> om_1    | Caused by: org.rocksdb.RocksDBException: WritePrepared/WriteUnprepared txn tag when write_after_commit_ is enabled (in default WriteCommitted mode). If it is not due to corruption, the WAL must be emptied before changing the WritePolicy.
> om_1    |     at org.rocksdb.RocksDB.write0(Native Method)
> om_1    |     at org.rocksdb.RocksDB.write(RocksDB.java:1421)
> om_1    |     at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46)
>  
> In a few of my test runs I see this error and the OM is terminated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDDS-2355) Om double buffer flush termination with rocksdb error

2019-10-30 Thread Aravindan Vijayan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan reopened HDDS-2355:
-

Reopening for changing Resolution.

> Om double buffer flush termination with rocksdb error
> -
>
> Key: HDDS-2355
> URL: https://issues.apache.org/jira/browse/HDDS-2355
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Aravindan Vijayan
>Priority: Blocker
> Fix For: 0.5.0
>
>
> om_1    | java.io.IOException: Unable to write the batch.
> om_1    |     at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48)
> om_1    |     at org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240)
> om_1    |     at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146)
> om_1    |     at java.base/java.lang.Thread.run(Thread.java:834)
> om_1    | Caused by: org.rocksdb.RocksDBException: WritePrepared/WriteUnprepared txn tag when write_after_commit_ is enabled (in default WriteCommitted mode). If it is not due to corruption, the WAL must be emptied before changing the WritePolicy.
> om_1    |     at org.rocksdb.RocksDB.write0(Native Method)
> om_1    |     at org.rocksdb.RocksDB.write(RocksDB.java:1421)
> om_1    |     at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46)
>  
> In a few of my test runs I see this error and the OM is terminated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-2355) Om double buffer flush termination with rocksdb error

2019-10-30 Thread Aravindan Vijayan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan resolved HDDS-2355.
-
Resolution: Cannot Reproduce

> Om double buffer flush termination with rocksdb error
> -
>
> Key: HDDS-2355
> URL: https://issues.apache.org/jira/browse/HDDS-2355
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Aravindan Vijayan
>Priority: Blocker
> Fix For: 0.5.0
>
>
> om_1    | java.io.IOException: Unable to write the batch.
> om_1    |     at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:48)
> om_1    |     at org.apache.hadoop.hdds.utils.db.RDBStore.commitBatchOperation(RDBStore.java:240)
> om_1    |     at org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:146)
> om_1    |     at java.base/java.lang.Thread.run(Thread.java:834)
> om_1    | Caused by: org.rocksdb.RocksDBException: WritePrepared/WriteUnprepared txn tag when write_after_commit_ is enabled (in default WriteCommitted mode). If it is not due to corruption, the WAL must be emptied before changing the WritePolicy.
> om_1    |     at org.rocksdb.RocksDB.write0(Native Method)
> om_1    |     at org.rocksdb.RocksDB.write(RocksDB.java:1421)
> om_1    |     at org.apache.hadoop.hdds.utils.db.RDBBatchOperation.commit(RDBBatchOperation.java:46)
>  
> In a few of my test runs I see this error and the OM is terminated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool

2019-10-30 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963646#comment-16963646
 ] 

Ayush Saxena commented on HDFS-14927:
-

Thanks [~LeonG] for the patch.
Had a quick look, just one doubt:

{code:java}
425 future.get();
426 exec.shutdown();
{code}

This isn't in a finally block; if the test fails after the executor is created, these 
lines won't be executed. Just give it a check once (a sketch of what I mean is below).
Other than this, v007 LGTM.
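
Something along these lines (a sketch only; the identifiers are placeholders, not the 
ones in the actual test):
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ShutdownInFinallyExample {
  // Shut the executor down in a finally block so it is released even when the
  // assertion triggered by future.get() makes the test fail.
  void runAndAlwaysShutdown(Runnable task) throws Exception {
    ExecutorService exec = Executors.newSingleThreadExecutor();
    try {
      Future<?> future = exec.submit(task);
      future.get(); // may throw if the submitted task fails
    } finally {
      exec.shutdown(); // executed regardless of the outcome above
    }
  }
}
{code}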


> RBF: Add metrics for async callers thread pool
> --
>
> Key: HDFS-14927
> URL: https://issues.apache.org/jira/browse/HDFS-14927
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
> Attachments: HDFS-14927.001.patch, HDFS-14927.002.patch, 
> HDFS-14927.003.patch, HDFS-14927.004.patch, HDFS-14927.005.patch, 
> HDFS-14927.006.patch, HDFS-14927.007.patch
>
>
> It would be good to add some monitoring of the async caller thread pool that handles 
> fan-out RPC client requests, so we know the utilization and when to bump up 
> dfs.federation.router.client.thread-size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14891) RBF: namenode links in NameFederation Health page (federationhealth.html) cannot use https scheme

2019-10-30 Thread Xieming Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963656#comment-16963656
 ] 

Xieming Li commented on HDFS-14891:
---

[~elgoiri], thank you for your comment.
{quote}Maybe fix the javadoc in TestRouterNamenodeWebScheme with a proper high 
level comment mentioning scheme in there.
{quote}
I could not understand the meaning of this line. Can you elaborate, please?

> RBF: namenode links in NameFederation Health page (federationhealth.html)  
> cannot use https scheme
> --
>
> Key: HDFS-14891
> URL: https://issues.apache.org/jira/browse/HDFS-14891
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf, ui
>Reporter: Xieming Li
>Assignee: Xieming Li
>Priority: Major
> Attachments: HDFS-14891.001.patch, HDFS-14891.002.patch, 
> HDFS-14891.003.patch, HDFS-14891.004.patch, HDFS-14891.005.patch, 
> HDFS-14891.006.patch, HDFS-14891.patch
>
>
> The scheme of the links in federationhealth.html is hard-coded as 'http'.
> It should be set to 'https' when dfs.http.policy is HTTPS_ONLY 
> (and maybe also for HTTP_AND_HTTPS).
>  
> [https://github.com/apache/hadoop/blob/c99a12167ff9566012ef32104a3964887d62c899/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/webapps/router/federationhealth.html#L168-L169]
> [https://github.com/apache/hadoop/blob/c99a12167ff9566012ef32104a3964887d62c899/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/webapps/router/federationhealth.html#L236]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14940) HDFS Balancer : getBalancerBandwidth displaying wrong values for the maximum network bandwidth used by the datanode while network bandwidth set with values as 104857600

2019-10-30 Thread Souryakanta Dwivedy (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963657#comment-16963657
 ] 

Souryakanta Dwivedy commented on HDFS-14940:


[~hemanthboyina] my concern is not about converting the value to its exact 
equivalent; while setting the value it displays that anyway. The issue is that the 
value displayed when setting the network bandwidth is not what the datanodes report 
when getting the balancer bandwidth. Please check the attachment. For example:
install/hadoop/namenode/bin> ./hdfs dfsadmin -setBalancerBandwidth 1048576000g
Balancer bandwidth is set to 1125899906842624000
install/hadoop/namenode/bin> ./hdfs dfsadmin -getBalancerBandwidth 
linux-226:50077
Balancer bandwidth is -17798225727368200 bytes per second.
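
For what it's worth, these numbers are consistent with a 64-bit overflow in an 
intermediate per-period computation rather than in the stored value itself: 1048576000g 
is 1000 * 2^50 = 1125899906842624000 bytes/s, which fits in a signed long, but 
multiplying it by a period of a few hundred milliseconds does not. A small standalone 
check (assuming a 500 ms throttler period; this is a sketch, not the actual DataNode 
code):
{code:java}
public class BalancerBandwidthOverflowDemo {
  public static void main(String[] args) {
    long set = 1048576000L * (1L << 30);         // 1048576000g in bytes = 1125899906842624000
    System.out.println(set);                     // fits in a signed 64-bit long

    long periodMs = 500;                         // assumed per-period throttler interval
    long bytesPerPeriod = set * periodMs / 1000; // set * 500 silently overflows Long.MAX_VALUE
    long readBack = bytesPerPeriod * 1000 / periodMs;
    System.out.println(readBack);                // -17798225727368200, the value reported above
  }
}
{code}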

> HDFS Balancer : getBalancerBandwidth displaying wrong values for the maximum 
> network bandwidth used by the datanode while network bandwidth set with 
> values as 1048576000g/1048p/1e
> ---
>
> Key: HDFS-14940
> URL: https://issues.apache.org/jira/browse/HDFS-14940
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 3.1.1
> Environment: 3 Node HA Setup
>Reporter: Souryakanta Dwivedy
>Priority: Minor
> Attachments: BalancerBW.PNG
>
>
> HDFS Balancer : getBalancerBandwidth displaying wrong values for the maximum 
> network bandwidth used by the datanode
>  while network bandwidth set with values as 1048576000g/1048p/1e
> Steps:
>  * Set the balancer bandwidth with the setBalancerBandwidth command and values such 
> as [1048576000g/1048p/1e]
>  * Check the bandwidth used by the datanode during HDFS block balancing with the 
> command "hdfs dfsadmin -getBalancerBandwidth"; it will display a different value, 
> not the same value that was set



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-30 Thread Li Cheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963684#comment-16963684
 ] 

Li Cheng commented on HDDS-2356:


[~bharat] I tried debug_s3 and debug_fuse in goofys, but I believe goofys would then 
fall into a single-threaded mode and we won't be able to reproduce the issue. What 
are you looking for in the audit logs?

> Multipart upload report errors while writing to ozone Ratis pipeline
> 
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM 
> on a separate VM
>Reporter: Li Cheng
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Fix For: 0.5.0
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
> it's VM0.
> I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone to a 
> path on VM0, while reading data from the VM0 local disk and writing to the mount 
> path. The dataset has files of various sizes, from 0 bytes to GB-level, and about 
> ~50,000 files in total.
> The writing is slow (1GB in ~10 mins) and it stops after around 4GB. Looking at the 
> hadoop-root-om-VM_50_210_centos.out log, I see the OM throwing errors related to 
> Multipart upload. This error eventually causes the writing to terminate and the OM 
> to shut down.
>  
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
> exit status 2: OMDoubleBuffer flush 
> threadOMDoubleBufferFlushThreadencountered Throwable error
> java.util.ConcurrentModificationException
>  at java.util.TreeMap.forEach(TreeMap.java:1004)
>  at 
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
>  at 
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
>  at 
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
>  at 
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
>  at 
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
>  at java.util.Iterator.forEachRemaining(Iterator.java:116)
>  at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
>  at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:
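
For what it's worth, the ConcurrentModificationException in the quoted trace is the 
standard TreeMap behavior when the map is structurally modified while forEach is 
iterating it (here presumably the multipart part list being updated concurrently with 
getProto() serializing it). A minimal standalone illustration, not Ozone code:
{code:java}
import java.util.TreeMap;

public class TreeMapCmeDemo {
  public static void main(String[] args) {
    TreeMap<Integer, String> parts = new TreeMap<>();
    parts.put(1, "part-1");
    parts.put(2, "part-2");

    // TreeMap.forEach checks modCount after each entry; any structural change made
    // while the traversal is in flight (simulated here in the same thread, in the
    // real issue by a concurrent commit-part request) throws
    // ConcurrentModificationException.
    parts.forEach((k, v) -> parts.put(3, "part-3"));
  }
}
{code}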



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-30 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng updated HDDS-2356:
---
Description: 
Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
it's VM0.

I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone to a path 
on VM0, while reading data from the VM0 local disk and writing to the mount path. The 
dataset has files of various sizes, from 0 bytes to GB-level, and about ~50,000 files 
in total.

The writing is slow (1GB in ~10 mins) and it stops after around 4GB. Looking at the 
hadoop-root-om-VM_50_210_centos.out log, I see the OM throwing errors related to 
Multipart upload. This error eventually causes the writing to terminate and the OM to 
shut down.

 

2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete Multipart 
Upload Request for bucket: ozone-test, key: 20191012/plc_1570863541668_927
8
MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: 
Complete Multipart Upload Failed: volume: 
s3c89e813c80ffcea9543004d57b2a1239bucket:
ozone-testkey: 20191012/plc_1570863541668_9278
at 
org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732)
at 
org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB
.java:1104)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66)
at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source)
at 
org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883)
at 
org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445)
at 
org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
at 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
at 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191)
at 
org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200)
at 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103)
at 
org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493)

 

The following error has been resolved in 

2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
exit status 2: OMDoubleBuffer flush threadOMDoubleBufferFlushThreadencountered 
Throwable error
 java.util.ConcurrentModificationException
 at java.util.TreeMap.forEach(TreeMap.java:1004)
 at 
org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
 at 
org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
 at 
org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
 at 
org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
 at org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
 at 
org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
 at 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
 at java.util.Iterator.forEachRemaining(Iterator.java:116)
 at 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
 at java.lang.Thread.run(Thread.java:745)
 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:

  was:
Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
it's VM0.

I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path on 
VM0, while reading data from VM0 local disk and write to mount path. The 
dataset has various sizes of files from 0 byte to GB-level and it ha

[jira] [Created] (HDFS-14944) ec -enablePolicy should support multi federation namespace not only the default namespace in core-site.xml

2019-10-30 Thread zhuqi (Jira)
zhuqi created HDFS-14944:


 Summary: ec -enablePolicy should support multi federation 
namespace not only the default namespace in core-site.xml
 Key: HDFS-14944
 URL: https://issues.apache.org/jira/browse/HDFS-14944
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.2.0, 3.0.0
Reporter: zhuqi






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline

2019-10-30 Thread Li Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Cheng updated HDDS-2356:
---
Description: 
Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
it's VM0.

I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone to a path 
on VM0, while reading data from the VM0 local disk and writing to the mount path. The 
dataset has files of various sizes, from 0 bytes to GB-level, and about ~50,000 files 
in total.

The writing is slow (1GB in ~10 mins) and it stops after around 4GB. Looking at the 
hadoop-root-om-VM_50_210_centos.out log, I see the OM throwing errors related to 
Multipart upload. This error eventually causes the writing to terminate and the OM to 
shut down.

 

2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete Multipart 
Upload Request for bucket: ozone-test, key: 20191012/plc_1570863541668_927
 8
 MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: 
Complete Multipart Upload Failed: volume: 
s3c89e813c80ffcea9543004d57b2a1239bucket:
 ozone-testkey: 20191012/plc_1570863541668_9278
 at 
org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732)
 at 
org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB
 .java:1104)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:497)
 at org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66)
 at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source)
 at 
org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883)
 at 
org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445)
 at 
org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:497)
 at 
org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
 at 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
 at 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191)
 at 
org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200)
 at 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103)
 at 
org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493)

 

The following error has been resolved in 
https://issues.apache.org/jira/browse/HDDS-2322. 

2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with 
exit status 2: OMDoubleBuffer flush threadOMDoubleBufferFlushThreadencountered 
Throwable error
 java.util.ConcurrentModificationException
 at java.util.TreeMap.forEach(TreeMap.java:1004)
 at 
org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
 at 
org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
 at 
org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
 at 
org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
 at org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
 at 
org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
 at 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
 at java.util.Iterator.forEachRemaining(Iterator.java:116)
 at 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
 at java.lang.Thread.run(Thread.java:745)
 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:

  was:
Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say 
it's VM0.

I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path on 
VM0, while reading data from VM0 local disk and write to mount pa

[jira] [Updated] (HDFS-14944) ec admin such as : -enablePolicy should support multi federation namespace not only the default namespace in core-site.xml

2019-10-30 Thread zhuqi (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated HDFS-14944:
-
Summary: ec admin such as : -enablePolicy should support multi federation 
namespace not only the default namespace in core-site.xml  (was: ec 
-enablePolicy should support multi federation namespace not only the default 
namespace in core-site.xml)

> ec admin such as : -enablePolicy should support multi federation namespace 
> not only the default namespace in core-site.xml
> --
>
> Key: HDFS-14944
> URL: https://issues.apache.org/jira/browse/HDFS-14944
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 3.2.0
>Reporter: zhuqi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14944) ec admin such as : -enablePolicy should support multi federation namespace not only the default namespace in core-site.xml

2019-10-30 Thread zhuqi (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated HDFS-14944:
-
Description: When we use ec -enablePolicy, we can only enable the policy on the 
defaultFS namespace; we should improve it to support more namespaces in our 
federation environment.

> ec admin such as : -enablePolicy should support multi federation namespace 
> not only the default namespace in core-site.xml
> --
>
> Key: HDFS-14944
> URL: https://issues.apache.org/jira/browse/HDFS-14944
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 3.2.0
>Reporter: zhuqi
>Priority: Major
>
> When we use ec -enablePolicy, we can only enable the policy on the defaultFS 
> namespace; we should improve it to support more namespaces in our federation 
> environment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14944) ec admin such as : -enablePolicy should support multi federation namespace not only the default namespace in core-site.xml

2019-10-30 Thread zhuqi (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated HDFS-14944:
-
Affects Version/s: 3.1.0

> ec admin such as : -enablePolicy should support multi federation namespace 
> not only the default namespace in core-site.xml
> --
>
> Key: HDFS-14944
> URL: https://issues.apache.org/jira/browse/HDFS-14944
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 3.1.0, 3.2.0
>Reporter: zhuqi
>Priority: Major
>
> When we use ec -enablePolicy, we can only enable the policy on the defaultFS 
> namespace; we should improve it to support more namespaces in our federation 
> environment. We can extend the ec admin to support multiple namespaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14944) ec admin such as : -enablePolicy should support multi federation namespace not only the default namespace in core-site.xml

2019-10-30 Thread zhuqi (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated HDFS-14944:
-
Description: When we use ec -enablePolicy, we can only enable the policy on the 
defaultFS namespace; we should improve it to support more namespaces in our 
federation environment. We can extend the ec admin to support multiple namespaces.  
(was: When we use ec -enablePolicy, we can only enable the policy on the defaultFS 
namespace; we should improve it to support more namespaces in our federation 
environment.)

> ec admin such as : -enablePolicy should support multi federation namespace 
> not only the default namespace in core-site.xml
> --
>
> Key: HDFS-14944
> URL: https://issues.apache.org/jira/browse/HDFS-14944
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 3.2.0
>Reporter: zhuqi
>Priority: Major
>
> When we use ec -enablePolicy, we can only enable the policy on the defaultFS 
> namespace; we should improve it to support more namespaces in our federation 
> environment. We can extend the ec admin to support multiple namespaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org