[jira] [Commented] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964627#comment-16964627 ] Surendra Singh Lilhore commented on HDFS-14768:
---
[~gjhkael], we appreciate your hard work and patience. You are reporting good-quality issues.

> EC : Busy DN replica should be consider in live replica check.
> --
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding, hdfs, namenode
> Affects Versions: 3.0.2
> Reporter: guojh
> Assignee: guojh
> Priority: Major
> Labels: patch
> Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg,
> HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch,
> HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch,
> HDFS-14768.006.patch, HDFS-14768.007.patch, HDFS-14768.008.patch,
> HDFS-14768.009.patch, HDFS-14768.010.patch, HDFS-14768.jpg,
> guojh_UT_after_deomission.txt, guojh_UT_before_deomission.txt,
> zhaoyiming_UT_after_deomission.txt, zhaoyiming_UT_beofre_deomission.txt
>
> The policy is RS-6-3-1024K; the version is Hadoop 3.0.2.
> Suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission
> indices [3,4] and increase the index-6 datanode's
> pendingReplicationWithoutTargets so that it exceeds
> replicationStreamsHardLimit (we set it to 14). Then, after BlockManager's
> chooseSourceDatanodes method runs, liveBlockIndices is
> [0,1,2,3,4,5,7,8] and the block counters are Live: 7, Decommission: 2.
> In BlockManager's scheduleReconstruction method, additionalReplRequired
> is 9 - 7 = 2. After the Namenode chooses two target datanodes, it assigns an
> erasure-coding reconstruction task to them.
> When a datanode receives the task, it builds targetIndices from
> liveBlockIndices and the target length. The code is below.
> {code:java}
> targetIndices = new short[targets.length];
>
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> Here targetIndices[0] = 6, but targetIndices[1] always stays 0, its initial
> value. The StripedReader always creates readers from the first six live block
> indices, i.e. [0,1,2,3,4,5].
> Using indices [0,1,2,3,4,5] to rebuild target indices [6,0] triggers the ISA-L
> bug: block index 6's data is corrupted (all zeros).
> I wrote a unit test that reproduces this reliably.
> {code:java}
> private int replicationStreamsHardLimit =
>     DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT;
> numDNs = dataBlocks + parityBlocks + 10;
>
> @Test(timeout = 24)
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   final INodeFile fileNode = cluster.getNamesystem().getFSDirectory()
>       .getINode4Write(ecFile.toString()).asFile();
>   LocatedBlocks locatedBlocks =
>       StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>       .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>       (LocatedStripedBlock) locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   DatanodeDescriptor datanodeDescriptor = cluster.getNameNode().getNamesystem()
>       .getBlockManager().getDatanodeManager()
>       .getDatanode(storageInfos[6].getDatanodeUuid());
>   BlockInfo firstBlock = fileNode.getBlocks()[0];
>   DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock);
>   // the first heartbeat will consume 3 replica tasks
>   for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) {
>     BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor,
>         new Block(i), new DatanodeStorageInfo[]{dStorageInfos[0]});
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the nodes which will be decommissioned
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
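The loop in the pasted code fills targetIndices only for the missing indices it finds; when fewer missing indices exist than allocated targets, the tail entries keep their initial value 0, which is exactly the [6,0] pair that corrupts block 6. The following is a minimal, dependency-free sketch of the defensive behavior (hypothetical, simplified names: `reconstructor` and `targets` are replaced by plain parameters, and this is not the committed patch), truncating the array to the indices actually found:

```java
import java.util.Arrays;
import java.util.BitSet;

public class TargetIndicesSketch {

  // live: bit set of block indices with a live replica.
  // blockLens: per-index block lengths.
  // targetCount: number of reconstruction targets the NameNode allocated.
  // totalBlkNum: dataBlkNum + parityBlkNum (9 for RS-6-3).
  static short[] initTargetIndices(BitSet live, long[] blockLens,
                                   int targetCount, int totalBlkNum) {
    short[] targetIndices = new short[targetCount];
    int m = 0;
    for (int i = 0; i < totalBlkNum; i++) {
      if (!live.get(i) && blockLens[i] > 0 && m < targetIndices.length) {
        targetIndices[m++] = (short) i;
      }
    }
    // Truncate to the indices actually found instead of leaving stale zeros,
    // which is the corruption path the report describes (targetIndices[1]
    // stuck at its initial value 0).
    return Arrays.copyOf(targetIndices, m);
  }

  public static void main(String[] args) {
    // Scenario from the report: indices 0-5, 7, 8 live, index 6 missing,
    // but two targets allocated.
    BitSet live = new BitSet();
    for (int i : new int[]{0, 1, 2, 3, 4, 5, 7, 8}) {
      live.set(i);
    }
    long[] lens = {1, 1, 1, 1, 1, 1, 1, 1, 1};
    short[] t = initTargetIndices(live, lens, 2, 9);
    System.out.println(Arrays.toString(t)); // [6] rather than [6, 0]
  }
}
```

Running the scenario from the report yields only the genuinely missing index 6, instead of passing the stale [6, 0] pair to the decoder.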
[jira] [Comment Edited] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964625#comment-16964625 ] Surendra Singh Lilhore edited comment on HDFS-14768 at 11/1/19 5:51 AM:
[~gjhkael], why do you think it is not a problem? This is a major issue and we have to fix it. I will commit this tomorrow if there are no comments from others.

was (Author: surendrasingh): [~gjhkael] why you think it is not a problem ? This is major issue and we have to fix this. Will commit this tomorrow if no other comment form other guys.
[jira] [Commented] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964625#comment-16964625 ] Surendra Singh Lilhore commented on HDFS-14768:
---
[~gjhkael], why do you think it is not a problem? This is a major issue and we have to fix it. I will commit this tomorrow if there are no comments from others.
[jira] [Reopened] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surendra Singh Lilhore reopened HDFS-14768:
---
[jira] [Updated] (HDFS-14950) missing libhdfspp libs in dist-package
[ https://issues.apache.org/jira/browse/HDFS-14950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Zhou updated HDFS-14950:
-
Attachment: fix_libhdfspp_lib.patch
Status: Patch Available (was: Open)

> missing libhdfspp libs in dist-package
> --
>
> Key: HDFS-14950
> URL: https://issues.apache.org/jira/browse/HDFS-14950
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: build
> Reporter: Yuan Zhou
> Assignee: Yuan Zhou
> Priority: Major
> Attachments: fix_libhdfspp_lib.patch
>
> A Hadoop build such as "mvn package -Pnative" copies the HDFS native libs to
> target/lib/native. Currently it copies only the C client
> libraries (libhdfs.\{a,so}); the C++ based HDFS client libraries
> (libhdfspp.\{a,so}) are missing there.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
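For illustration only (this is a sketch, not the attached fix_libhdfspp_lib.patch; the plugin wiring and the libhdfspp output path are assumptions), one way to pick up the C++ client libraries in the dist copy step would be an additional maven-antrun copy alongside the existing libhdfs one:

```xml
<!-- Hypothetical sketch: copy libhdfspp.{a,so} into the dist native dir.
     The fileset dir below is an assumed CMake install location. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <id>copy-libhdfspp</id>
      <phase>package</phase>
      <goals><goal>run</goal></goals>
      <configuration>
        <target>
          <copy todir="${project.build.directory}/lib/native" flatten="true">
            <fileset dir="${project.build.directory}/native/target/usr/local/lib">
              <include name="libhdfspp.*"/>
            </fileset>
          </copy>
        </target>
      </configuration>
    </execution>
  </executions>
</plugin>
```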
[jira] [Updated] (HDFS-14950) missing libhdfspp libs in dist-package
[ https://issues.apache.org/jira/browse/HDFS-14950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuan Zhou updated HDFS-14950:
-
Attachment: (was: fix_libhdfspp_lib.patch)
[jira] [Commented] (HDFS-14938) Add check if excludedNodes contain scope in DFSNetworkTopology#chooseRandomWithStorageType()
[ https://issues.apache.org/jira/browse/HDFS-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964617#comment-16964617 ] Hadoop QA commented on HDFS-14938: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 34m 39s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 43s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 43s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 1 unchanged - 0 fixed = 4 total (was 1) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 46s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}101m 17s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}198m 45s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestMaintenanceState | | | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14938 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12984565/HDFS-14938.004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 22f7ab011716 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f9b99d2 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/28217/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28217/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28217/testReport/ | | Max. process+thread count
[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool
[ https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964598#comment-16964598 ] Leon Gao commented on HDFS-14927:
-
Thanks [~inigoiri] and [~ayushtkn] for the review ^

> RBF: Add metrics for async callers thread pool
> --
>
> Key: HDFS-14927
> URL: https://issues.apache.org/jira/browse/HDFS-14927
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: rbf
> Reporter: Leon Gao
> Assignee: Leon Gao
> Priority: Minor
> Attachments: HDFS-14927.001.patch, HDFS-14927.002.patch,
> HDFS-14927.003.patch, HDFS-14927.004.patch, HDFS-14927.005.patch,
> HDFS-14927.006.patch, HDFS-14927.007.patch, HDFS-14927.008.patch,
> HDFS-14927.009.patch
>
> It would be good to add monitoring on the async caller thread pool that
> handles fan-out RPC client requests, so we know its utilization and when to
> bump up dfs.federation.router.client.thread-size.
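As a sketch of the kind of monitoring the issue asks for (hypothetical class and metric names, not the actual RBF patch), a thin wrapper around a `ThreadPoolExecutor` can expose active threads, queue depth, and utilization:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class AsyncCallerPoolMetrics {
  private final ThreadPoolExecutor pool;

  public AsyncCallerPoolMetrics(ThreadPoolExecutor pool) {
    this.pool = pool;
  }

  public int getActiveThreads()  { return pool.getActiveCount(); }
  public int getPoolSize()       { return pool.getPoolSize(); }
  public int getQueuedRequests() { return pool.getQueue().size(); }

  // Utilization in [0, 1]: active threads over the configured maximum,
  // i.e. how close the pool is to the thread-size ceiling.
  public double getUtilization() {
    return (double) pool.getActiveCount() / pool.getMaximumPoolSize();
  }

  public static void main(String[] args) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        2, 4, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    AsyncCallerPoolMetrics metrics = new AsyncCallerPoolMetrics(pool);
    System.out.println("utilization=" + metrics.getUtilization()); // 0.0 while idle
    pool.shutdown();
  }
}
```

A sustained utilization near 1.0 plus a growing queue would be the signal to raise dfs.federation.router.client.thread-size.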
[jira] [Resolved] (HDDS-2311) Fix logic of RetryPolicy in OzoneClientSideTranslatorPB
[ https://issues.apache.org/jira/browse/HDDS-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Chitlangia resolved HDDS-2311.
-
Fix Version/s: 0.5.0
Resolution: Fixed
[~bharat] Thank you for flagging this issue and reviewing. [~hanishakoneru] Thank you for the contribution. The integration-test failure was unrelated to the patch, and this has been committed to master.

> Fix logic of RetryPolicy in OzoneClientSideTranslatorPB
> ---
>
> Key: HDDS-2311
> URL: https://issues.apache.org/jira/browse/HDDS-2311
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Reporter: Bharat Viswanadham
> Assignee: Hanisha Koneru
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.5.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> OzoneManagerProtocolClientSideTranslatorPB.java, L251:
> {code:java}
> if (cause instanceof NotLeaderException) {
>   NotLeaderException notLeaderException = (NotLeaderException) cause;
>   omFailoverProxyProvider.performFailoverIfRequired(
>       notLeaderException.getSuggestedLeaderNodeId());
>   return getRetryAction(RetryAction.RETRY, retries, failovers);
> }
> {code}
> The suggested leader returned from the server is never used during failover,
> because the cause arrives as a RemoteException, not a NotLeaderException. With
> the current code, failover never follows the suggested leader, and by default
> each OM is retried up to the maximum number of retries.
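The description above implies the shape of the fix: unwrap the RemoteException before testing for NotLeaderException. Below is a dependency-free sketch of that unwrap-then-dispatch logic (the exception classes are stand-ins, not the actual Hadoop/Ozone types, and this is not the committed patch):

```java
public class RetryUnwrapSketch {

  // Stand-in for org.apache.hadoop.ipc.RemoteException, which wraps the
  // server-side exception.
  static class RemoteExceptionStub extends RuntimeException {
    private final Throwable wrapped;
    RemoteExceptionStub(Throwable wrapped) { this.wrapped = wrapped; }
    Throwable unwrap() { return wrapped; }
  }

  // Stand-in for the Ratis NotLeaderException carrying a suggested leader.
  static class NotLeaderStub extends RuntimeException {
    final String suggestedLeader;
    NotLeaderStub(String leader) { this.suggestedLeader = leader; }
  }

  // Returns the suggested leader node id if present, else null.
  static String suggestedLeader(Throwable cause) {
    if (cause instanceof RemoteExceptionStub) {
      // The step the buggy code skipped: unwrap before the instanceof check.
      cause = ((RemoteExceptionStub) cause).unwrap();
    }
    if (cause instanceof NotLeaderStub) {
      return ((NotLeaderStub) cause).suggestedLeader;
    }
    return null; // fall back to ordinary round-robin failover
  }

  public static void main(String[] args) {
    Throwable wrapped = new RemoteExceptionStub(new NotLeaderStub("om2"));
    System.out.println(suggestedLeader(wrapped)); // om2
  }
}
```

Without the unwrap step, the `instanceof` check sees only the wrapper and the suggested leader is dropped, matching the behavior the report describes.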
[jira] [Work logged] (HDDS-2311) Fix logic of RetryPolicy in OzoneClientSideTranslatorPB
[ https://issues.apache.org/jira/browse/HDDS-2311?focusedWorklogId=337179=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337179 ] ASF GitHub Bot logged work on HDDS-2311:
Author: ASF GitHub Bot
Created on: 01/Nov/19 04:39
Start Date: 01/Nov/19 04:39
Worklog Time Spent: 10m
Work Description: dineshchitlangia commented on pull request #51: HDDS-2311. Fix logic of RetryPolicy in OzoneClientSideTranslatorPB. URL: https://github.com/apache/hadoop-ozone/pull/51
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking
---
Worklog Id: (was: 337179) Time Spent: 20m (was: 10m)
[jira] [Assigned] (HDDS-2397) Fix calling cleanup for few missing tables in OM
[ https://issues.apache.org/jira/browse/HDDS-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham reassigned HDDS-2397:
Assignee: Bharat Viswanadham

> Fix calling cleanup for few missing tables in OM
> 
>
> Key: HDDS-2397
> URL: https://issues.apache.org/jira/browse/HDDS-2397
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Bharat Viswanadham
> Assignee: Bharat Viswanadham
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> After the DoubleBuffer flushes, we call cleanup to clean the table caches.
> For a few tables, the cache cleanup is missed:
> # PrefixTable
> # S3SecretTable
> # DelegationTable
[jira] [Work logged] (HDDS-2397) Fix calling cleanup for few missing tables in OM
[ https://issues.apache.org/jira/browse/HDDS-2397?focusedWorklogId=337178=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337178 ] ASF GitHub Bot logged work on HDDS-2397:
Author: ASF GitHub Bot
Created on: 01/Nov/19 04:32
Start Date: 01/Nov/19 04:32
Worklog Time Spent: 10m
Work Description: bharatviswa504 commented on pull request #112: HDDS-2397. Fix calling cleanup for few missing tables in OM. URL: https://github.com/apache/hadoop-ozone/pull/112
## What changes were proposed in this pull request?
Fix the missing cache cleanup for a few tables in OzoneManagerDoubleBuffer's cleanupcache: PrefixTable, S3SecretTable, DelegationTable.
## What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-2397
## How was this patch tested?
Ran a few integration tests.
Issue Time Tracking
---
Worklog Id: (was: 337178) Remaining Estimate: 0h Time Spent: 10m
[jira] [Updated] (HDDS-2397) Fix calling cleanup for few missing tables in OM
[ https://issues.apache.org/jira/browse/HDDS-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2397: - Labels: pull-request-available (was: ) > Fix calling cleanup for few missing tables in OM > > > Key: HDDS-2397 > URL: https://issues.apache.org/jira/browse/HDDS-2397 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > > After DoubleBuffer flushes, we call cleanup cache to cleanup tables cache. > For few tables cleanup of cache is missed: > # PrefixTable > # S3SecretTable > # DelegationTable -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2397) Fix calling cleanup for few missing tables in OM
Bharat Viswanadham created HDDS-2397: Summary: Fix calling cleanup for few missing tables in OM Key: HDDS-2397 URL: https://issues.apache.org/jira/browse/HDDS-2397 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Bharat Viswanadham After the DoubleBuffer flushes, we call cleanup on the table caches. For a few tables, cache cleanup is missed: # PrefixTable # S3SecretTable # DelegationTable
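The missed cleanup described above amounts to evicting flushed entries from every registered table cache, not just a subset. A minimal single-file sketch with hypothetical names (`TABLE_CACHES`, `cleanupCache`) standing in for the real OzoneManagerDoubleBuffer code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the fix: after a double-buffer flush, iterate over *every*
// registered table cache so none (e.g. PrefixTable, S3SecretTable,
// DelegationTable) is skipped. All names here are illustrative stand-ins,
// not the actual Ozone Manager classes.
public class DoubleBufferCleanupSketch {
    // table name -> (transaction index -> cached value)
    static final Map<String, TreeMap<Long, String>> TABLE_CACHES = new HashMap<>();

    static void put(String table, long txIndex, String value) {
        TABLE_CACHES.computeIfAbsent(table, t -> new TreeMap<>()).put(txIndex, value);
    }

    // Evict every cache entry at or below the flushed transaction index,
    // for all tables, instead of a hard-coded list of tables.
    static void cleanupCache(long flushedTxIndex) {
        for (TreeMap<Long, String> cache : TABLE_CACHES.values()) {
            cache.headMap(flushedTxIndex, true).clear();
        }
    }

    public static void main(String[] args) {
        put("prefixTable", 1L, "p1");
        put("s3SecretTable", 1L, "s1");
        put("s3SecretTable", 2L, "s2");
        cleanupCache(1L);
        if (!TABLE_CACHES.get("prefixTable").isEmpty())
            throw new AssertionError("prefixTable cache not cleaned");
        if (TABLE_CACHES.get("s3SecretTable").size() != 1)
            throw new AssertionError("only entries <= flushed index should be evicted");
        System.out.println("cleanup ok");
    }
}
```

Looping over the full cache registry is the design point: a hard-coded table list is exactly what allowed the three tables above to be missed.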
[jira] [Commented] (HDFS-14937) [SBN read] ObserverReadProxyProvider should throw InterruptException
[ https://issues.apache.org/jira/browse/HDFS-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964591#comment-16964591 ] Hadoop QA commented on HDFS-14937: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 51s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 8s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 3s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 2s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 67m 16s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14937 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12984571/HDFS-14937-trunk-002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux ae75a26963b0 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f9b99d2 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28218/testReport/ | | Max. process+thread count | 309 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28218/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > [SBN read]
[jira] [Issue Comment Deleted] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2356: - Comment: was deleted (was: Has the above error caused crash in OM? If so, can you share stack trace?) > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: hs_err_pid9340.log, image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. 
> > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload Failed: volume: > s3c89e813c80ffcea9543004d57b2a1239bucket: > ozone-testkey: 20191012/plc_1570863541668_9278 > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB > .java:1104) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66) > at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source) > at > org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883) > at > org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445) > at > org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76) > at > 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148) > at > org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191) > at > org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200) > at > org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103) > at > org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493) > > The following errors has been resolved in > https://issues.apache.org/jira/browse/HDDS-2322. > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964586#comment-16964586 ] Bharat Viswanadham commented on HDDS-2356: -- Has the above error caused crash in OM? If so, can you share stack trace? > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: hs_err_pid9340.log, image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. 
> > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload Failed: volume: > s3c89e813c80ffcea9543004d57b2a1239bucket: > ozone-testkey: 20191012/plc_1570863541668_9278 > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB > .java:1104) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66) > at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source) > at > org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883) > at > org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445) > at > org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76) > at > 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148) > at > org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191) > at > org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200) > at > org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103) > at > org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493) > > The following errors has been resolved in > https://issues.apache.org/jira/browse/HDDS-2322. > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at >
[jira] [Commented] (HDDS-2395) Handle Ozone S3 completeMPU to match with aws s3 behavior.
[ https://issues.apache.org/jira/browse/HDDS-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964585#comment-16964585 ] Bharat Viswanadham commented on HDDS-2395: -- Hi [~timmylicheng] Exclude List is fixed as part of HDDS-2381. Thanks. > Handle Ozone S3 completeMPU to match with aws s3 behavior. > -- > > Key: HDDS-2395 > URL: https://issues.apache.org/jira/browse/HDDS-2395 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > # When uploaded 2 parts, and when complete upload 1 part no error > # During complete multipart upload name/part number not matching with > uploaded part and part number then InvalidPart error > # When parts are not specified in sorted order InvalidPartOrder > # During complete multipart upload when no uploaded parts, and we specify > some parts then also InvalidPart > # Uploaded parts 1,2,3 and during complete we can do upload 1,3 (No error) > # When part 3 uploaded, complete with part 3 can be done -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
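The six behaviors enumerated above reduce to two ordered checks per requested part. A compact sketch under assumed names (`validateParts` and the string error codes are illustrative, not the actual Ozone S3 gateway code):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hedged sketch of the complete-multipart-upload checks listed above:
// - requested part numbers must be strictly ascending -> else "InvalidPartOrder"
// - each requested part must match an uploaded part number *and* eTag -> else "InvalidPart"
// - completing with an ordered subset of uploaded parts (e.g. 1,3 of 1,2,3) is allowed.
public class CompleteMpuSketch {
    static String validateParts(Map<Integer, String> uploaded,
                                List<Map.Entry<Integer, String>> requested) {
        int prev = 0;
        for (Map.Entry<Integer, String> part : requested) {
            if (part.getKey() <= prev) {
                return "InvalidPartOrder"; // parts not in sorted order
            }
            prev = part.getKey();
            String eTag = uploaded.get(part.getKey());
            if (eTag == null || !eTag.equals(part.getValue())) {
                return "InvalidPart"; // unknown part number or eTag mismatch
            }
        }
        return "OK";
    }

    public static void main(String[] args) {
        Map<Integer, String> uploaded = new HashMap<>();
        uploaded.put(1, "etag-1");
        uploaded.put(2, "etag-2");
        uploaded.put(3, "etag-3");
        // Ordered subset of uploaded parts: allowed (behaviors 1, 5, 6).
        if (!"OK".equals(validateParts(uploaded,
                List.of(Map.entry(1, "etag-1"), Map.entry(3, "etag-3")))))
            throw new AssertionError();
        // Unsorted part numbers: InvalidPartOrder (behavior 3).
        if (!"InvalidPartOrder".equals(validateParts(uploaded,
                List.of(Map.entry(3, "etag-3"), Map.entry(1, "etag-1")))))
            throw new AssertionError();
        // No uploaded parts but some requested: InvalidPart (behaviors 2, 4).
        if (!"InvalidPart".equals(validateParts(new HashMap<>(),
                List.of(Map.entry(1, "etag-1")))))
            throw new AssertionError();
        System.out.println("validation checks pass");
    }
}
```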
[jira] [Updated] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated HDFS-12943: - Release Note: Observer is a new type of a NameNode in addition to Active and Standby Nodes in HA settings. An Observer Node maintains a replica of the namespace same as a Standby Node. It additionally allows execution of clients read requests. To ensure read-after-write consistency within a single client, a state ID is introduced in RPC headers. The Observer responds to the client request only after its own state has caught up with the client’s state ID, which it previously received from the Active NameNode. Clients can explicitly invoke a new client protocol call msync(), which ensures that subsequent reads by this client from an Observer are consistent. A new client-side ObserverReadProxyProvider is introduced to provide automatic switching between Active and Observer NameNodes for submitting respectively write and read requests. was: Observer is a new type of a NameNode in addition to Active and Standby Nodes in HA settings. An Observer Node maintains a replica of the namespace same as a Standby Node. It additionally allows execution of clients read requests. To ensure read-after-write consistency within a single client, a state ID is introduced in RPC headers. The Observer responds to the client request only after its own state has caught up with the client’s state ID, which it previously received from the Active NameNode. Clients can explicitly invoke a new client protocol call msync(), which ensures that subsequent reads by this client from an Observer are consistent. A new client-side ObserverReadProxyProvider is introduced to provide automatic switching between Active and Observer NameNodes for submitting respectively write and read requests. 
> Consistent Reads from Standby Node > -- > > Key: HDFS-12943 > URL: https://issues.apache.org/jira/browse/HDFS-12943 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Fix For: 2.10.0, 3.3.0, 3.1.4, 3.2.2 > > Attachments: ConsistentReadsFromStandbyNode.pdf, > ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, > HDFS-12943-002.patch, HDFS-12943-003.patch, HDFS-12943-004.patch, > TestPlan-ConsistentReadsFromStandbyNode.pdf > > > StandbyNode in HDFS is a replica of the active NameNode. The states of the > NameNodes are coordinated via the journal. It is natural to consider > StandbyNode as a read-only replica. As with any replicated distributed system > the problem of stale reads should be resolved. Our main goal is to provide > reads from standby in a consistent way in order to enable a wide range of > existing applications running on top of HDFS. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
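The release note above mentions the new client-side ObserverReadProxyProvider. As an illustrative (not authoritative) configuration sketch, a client opts into observer reads by selecting that proxy provider for its nameservice; `mycluster` here is a placeholder nameservice name:

```xml
<!-- Client-side hdfs-site.xml; "mycluster" is a placeholder nameservice name -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider</value>
</property>
```

With this provider configured, the client routes reads to an Observer NameNode while writes (and the msync() consistency call) still go to the Active NameNode.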
[jira] [Resolved] (HDDS-2363) Failed to create Ratis container
[ https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen resolved HDDS-2363. -- Resolution: Fixed > Failed to create Ratis container > > > Key: HDDS-2363 > URL: https://issues.apache.org/jira/browse/HDDS-2363 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Blocker > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Error logs: > 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR - > org.rocksdb.RocksDBException Failed init RocksDB, db path : > /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db, > exception > :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db: > does not exist (create_if_missing is false) > CACHED_OPTS is a RocksDB options cache in MetadataStoreBuilder. The cache > keeps the old RocksDB options, which are not refreshed with new option values > on a new call. > The logs below didn't reveal the true cause of the write failure. Will > improve these logs too. > 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: > CreateContainer : Trace ID: : Message: Container creation failed. : Result: > CONTAINER_INTERNAL_ERROR > 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk > : Trace ID: : Message: ContainerID 402 creation failed : Result: > CONTAINER_INTERNAL_ERROR
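The CACHED_OPTS failure mode described above (a builder handing back a stale cached options object, so a later caller's create_if_missing=true never takes effect) can be reproduced in miniature with plain Java. `DbOptions`, `buggyBuild`, and the cache key are hypothetical stand-ins for the RocksDB options cached in MetadataStoreBuilder:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the stale-options-cache bug: the cache is keyed without the
// requested flag, so the first options object created for a profile is
// returned on every later call, even when the caller asked for different
// settings (here, createIfMissing). Illustrative only, not the real code.
public class OptionsCacheSketch {
    static class DbOptions {
        final boolean createIfMissing;
        DbOptions(boolean createIfMissing) { this.createIfMissing = createIfMissing; }
    }

    static final Map<String, DbOptions> CACHED_OPTS = new HashMap<>();

    // Buggy: keyed only by profile name, ignoring the requested flag.
    static DbOptions buggyBuild(String profile, boolean createIfMissing) {
        return CACHED_OPTS.computeIfAbsent(profile, p -> new DbOptions(createIfMissing));
    }

    public static void main(String[] args) {
        DbOptions first = buggyBuild("default", false);
        DbOptions second = buggyBuild("default", true); // stale cached copy comes back
        if (second.createIfMissing)
            throw new AssertionError("expected the stale cached options");
        System.out.println("reproduced stale cache: createIfMissing=" + second.createIfMissing);
    }
}
```

The fix direction implied by the report is to refresh (or key the cache on) the option values at each call rather than reusing whatever was cached first.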
[jira] [Updated] (HDDS-2363) Failed to create Ratis container
[ https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-2363: - Fix Version/s: 0.5.0 > Failed to create Ratis container > > > Key: HDDS-2363 > URL: https://issues.apache.org/jira/browse/HDDS-2363 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Blocker > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Error logs; > 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR - > org.rocksdb.RocksDBException Failed init RocksDB, db path : > /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db, > exception > :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db: > does not exist (create_if_missing is false) > CACHED_OPTS is a RockDB options cache in MetadataStoreBuilder. The cache > keeps the old rocksdb options which is not refreshed with new option values > at new call. > Logs as following didn't reveal the true failure of write failure. Will > improve following logs too. > 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: > CreateContainer : Trace ID: : Message: Container creation failed. : Result: > CONTAINER_INTERNAL_ERROR > 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk > : Trace ID: : Message: ContainerID 402 creation failed : Result: > CONTAINER_INTERNAL_ERROR -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964579#comment-16964579 ] Hudson commented on HDFS-12943: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17592 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17592/]) Add 2.10.0 release notes for HDFS-12943 (jhung: rev ef9d12df24c0db76fd37a95551db7920d27d740c) * (edit) hadoop-common-project/hadoop-common/src/site/markdown/release/2.10.0/RELEASENOTES.2.10.0.md > Consistent Reads from Standby Node > -- > > Key: HDFS-12943 > URL: https://issues.apache.org/jira/browse/HDFS-12943 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Fix For: 2.10.0, 3.3.0, 3.1.4, 3.2.2 > > Attachments: ConsistentReadsFromStandbyNode.pdf, > ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, > HDFS-12943-002.patch, HDFS-12943-003.patch, HDFS-12943-004.patch, > TestPlan-ConsistentReadsFromStandbyNode.pdf > > > StandbyNode in HDFS is a replica of the active NameNode. The states of the > NameNodes are coordinated via the journal. It is natural to consider > StandbyNode as a read-only replica. As with any replicated distributed system > the problem of stale reads should be resolved. Our main goal is to provide > reads from standby in a consistent way in order to enable a wide range of > existing applications running on top of HDFS. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14950) missing libhdfspp libs in dist-package
Yuan Zhou created HDFS-14950: Summary: missing libhdfspp libs in dist-package Key: HDFS-14950 URL: https://issues.apache.org/jira/browse/HDFS-14950 Project: Hadoop HDFS Issue Type: Bug Components: build Reporter: Yuan Zhou Assignee: Yuan Zhou Attachments: fix_libhdfspp_lib.patch A Hadoop build like "mvn package -Pnative" copies the HDFS native libs to target/lib/native. For now it only copies the C client libraries (libhdfs.\{a,so}); the C++-based HDFS client libraries (libhdfspp.\{a,so}) are missing there.
[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964554#comment-16964554 ] Li Cheng edited comment on HDDS-2356 at 11/1/19 3:45 AM: - Also see a core dump in rocksdb during last night's testing. Please check the attachment for the entire log. >From the first glance, it looks like when rocksdb is iterating the write_batch >to insert to the memtable, there happens a stl memory error during memory >movement. It might not be related to ozone, but it would cause rocksdb >failure. Created https://issues.apache.org/jira/browse/HDDS-2396 to track the core dump in OM rocksdb. Below is some part of the stack: C [libc.so.6+0x151d60] __memmove_ssse3_back+0x1ae0 C [librocksdbjni3192271038586903156.so+0x358fec] rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb: :ValueType)+0x51c C [librocksdbjni3192271038586903156.so+0x359d17] rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&)+0x17 C [librocksdbjni3192271038586903156.so+0x3513bc] rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c C [librocksdbjni3192271038586903156.so+0x354df9] rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9 C [librocksdbjni3192271038586903156.so+0x29fd79] rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x24b9 C [librocksdbjni3192271038586903156.so+0x2a0431] rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21 C [librocksdbjni3192271038586903156.so+0x1a064c] Java_org_rocksdb_RocksDB_write0+0xcc J 7899 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x7f58f1872dbe [0x7f58f1872d00+0xbe] J 10093% 
C1 org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions()V (400 bytes) @ 0x7f58f2308b0c [0x7f58f2307a40+0x10cc] j org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer$$Lambda$29.run()V+4 was (Author: timmylicheng): Also see a core dump in rocksdb during last night's testing. Please check the attachment for the entire log. >From the first glance, it looks like when rocksdb is iterating the write_batch >to insert to the memtable, there happens a stl memory error during memory >movement. It might not be related to ozone, but it would cause rocksdb >failure. Below is some part of the stack: C [libc.so.6+0x151d60] __memmove_ssse3_back+0x1ae0 C [librocksdbjni3192271038586903156.so+0x358fec] rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb: :ValueType)+0x51c C [librocksdbjni3192271038586903156.so+0x359d17] rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&)+0x17 C [librocksdbjni3192271038586903156.so+0x3513bc] rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c C [librocksdbjni3192271038586903156.so+0x354df9] rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9 C [librocksdbjni3192271038586903156.so+0x29fd79] rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x24b9 C [librocksdbjni3192271038586903156.so+0x2a0431] rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21 C [librocksdbjni3192271038586903156.so+0x1a064c] Java_org_rocksdb_RocksDB_write0+0xcc J 7899 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x7f58f1872dbe [0x7f58f1872d00+0xbe] J 10093% C1 
org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions()V (400 bytes) @ 0x7f58f2308b0c [0x7f58f2307a40+0x10cc] j org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer$$Lambda$29.run()V+4 > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments:
[jira] [Commented] (HDDS-2396) OM rocksdb core dump during writing
[ https://issues.apache.org/jira/browse/HDDS-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964578#comment-16964578 ] Li Cheng commented on HDDS-2396: Attached the entire log for the core dump. Will try to turn on ulimit and reproduce this, but it happens only occasionally. > OM rocksdb core dump during writing > --- > > Key: HDDS-2396 > URL: https://issues.apache.org/jira/browse/HDDS-2396 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 >Reporter: Li Cheng >Priority: Major > Attachments: hs_err_pid9340.log > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > > A core dump occasionally happens in RocksDB.
> > Stack: [0x7f5891a23000,0x7f5891b24000], sp=0x7f5891b21bb8, free > space=1018k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native > code) > C [libc.so.6+0x151d60] __memmove_ssse3_back+0x1ae0 > C [librocksdbjni3192271038586903156.so+0x358fec] > rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, > rocksdb::Slice const&, rocksdb: > :ValueType)+0x51c > C [librocksdbjni3192271038586903156.so+0x359d17] > rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, > rocksdb::Slice const&)+0x17 > C [librocksdbjni3192271038586903156.so+0x3513bc] > rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c > C [librocksdbjni3192271038586903156.so+0x354df9] > rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, > unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, > bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9 > C [librocksdbjni3192271038586903156.so+0x29fd79] > rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, > rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, > bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x24b9 > C [librocksdbjni3192271038586903156.so+0x2a0431] > rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, > rocksdb::WriteBatch*)+0x21 > C [librocksdbjni3192271038586903156.so+0x1a064c] > Java_org_rocksdb_RocksDB_write0+0xcc > J 7899 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x7f58f1872dbe > [0x7f58f1872d00+0xbe] > J 10093% C1 > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions()V > (400 bytes) @ 0x7f58f2308b0c [0x7f58f2307a40+0x10cc] > j > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer$$Lambda$29.run()V+4 > j java.lang.Thread.run()V+11 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2396) OM rocksdb core dump during writing
[ https://issues.apache.org/jira/browse/HDDS-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-2396: --- Attachment: hs_err_pid9340.log > OM rocksdb core dump during writing > --- > > Key: HDDS-2396 > URL: https://issues.apache.org/jira/browse/HDDS-2396 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 >Reporter: Li Cheng >Priority: Major > Attachments: hs_err_pid9340.log -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2396) OM rocksdb core dump during writing
Li Cheng created HDDS-2396: -- Summary: OM rocksdb core dump during writing Key: HDDS-2396 URL: https://issues.apache.org/jira/browse/HDDS-2396 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Manager Affects Versions: 0.4.1 Reporter: Li Cheng Attachments: hs_err_pid9340.log Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say it's VM0. I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone to a path on VM0, reading data from VM0's local disk and writing to the mount path. The dataset contains ~50,000 files ranging from 0 bytes to GB-scale. RocksDB occasionally core dumps during this workload. Stack: [0x7f5891a23000,0x7f5891b24000], sp=0x7f5891b21bb8, free space=1018k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [libc.so.6+0x151d60] __memmove_ssse3_back+0x1ae0 C [librocksdbjni3192271038586903156.so+0x358fec] rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::ValueType)+0x51c C [librocksdbjni3192271038586903156.so+0x359d17] rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&)+0x17 C [librocksdbjni3192271038586903156.so+0x3513bc] rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c C [librocksdbjni3192271038586903156.so+0x354df9] rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9 C [librocksdbjni3192271038586903156.so+0x29fd79] rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x24b9 C [librocksdbjni3192271038586903156.so+0x2a0431] rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21 C
[librocksdbjni3192271038586903156.so+0x1a064c] Java_org_rocksdb_RocksDB_write0+0xcc J 7899 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x7f58f1872dbe [0x7f58f1872d00+0xbe] J 10093% C1 org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions()V (400 bytes) @ 0x7f58f2308b0c [0x7f58f2307a40+0x10cc] j org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer$$Lambda$29.run()V+4 j java.lang.Thread.run()V+11 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2395) Handle Ozone S3 completeMPU to match with aws s3 behavior.
[ https://issues.apache.org/jira/browse/HDDS-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964573#comment-16964573 ] Li Cheng commented on HDDS-2395: Also please note the exclude-list issue: the same pipeline keeps being added to the ExcludeList. 2019-11-01 11:25:24,047 [qtp1383524016-27648] INFO - Allocating block with ExcludeList \{datanodes = [], containerIds = [], pipelineIds = [PipelineID=20d1830a-a77d-498e-a4a1-ba656ead3d97, PipelineID=20d1830a-a77d-498e-a4a1-ba656ead3d97, ... (the same PipelineID repeated ~80 times in total)]} > Handle Ozone S3 completeMPU to match with aws s3 behavior. > -- > > Key: HDDS-2395 > URL: https://issues.apache.org/jira/browse/HDDS-2395 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > # When 2 parts are uploaded and the upload is completed with only 1 part, there is no error > # During complete multipart upload, when a part name/part number does not match an uploaded part and part number, return an InvalidPart error > # When parts are not specified in sorted order, return InvalidPartOrder > # During complete multipart upload when no
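The behaviors listed in the description can be sketched as a small validation routine. This is a hedged illustration only: the class and method names (`MpuValidator`, `validate`) are invented here and are not Ozone's actual completeMultipartUpload handler; it merely shows the InvalidPart/InvalidPartOrder decision logic that matches AWS S3 semantics.

```java
import java.util.List;
import java.util.Map;

// Sketch of the completeMultipartUpload checks described above.
// Names are illustrative, not Ozone's actual API.
public class MpuValidator {
    // partNumbers/etags: parts named in the complete request, in request order.
    // uploaded: partNumber -> ETag actually stored for this uploadID.
    static String validate(List<Integer> partNumbers, List<String> etags,
                           Map<Integer, String> uploaded) {
        if (partNumbers.isEmpty()) {
            return "InvalidRequest";        // no parts specified
        }
        int prev = 0;
        for (int i = 0; i < partNumbers.size(); i++) {
            int pn = partNumbers.get(i);
            if (pn <= prev) {
                return "InvalidPartOrder";  // parts must be strictly ascending
            }
            prev = pn;
            String stored = uploaded.get(pn);
            if (stored == null || !stored.equals(etags.get(i))) {
                return "InvalidPart";       // unknown part or ETag mismatch
            }
        }
        return "OK";                        // a subset of uploaded parts is fine
    }

    public static void main(String[] args) {
        Map<Integer, String> uploaded = Map.of(1, "e1", 2, "e2");
        // Completing with only part 1 of 2 uploaded parts succeeds, as on AWS S3.
        System.out.println(validate(List.of(1), List.of("e1"), uploaded));          // OK
        System.out.println(validate(List.of(2, 1), List.of("e2", "e1"), uploaded)); // InvalidPartOrder
        System.out.println(validate(List.of(3), List.of("e3"), uploaded));          // InvalidPart
    }
}
```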
[jira] [Updated] (HDFS-14937) [SBN read] ObserverReadProxyProvider should throw InterruptException
[ https://issues.apache.org/jira/browse/HDFS-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuzq updated HDFS-14937: Attachment: HDFS-14937-trunk-002.patch > [SBN read] ObserverReadProxyProvider should throw InterruptException > > > Key: HDFS-14937 > URL: https://issues.apache.org/jira/browse/HDFS-14937 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: xuzq >Assignee: xuzq >Priority: Major > Attachments: HDFS-14937-trunk-001.patch, HDFS-14937-trunk-002.patch > > > ObserverReadProxyProvider should throw InterruptException immediately if one > Observer catch InterruptException in invoking. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14937) [SBN read] ObserverReadProxyProvider should throw InterruptException
[ https://issues.apache.org/jira/browse/HDFS-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964560#comment-16964560 ] xuzq commented on HDFS-14937: - Thanks [~xkrogen] [~vagarychen] for the comment. RetryInvocationHandler uses Thread.currentThread().isInterrupted() to check for interruption, so I kept it. {code:java} final long failoverCount = retryInvocationHandler.getFailoverCount(); try { return invoke(); } catch (Exception e) { if (LOG.isTraceEnabled()) { LOG.trace(toString(), e); } if (Thread.currentThread().isInterrupted()) { // If interrupted, do not retry. throw e; } retryInfo = retryInvocationHandler.handleException( method, callId, retryPolicy, counters, failoverCount, e); return processWaitTimeAndRetryInfo(); } {code} Should we use InterruptedException instead? > [SBN read] ObserverReadProxyProvider should throw InterruptException > > > Key: HDFS-14937 > URL: https://issues.apache.org/jira/browse/HDFS-14937 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: xuzq >Assignee: xuzq >Priority: Major > Attachments: HDFS-14937-trunk-001.patch > > > ObserverReadProxyProvider should throw InterruptException immediately if one > Observer catch InterruptException in invoking. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
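The retry pattern under discussion (stop retrying as soon as the caller thread is interrupted, or an InterruptedException appears anywhere in the cause chain) can be sketched as below. This is a hedged sketch with invented names; it is not the actual RetryInvocationHandler or ObserverReadProxyProvider code.

```java
import java.util.concurrent.Callable;

// Sketch: propagate immediately on interrupt instead of moving to the next retry.
public class InterruptAwareRetry {
    static <T> T invokeWithRetry(Callable<T> call, int maxRetries) throws Exception {
        Exception last = null;
        for (int i = 0; i <= maxRetries; i++) {
            try {
                return call.call();
            } catch (Exception e) {
                // Mirror the check quoted above: an interrupt flag, or an
                // InterruptedException in the cause chain, means the caller
                // gave up, so rethrow rather than retry.
                if (Thread.currentThread().isInterrupted() || hasInterruptedCause(e)) {
                    throw e;
                }
                last = e;
            }
        }
        throw last;
    }

    static boolean hasInterruptedCause(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof InterruptedException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        String v = invokeWithRetry(() -> {
            if (++attempts[0] < 3) throw new RuntimeException("transient");
            return "ok";
        }, 5);
        System.out.println(v + " after " + attempts[0] + " attempts"); // ok after 3 attempts
    }
}
```

The cause-chain walk covers the common case where an InterruptedException surfaces wrapped in an IOException or RuntimeException rather than as the thread's interrupt flag.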
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964554#comment-16964554 ] Li Cheng commented on HDDS-2356: Also see a core dump in rocksdb during last night's testing. Please check the attachment for the entire log. At first glance, it looks like when rocksdb is iterating the write_batch to insert into the memtable, an STL memory error occurs during a memory move. It might not be related to Ozone, but it causes the rocksdb failure. Below is part of the stack: C [libc.so.6+0x151d60] __memmove_ssse3_back+0x1ae0 C [librocksdbjni3192271038586903156.so+0x358fec] rocksdb::MemTableInserter::PutCFImpl(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::ValueType)+0x51c C [librocksdbjni3192271038586903156.so+0x359d17] rocksdb::MemTableInserter::PutCF(unsigned int, rocksdb::Slice const&, rocksdb::Slice const&)+0x17 C [librocksdbjni3192271038586903156.so+0x3513bc] rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const+0x45c C [librocksdbjni3192271038586903156.so+0x354df9] rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteThread::WriteGroup&, unsigned long, rocksdb::ColumnFamilyMemTables*, rocksdb::FlushScheduler*, bool, unsigned long, rocksdb::DB*, bool, bool, bool)+0x1f9 C [librocksdbjni3192271038586903156.so+0x29fd79] rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x24b9 C [librocksdbjni3192271038586903156.so+0x2a0431] rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x21 C [librocksdbjni3192271038586903156.so+0x1a064c] Java_org_rocksdb_RocksDB_write0+0xcc J 7899 org.rocksdb.RocksDB.write0(JJJ)V (0 bytes) @ 0x7f58f1872dbe [0x7f58f1872d00+0xbe] J 10093% C1 org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions()V (400 bytes) @ 0x7f58f2308b0c
[0x7f58f2307a40+0x10cc] j org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer$$Lambda$29.run()V+4 > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: hs_err_pid9340.log, image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. 
> > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload Failed: volume: > s3c89e813c80ffcea9543004d57b2a1239bucket: > ozone-testkey: 20191012/plc_1570863541668_9278 > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB > .java:1104) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66) > at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source) > at > org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883) > at > org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445) > at > org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498) >
[jira] [Updated] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Cheng updated HDDS-2356: --- Attachment: hs_err_pid9340.log > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: hs_err_pid9340.log, image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path > on VM0, while reading data from VM0 local disk and write to mount path. The > dataset has various sizes of files from 0 byte to GB-level and it has a > number of ~50,000 files. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors > related with Multipart upload. This error eventually causes the writing to > terminate and OM to be closed. 
> > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload Failed: volume: > s3c89e813c80ffcea9543004d57b2a1239bucket: > ozone-testkey: 20191012/plc_1570863541668_9278 > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB > .java:1104) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66) > at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source) > at > org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883) > at > org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445) > at > org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76) > at > 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148) > at > org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191) > at > org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200) > at > org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103) > at > org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493) > > The following errors has been resolved in > https://issues.apache.org/jira/browse/HDDS-2322. > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at > org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68) > at >
[jira] [Updated] (HDDS-2363) Failed to create Ratis container
[ https://issues.apache.org/jira/browse/HDDS-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sammi Chen updated HDDS-2363: - Summary: Failed to create Ratis container (was: Fail to create Ratis container) > Failed to create Ratis container > > > Key: HDDS-2363 > URL: https://issues.apache.org/jira/browse/HDDS-2363 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Blocker > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Error logs: > 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR - > org.rocksdb.RocksDBException Failed init RocksDB, db path : > /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db, > exception > :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db: > does not exist (create_if_missing is false) > CACHED_OPTS is a RocksDB options cache in MetadataStoreBuilder. The cache > keeps the old rocksdb options, which are not refreshed with new option values > on subsequent calls. > The following logs didn't reveal the true cause of the write failure; will > improve these logs too. > 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: > CreateContainer : Trace ID: : Message: Container creation failed. : Result: > CONTAINER_INTERNAL_ERROR > 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk > : Trace ID: : Message: ContainerID 402 creation failed : Result: > CONTAINER_INTERNAL_ERROR -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
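The CACHED_OPTS problem described above can be modeled in a few lines: a cache that hands back the first options object it ever built will silently ignore a later caller's create_if_missing=true, so opening a brand-new container DB fails with "does not exist". This is a hedged sketch with invented names (it is not MetadataStoreBuilder's actual code); the fix shown is simply keying the cache by the option values that matter.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of an options cache that ignores later callers' values.
public class OptionsCacheBug {
    static class Opts {
        final boolean createIfMissing;
        Opts(boolean c) { createIfMissing = c; }
    }

    // Buggy: a single cache slot; on a hit, the requested value is ignored.
    private static final Map<String, Opts> CACHED_OPTS = new HashMap<>();

    static Opts buggyGet(boolean createIfMissing) {
        return CACHED_OPTS.computeIfAbsent("opts", k -> new Opts(createIfMissing));
    }

    // Fixed: key the cache by the option values so distinct requests
    // get distinct cached options.
    private static final Map<Boolean, Opts> KEYED_OPTS = new HashMap<>();

    static Opts fixedGet(boolean createIfMissing) {
        return KEYED_OPTS.computeIfAbsent(createIfMissing, Opts::new);
    }

    public static void main(String[] args) {
        buggyGet(false);                       // first caller opens an existing DB
        Opts o = buggyGet(true);               // container creation wants create_if_missing
        System.out.println(o.createIfMissing); // false -> "does not exist" error
        System.out.println(fixedGet(true).createIfMissing); // true
    }
}
```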
[jira] [Work logged] (HDDS-2363) Fail to create Ratis container
[ https://issues.apache.org/jira/browse/HDDS-2363?focusedWorklogId=337145=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337145 ] ASF GitHub Bot logged work on HDDS-2363: Author: ASF GitHub Bot Created on: 01/Nov/19 02:33 Start Date: 01/Nov/19 02:33 Worklog Time Spent: 10m Work Description: ChenSammi commented on pull request #98: HDDS-2363. Fail to create Ratis container. URL: https://github.com/apache/hadoop-ozone/pull/98 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337145) Time Spent: 20m (was: 10m) > Fail to create Ratis container > -- > > Key: HDDS-2363 > URL: https://issues.apache.org/jira/browse/HDDS-2363 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Blocker > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Error logs; > 2019-10-29 10:24:59,553 [pool-7-thread-1] ERROR - > org.rocksdb.RocksDBException Failed init RocksDB, db path : > /data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db, > exception > :/data2/hdds/efe9f8f3-86be-417c-93cd-24bbeceee86f/current/containerDir2/1126/metadata/1126-dn-container.db: > does not exist (create_if_missing is false) > CACHED_OPTS is a RockDB options cache in MetadataStoreBuilder. The cache > keeps the old rocksdb options which is not refreshed with new option values > at new call. > Logs as following didn't reveal the true failure of write failure. Will > improve following logs too. > 2019-10-24 17:43:53,460 [pool-7-thread-1] INFO - Operation: > CreateContainer : Trace ID: : Message: Container creation failed. 
: Result: > CONTAINER_INTERNAL_ERROR > 2019-10-24 17:43:53,478 [pool-7-thread-1] INFO - Operation: WriteChunk > : Trace ID: : Message: ContainerID 402 creation failed : Result: > CONTAINER_INTERNAL_ERROR -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14942) Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex
[ https://issues.apache.org/jira/browse/HDFS-14942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964535#comment-16964535 ] Lisheng Sun commented on HDFS-14942: hi [~weichiu] [~elgoiri] [~ayushtkn] Would you mind reviewing this patch? Thank you. > Change Log Level to warn in JournalNodeSyncer#syncWithJournalAtIndex > > > Key: HDFS-14942 > URL: https://issues.apache.org/jira/browse/HDFS-14942 > Project: Hadoop HDFS > Issue Type: Improvement > Environment: When hadoop 2.x is upgraded to hadoop 3.x, > InterQJournalProtocol is newly added, so the old JournalNode throws "Unknown protocol". > The new InterQJournalProtocol is used to synchronize past log segments to > JNs that missed them. The error does not affect normal > service, so it should not be an ERROR log; a WARN log is more > reasonable. > {code:java} > private void syncWithJournalAtIndex(int index) { > ... > GetEditLogManifestResponseProto editLogManifest; > try { > editLogManifest = jnProxy.getEditLogManifestFromJournal(jid, > nameServiceId, 0, false); > } catch (IOException e) { > LOG.error("Could not sync with Journal at " + > otherJNProxies.get(journalNodeIndexForSync), e); > return; > } > {code} > {code:java} > 2019-10-30,15:11:17,388 ERROR > org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not sync with > Journal at mos1-hadoop-prc-ct17.ksru/10.85.3.59:111002019-10-30,15:11:17,388 > ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer: Could not > sync with Journal at > mos1-hadoop-prc-ct17.ksru/10.85.3.59:11100org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): > Unknown protocol: > org.apache.hadoop.hdfs.qjournal.protocol.InterQJournalProtocol at > org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1565) at > org.apache.hadoop.ipc.Client.call(Client.java:1511) at > org.apache.hadoop.ipc.Client.call(Client.java:1421) at >
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy16.getEditLogManifestFromJournal(Unknown Source) at > org.apache.hadoop.hdfs.qjournal.protocolPB.InterQJournalProtocolTranslatorPB.getEditLogManifestFromJournal(InterQJournalProtocolTranslatorPB.java:75) > at > org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncWithJournalAtIndex(JournalNodeSyncer.java:250) > at > org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.syncJournals(JournalNodeSyncer.java:226) > at > org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer.lambda$startSyncJournalsDaemon$0(JournalNodeSyncer.java:186) > at java.lang.Thread.run(Thread.java:748) > {code} >Reporter: Lisheng Sun >Priority: Minor > Attachments: HDFS-14942.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14938) Add check if excludedNodes contain scope in DFSNetworkTopology#chooseRandomWithStorageType()
[ https://issues.apache.org/jira/browse/HDFS-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964530#comment-16964530 ] Lisheng Sun commented on HDFS-14938: Thanks [~elgoiri] for your comments. I added a javadoc and a unit test for this patch and uploaded the v004 patch. Could you find time to review it? Thank you. > Add check if excludedNodes contain scope in > DFSNetworkTopology#chooseRandomWithStorageType() > - > > Key: HDFS-14938 > URL: https://issues.apache.org/jira/browse/HDFS-14938 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14938.001.patch, HDFS-14938.002.patch, > HDFS-14938.003.patch, HDFS-14938.004.patch > > > Add check if excludedNodes contain scope in > DFSNetworkTopology#chooseRandomWithStorageType(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
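The idea behind the check can be illustrated as: before picking a random node under a scope, count how many excluded nodes actually fall inside that scope, and bail out early if the exclusions cover everything under it. The sketch below uses plain strings for network locations and invented names; it is not the actual DFSNetworkTopology code.

```java
import java.util.List;

// Sketch of a scope-vs-excludedNodes check (illustrative names only).
public class ScopeExcludeCheck {
    // A network location like "/rack1/node3" is inside scope "/rack1".
    static boolean inScope(String nodeLocation, String scope) {
        return nodeLocation.equals(scope) || nodeLocation.startsWith(scope + "/");
    }

    static long excludedWithinScope(List<String> excluded, String scope) {
        return excluded.stream().filter(loc -> inScope(loc, scope)).count();
    }

    // If every node under the scope is excluded, random selection can never
    // succeed, so callers should skip the random loop entirely.
    static boolean canChoose(int nodesUnderScope, List<String> excluded, String scope) {
        return nodesUnderScope - excludedWithinScope(excluded, scope) > 0;
    }

    public static void main(String[] args) {
        List<String> excluded = List.of("/rack1/node1", "/rack2/node1");
        System.out.println(excludedWithinScope(excluded, "/rack1")); // 1
        System.out.println(canChoose(1, excluded, "/rack1"));        // false
        System.out.println(canChoose(2, excluded, "/rack1"));        // true
    }
}
```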
[jira] [Updated] (HDFS-14938) Add check if excludedNodes contain scope in DFSNetworkTopology#chooseRandomWithStorageType()
[ https://issues.apache.org/jira/browse/HDFS-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Sun updated HDFS-14938: --- Attachment: HDFS-14938.004.patch > Add check if excludedNodes contain scope in > DFSNetworkTopology#chooseRandomWithStorageType() > - > > Key: HDFS-14938 > URL: https://issues.apache.org/jira/browse/HDFS-14938 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14938.001.patch, HDFS-14938.002.patch, > HDFS-14938.003.patch, HDFS-14938.004.patch > > > Add check if excludedNodes contain scope in > DFSNetworkTopology#chooseRandomWithStorageType(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool
[ https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964502#comment-16964502 ] Íñigo Goiri commented on HDFS-14927: +1 on [^HDFS-14927.009.patch]. > RBF: Add metrics for async callers thread pool > -- > > Key: HDFS-14927 > URL: https://issues.apache.org/jira/browse/HDFS-14927 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > Attachments: HDFS-14927.001.patch, HDFS-14927.002.patch, > HDFS-14927.003.patch, HDFS-14927.004.patch, HDFS-14927.005.patch, > HDFS-14927.006.patch, HDFS-14927.007.patch, HDFS-14927.008.patch, > HDFS-14927.009.patch > > > It is good to add some monitoring on the async caller thread pool to handle > fan-out RPC client requests, so we know the utilization and when to bump up > dfs.federation.router.client.thread-size -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
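The utilization numbers HDFS-14927 wants to surface can be read straight off a JDK `ThreadPoolExecutor`, which is roughly what a router-side gauge would wrap. The sketch below is illustrative only — the metric layout and class name are made up, not the actual RBF metric keys.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: the numbers a pool-utilization gauge would expose.
// ThreadPoolExecutor already tracks active threads, pool size, queue depth,
// and completed task counts; a metrics system only has to sample them.
public class PoolMetricsSketch {
    static String snapshot(ThreadPoolExecutor pool) {
        return String.format("active=%d poolSize=%d queued=%d completed=%d",
                pool.getActiveCount(),        // threads currently running tasks
                pool.getPoolSize(),           // threads currently in the pool
                pool.getQueue().size(),       // tasks waiting for a free thread
                pool.getCompletedTaskCount());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> { });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(snapshot(pool));
    }
}
```

A sustained non-zero `queued` value is the signal the comment describes for bumping up dfs.federation.router.client.thread-size.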
[jira] [Updated] (HDDS-2395) Handle Ozone S3 completeMPU to match with aws s3 behavior.
[ https://issues.apache.org/jira/browse/HDDS-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2395: - Issue Type: Bug (was: Task) > Handle Ozone S3 completeMPU to match with aws s3 behavior. > -- > > Key: HDDS-2395 > URL: https://issues.apache.org/jira/browse/HDDS-2395 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > # When 2 parts are uploaded and the complete request lists only 1 part, no error is returned > # During complete multipart upload, if a listed part's name/part number does not match an > uploaded part and part number, return an InvalidPart error > # When parts are not specified in sorted order, return InvalidPartOrder > # During complete multipart upload, when no parts were uploaded but the request specifies > some parts, also return InvalidPart > # With parts 1,2,3 uploaded, completing with parts 1,3 succeeds (no error) > # When only part 3 was uploaded, completing with part 3 succeeds -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
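The rules listed in HDDS-2395 can be condensed into a small validator. This is an illustrative sketch only, not the Ozone Manager code: the class and method names are made up, and the error names mirror the AWS S3 error codes the issue wants to match. Each part in the complete request must match an uploaded part's number and ETag, part numbers must be strictly ascending, and uploaded parts that are not listed are simply dropped.

```java
import java.util.List;
import java.util.Map;

// Illustrative validator for the complete-MPU behavior listed above.
// (Error strings mirror AWS S3 error codes; this is not the Ozone OM code.)
public class CompleteMpuSketch {
    /** A (partNumber, eTag) pair as sent in the complete request. */
    static final class Part {
        final int number;
        final String eTag;
        Part(int number, String eTag) { this.number = number; this.eTag = eTag; }
    }

    /**
     * Returns "OK", "InvalidPart", or "InvalidPartOrder".
     * `uploaded` maps partNumber -> eTag for parts the server has seen.
     * Uploaded parts not listed in the request are silently dropped (rule 5).
     */
    static String validate(Map<Integer, String> uploaded, List<Part> requested) {
        int prev = 0;
        for (Part p : requested) {
            if (p.number <= prev) {
                return "InvalidPartOrder";   // rule 3: strictly ascending part numbers
            }
            prev = p.number;
            String eTag = uploaded.get(p.number);
            if (eTag == null || !eTag.equals(p.eTag)) {
                return "InvalidPart";        // rules 2 and 4: unknown or mismatched part
            }
        }
        return "OK";
    }
}
```

For example, with parts {1, 2, 3} uploaded, completing with [1, 3] yields "OK" (rules 1 and 5), while [2, 1] yields "InvalidPartOrder".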
[jira] [Commented] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964482#comment-16964482 ] Bharat Viswanadham commented on HDDS-2356: -- Opened HDDS-2359 to handle CompleteMPU error cases. > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 > Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM > on a separate VM >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: image-2019-10-31-18-56-56-177.png > > > Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say > it's VM0. > I use goofys as a FUSE client and enable the Ozone S3 gateway to mount Ozone to a path > on VM0, reading data from VM0's local disk and writing to the mount path. The > dataset has files of various sizes from 0 bytes to GB-level, about > 50,000 files in total. > The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I > look at the hadoop-root-om-VM_50_210_centos.out log, I see the OM throwing errors > related to multipart upload. This error eventually causes the writing to > terminate and the OM to be closed. 
> > 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete > Multipart Upload Request for bucket: ozone-test, key: > 20191012/plc_1570863541668_927 > 8 > MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException: > Complete Multipart Upload Failed: volume: > s3c89e813c80ffcea9543004d57b2a1239bucket: > ozone-testkey: 20191012/plc_1570863541668_9278 > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732) > at > org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB > .java:1104) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66) > at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source) > at > org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883) > at > org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445) > at > org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76) > at > 
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148) > at > org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191) > at > org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200) > at > org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103) > at > org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493) > > The following errors have been resolved in > https://issues.apache.org/jira/browse/HDDS-2322. > 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with > exit status 2: OMDoubleBuffer flush > threadOMDoubleBufferFlushThreadencountered Throwable error > java.util.ConcurrentModificationException > at java.util.TreeMap.forEach(TreeMap.java:1004) > at > org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38) > at > org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31) > at >
[jira] [Comment Edited] (HDDS-2356) Multipart upload report errors while writing to ozone Ratis pipeline
[ https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964482#comment-16964482 ] Bharat Viswanadham edited comment on HDDS-2356 at 11/1/19 12:41 AM: Opened HDDS-2395 to handle CompleteMPU error cases. was (Author: bharatviswa): Opened HDDS-2359 to handle CompleteMPU error cases. > Multipart upload report errors while writing to ozone Ratis pipeline > > > Key: HDDS-2356 > URL: https://issues.apache.org/jira/browse/HDDS-2356 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.1 >Reporter: Li Cheng >Assignee: Bharat Viswanadham >Priority: Blocker > Fix For: 0.5.0 > > Attachments: image-2019-10-31-18-56-56-177.png >
[jira] [Commented] (HDFS-14720) DataNode shouldn't report block as bad block if the block length is Long.MAX_VALUE.
[ https://issues.apache.org/jira/browse/HDFS-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964476#comment-16964476 ] Konstantin Shvachko commented on HDFS-14720: Does it need a unit test? > DataNode shouldn't report block as bad block if the block length is > Long.MAX_VALUE. > --- > > Key: HDFS-14720 > URL: https://issues.apache.org/jira/browse/HDFS-14720 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.1.1 >Reporter: Surendra Singh Lilhore >Assignee: hemanthboyina >Priority: Major > Attachments: HDFS-14720.001.patch > > > {noformat} > 2019-08-11 09:15:58,092 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > Can't replicate block > BP-725378529-10.0.0.8-1410027444173:blk_13276745777_1112363330268 because > on-disk length 175085 is shorter than NameNode recorded length > 9223372036854775807.{noformat} > If the block length is Long.MAX_VALUE, it means the file this block belongs to was > deleted from the NameNode and the DN got the command after the file was deleted. In > this case the command should be ignored. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
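The guard HDFS-14720 proposes boils down to one check before reporting the replica as bad. The sketch below uses made-up names (it is not the actual DataNode transfer-path code) and the length 175085 from the log above as sample input.

```java
// Sketch of the guard proposed in HDFS-14720 (names are illustrative).
// A NameNode-recorded length of Long.MAX_VALUE marks a block whose file was
// already deleted, so the replication command raced the deletion and should
// be dropped instead of the replica being reported as a bad block.
public class StaleReplicationGuard {
    /** True when the command refers to an already-deleted file. */
    static boolean shouldIgnoreCommand(long nnRecordedLength) {
        return nnRecordedLength == Long.MAX_VALUE;
    }

    /** Only a genuine length mismatch counts as a bad replica. */
    static boolean isBadReplica(long onDiskLength, long nnRecordedLength) {
        return !shouldIgnoreCommand(nnRecordedLength)
                && onDiskLength < nnRecordedLength;
    }
}
```

With this guard, the log line above (on-disk 175085 vs. recorded 9223372036854775807, i.e. Long.MAX_VALUE) no longer produces a bad-block report.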
[jira] [Updated] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-12943: --- Release Note: Observer is a new type of a NameNode in addition to Active and Standby Nodes in HA settings. An Observer Node maintains a replica of the namespace same as a Standby Node. It additionally allows execution of clients read requests. To ensure read-after-write consistency within a single client, a state ID is introduced in RPC headers. The Observer responds to the client request only after its own state has caught up with the client’s state ID, which it previously received from the Active NameNode. Clients can explicitly invoke a new client protocol call msync(), which ensures that subsequent reads by this client from an Observer are consistent. A new client-side ObserverReadProxyProvider is introduced to provide automatic switching between Active and Observer NameNodes for submitting respectively write and read requests. was: Observer is a new type of a NameNode in addition to Active and Standby in HA settings. Observer Node maintains a replica of the namespace same as a Standby Node. It additionally allows execution of clients read requests. To ensure read-after-write consistency within a single client, a state ID is introduced in RPC headers. The Observer responds to the client request only after its own state has caught up with the client’s state ID, which it previously received from the Active NameNode. Clients can explicitly invoke a new client protocol call msync(), which ensures that subsequent reads by this client from an Observer are consistent. A new client-side ObserverReadProxyProvider is introduced to provide automatic switching between Active and Observer NameNodes for submitting respectively write and read requests. 
> Consistent Reads from Standby Node > -- > > Key: HDFS-12943 > URL: https://issues.apache.org/jira/browse/HDFS-12943 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Fix For: 2.10.0, 3.3.0, 3.1.4, 3.2.2 > > Attachments: ConsistentReadsFromStandbyNode.pdf, > ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, > HDFS-12943-002.patch, HDFS-12943-003.patch, HDFS-12943-004.patch, > TestPlan-ConsistentReadsFromStandbyNode.pdf > > > StandbyNode in HDFS is a replica of the active NameNode. The states of the > NameNodes are coordinated via the journal. It is natural to consider > StandbyNode as a read-only replica. As with any replicated distributed system > the problem of stale reads should be resolved. Our main goal is to provide > reads from standby in a consistent way in order to enable a wide range of > existing applications running on top of HDFS. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-12943: --- Release Note: Observer is a new type of a NameNode in addition to Active and Standby in HA settings. Observer Node maintains a replica of the namespace same as a Standby Node. It additionally allows execution of clients read requests. To ensure read-after-write consistency within a single client, a state ID is introduced in RPC headers. The Observer responds to the client request only after its own state has caught up with the client’s state ID, which it previously received from the Active NameNode. Clients can explicitly invoke a new client protocol call msync(), which ensures that subsequent reads by this client from an Observer are consistent. A new client-side ObserverReadProxyProvider is introduced to provide automatic switching between Active and Observer NameNodes for submitting respectively write and read requests. was: Observer is a new type of NameNodes in addition to Active and Standby in HA settings. Observer Node maintains a replica of the namespace same as a Standby Node. It additionally allows execution of clients read requests. To ensure read-after-write consistency within a single client, a state ID is introduced in RPC headers. The Observer responds to the client request only after its own state has caught up with the client’s state ID, which it previously received from the Active NameNode. Clients can explicitly invoke a new client protocol call msync(), which ensures that subsequent reads by this client from an Observer are consistent. A new client-side ObserverReadProxyProvider is introduced to provide automatic switching between Active and Observer NameNodes for submitting respectively write and read requests. 
> Consistent Reads from Standby Node > -- > > Key: HDFS-12943 > URL: https://issues.apache.org/jira/browse/HDFS-12943 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Fix For: 2.10.0, 3.3.0, 3.1.4, 3.2.2 > > Attachments: ConsistentReadsFromStandbyNode.pdf, > ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, > HDFS-12943-002.patch, HDFS-12943-003.patch, HDFS-12943-004.patch, > TestPlan-ConsistentReadsFromStandbyNode.pdf > > > StandbyNode in HDFS is a replica of the active NameNode. The states of the > NameNodes are coordinated via the journal. It is natural to consider > StandbyNode as a read-only replica. As with any replicated distributed system > the problem of stale reads should be resolved. Our main goal is to provide > reads from standby in a consistent way in order to enable a wide range of > existing applications running on top of HDFS. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-12943. Fix Version/s: 2.10.0 3.2.2 3.1.4 3.3.0 Hadoop Flags: Reviewed Release Note: Observer is a new type of NameNodes in addition to Active and Standby in HA settings. Observer Node maintains a replica of the namespace same as a Standby Node. It additionally allows execution of clients read requests. To ensure read-after-write consistency within a single client, a state ID is introduced in RPC headers. The Observer responds to the client request only after its own state has caught up with the client’s state ID, which it previously received from the Active NameNode. Clients can explicitly invoke a new client protocol call msync(), which ensures that subsequent reads by this client from an Observer are consistent. A new client-side ObserverReadProxyProvider is introduced to provide automatic switching between Active and Observer NameNodes for submitting respectively write and read requests. Resolution: Fixed Closing this as Fixed. The feature has been tested, back-ported down to 2.10 and released. Few remaining subtasks are being addressed as usual issues. Added release notes. Please review if I missed anything. _Thank you everybody for contributing to this effort._ > Consistent Reads from Standby Node > -- > > Key: HDFS-12943 > URL: https://issues.apache.org/jira/browse/HDFS-12943 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.0 > > Attachments: ConsistentReadsFromStandbyNode.pdf, > ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, > HDFS-12943-002.patch, HDFS-12943-003.patch, HDFS-12943-004.patch, > TestPlan-ConsistentReadsFromStandbyNode.pdf > > > StandbyNode in HDFS is a replica of the active NameNode. The states of the > NameNodes are coordinated via the journal. 
It is natural to consider > StandbyNode as a read-only replica. As with any replicated distributed system > the problem of stale reads should be resolved. Our main goal is to provide > reads from standby in a consistent way in order to enable a wide range of > existing applications running on top of HDFS. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
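The read-after-write mechanism in the HDFS-12943 release note — the Observer holds a read until its applied state catches up with the state ID the client carries in the RPC header — can be sketched as a simple transaction-ID comparison. The names below are invented for illustration; the real implementation lives in the alignment-context and ObserverReadProxyProvider classes.

```java
// Illustrative simulation of the state-ID handshake described in the
// HDFS-12943 release note (class and method names are made up).
public class ObserverReadSketch {
    /** Last transaction ID this observer has applied from the journal. */
    private long lastAppliedTxId;

    public ObserverReadSketch(long lastAppliedTxId) {
        this.lastAppliedTxId = lastAppliedTxId;
    }

    /** Simulate tailing more edits from the shared journal. */
    public void applyEdits(long upToTxId) {
        lastAppliedTxId = Math.max(lastAppliedTxId, upToTxId);
    }

    /**
     * A read carrying clientStateId (the last txId the client observed from
     * the Active NameNode) may be served only once the observer has caught
     * up; otherwise the RPC is held, guaranteeing read-after-write.
     */
    public boolean canServeRead(long clientStateId) {
        return lastAppliedTxId >= clientStateId;
    }
}
```

msync() fits this picture as a call that refreshes the client's state ID from the Active NameNode, so subsequent observer reads are forced to wait for at least that transaction.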
[jira] [Resolved] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guojh resolved HDFS-14768. -- Resolution: Not A Problem > EC : Busy DN replica should be consider in live replica check. > -- > > Key: HDFS-14768 > URL: https://issues.apache.org/jira/browse/HDFS-14768 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding, hdfs, namenode >Affects Versions: 3.0.2 >Reporter: guojh >Assignee: guojh >Priority: Major > Labels: patch > Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, > HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch, > HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch, > HDFS-14768.006.patch, HDFS-14768.007.patch, HDFS-14768.008.patch, > HDFS-14768.009.patch, HDFS-14768.010.patch, HDFS-14768.jpg, > guojh_UT_after_deomission.txt, guojh_UT_before_deomission.txt, > zhaoyiming_UT_after_deomission.txt, zhaoyiming_UT_beofre_deomission.txt > > > Policy is RS-6-3-1024K, version is hadoop 3.0.2; > Suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission > indices [3,4] and increase the index 6 datanode's > pendingReplicationWithoutTargets so that it is larger than > replicationStreamsHardLimit (we set 14). Then, after the method > chooseSourceDatanodes of BlockManager, the liveBlockIndices is > [0,1,2,3,4,5,7,8] and the block counters are Live: 7, Decommission: 2. > In the method scheduleReconstruction of BlockManager, additionalReplRequired > is 9 - 7 = 2. After the NameNode chooses two target DataNodes, it assigns an > erasure coding task to the target DataNodes. > When the DataNode receives the task, it builds targetIndices from liveBlockIndices > and the target length. The code is below. 
> {code:java} > // code placeholder > targetIndices = new short[targets.length]; > private void initTargetIndices() { > BitSet bitset = reconstructor.getLiveBitSet(); > int m = 0; > hasValidTargets = false; > for (int i = 0; i < dataBlkNum + parityBlkNum; i++) { > if (!bitset.get(i)) { > if (reconstructor.getBlockLen(i) > 0) { > if (m < targets.length) { > targetIndices[m++] = (short)i; > hasValidTargets = true; > } > } > } > } > } > {code} > targetIndices[0]=6, and targetIndices[1] is always 0 (its initial value). > The StripedReader always creates readers from the first 6 block indices, i.e. > [0,1,2,3,4,5]. > Using indices [0,1,2,3,4,5] to reconstruct target indices [6,0] triggers the ISA-L > bug: block index 6's data is corrupted (all data is zero). > I wrote a unit test that reproduces this reliably. > {code:java} > // code placeholder > private int replicationStreamsHardLimit = > DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT; > numDNs = dataBlocks + parityBlocks + 10; > @Test(timeout = 24) > public void testFileDecommission() throws Exception { > LOG.info("Starting test testFileDecommission"); > final Path ecFile = new Path(ecDir, "testFileDecommission"); > int writeBytes = cellSize * dataBlocks; > writeStripedFile(dfs, ecFile, writeBytes); > Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks()); > FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes); > final INodeFile fileNode = cluster.getNamesystem().getFSDirectory() > .getINode4Write(ecFile.toString()).asFile(); > LocatedBlocks locatedBlocks = > StripedFileTestUtil.getLocatedBlocks(ecFile, dfs); > LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0) > .get(0); > DatanodeInfo[] dnLocs = lb.getLocations(); > LocatedStripedBlock lastBlock = > (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock(); > DatanodeInfo[] storageInfos = lastBlock.getLocations(); > // > DatanodeDescriptor datanodeDescriptor = > cluster.getNameNode().getNamesystem() > > 
.getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid()); > BlockInfo firstBlock = fileNode.getBlocks()[0]; > DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock); > // the first heartbeat will consume 3 replica tasks > for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) { > BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor, new > Block(i), > new DatanodeStorageInfo[]{dStorageInfos[0]}); > } > assertEquals(dataBlocks + parityBlocks, dnLocs.length); > int[] decommNodeIndex = {3, 4}; > final List decommisionNodes = new ArrayList(); > // add the node which will be decommissioning > decommisionNodes.add(dnLocs[decommNodeIndex[0]]); > decommisionNodes.add(dnLocs[decommNodeIndex[1]]); > decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED); > assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes()); >
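The failure mode described in HDFS-14768 can be reproduced with a few lines that mimic the initTargetIndices() loop quoted in the issue. This is a standalone simulation, not the DataNode code: with live indices [0,1,2,3,4,5,7,8], only index 6 is missing, yet the NameNode asked for two targets, so the second slot of targetIndices keeps its default value 0 — which is a live block index.

```java
import java.util.BitSet;

// Standalone simulation of the initTargetIndices() loop from HDFS-14768,
// showing how a targets array longer than the set of missing block indices
// leaves trailing entries at their default value 0.
public class TargetIndicesBugDemo {
    static short[] buildTargetIndices(BitSet liveBitSet, int totalBlks, int targetCount) {
        short[] targetIndices = new short[targetCount];
        int m = 0;
        for (int i = 0; i < totalBlks; i++) {
            if (!liveBitSet.get(i) && m < targetIndices.length) {
                targetIndices[m++] = (short) i;
            }
        }
        // Bug illustrated: if m < targetCount, slots [m..targetCount) stay 0,
        // and index 0 is a *live* block, so reconstruction targets it wrongly.
        return targetIndices;
    }

    public static void main(String[] args) {
        BitSet live = new BitSet(9);
        for (int idx : new int[] {0, 1, 2, 3, 4, 5, 7, 8}) {
            live.set(idx);
        }
        short[] t = buildTargetIndices(live, 9, 2); // NameNode asked for 2 targets
        System.out.println(t[0] + "," + t[1]);      // prints 6,0 — a bogus second target
    }
}
```

The simulation makes the mismatch concrete: the busy-but-live replica at index 6 inflated the target count, while only one index is actually reconstructable.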
[jira] [Reopened] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guojh reopened HDFS-14768: -- > EC : Busy DN replica should be consider in live replica check. > -- > > Key: HDFS-14768 > URL: https://issues.apache.org/jira/browse/HDFS-14768 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, erasure-coding, hdfs, namenode >Affects Versions: 3.0.2 >Reporter: guojh >Assignee: guojh >Priority: Major > Labels: patch >
[jira] [Resolved] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guojh resolved HDFS-14768. -- Resolution: Abandoned
[jira] [Updated] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.
[ https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guojh updated HDFS-14768: - Status: Open (was: Patch Available)
[jira] [Commented] (HDDS-2321) Ozone Block Token verify should not apply to all datanode cmd
[ https://issues.apache.org/jira/browse/HDDS-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964461#comment-16964461 ] Xiaoyu Yao commented on HDDS-2321:
--
{quote}Since SCM has the root cert, it might be interesting if it sends a token over; that way these commands are also verified. In the long run, or even the short run, these SCM commands to DNs will go away.
{quote}
Good point. We will use follow-up JIRAs to add SCM and DN tokens for other command types. This one focuses on the OM block token check improvement but allows future extension for SCM/DN tokens.
> Ozone Block Token verify should not apply to all datanode cmd
> -
>
> Key: HDDS-2321
> URL: https://issues.apache.org/jira/browse/HDDS-2321
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Affects Versions: 0.4.1
> Reporter: Nilotpal Nandi
> Assignee: Xiaoyu Yao
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The DN container protocol has commands sent from SCM or other DNs, which do not bear an OM block token the way OM client requests do. We should restrict the OM block token check to only those requests issued from OM clients.
--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2321) Ozone Block Token verify should not apply to all datanode cmd
[ https://issues.apache.org/jira/browse/HDDS-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDDS-2321: - Status: Patch Available (was: Open)
[jira] [Work logged] (HDDS-2321) Ozone Block Token verify should not apply to all datanode cmd
[ https://issues.apache.org/jira/browse/HDDS-2321?focusedWorklogId=337100=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337100 ] ASF GitHub Bot logged work on HDDS-2321:
Author: ASF GitHub Bot
Created on: 31/Oct/19 23:47
Start Date: 31/Oct/19 23:47
Worklog Time Spent: 10m
Work Description: xiaoyuyao commented on pull request #110: HDDS-2321. Ozone Block Token verify should not apply to all datanode …
URL: https://github.com/apache/hadoop-ozone/pull/110

## What changes were proposed in this pull request?
* Change the TokenVerifier interface to check the command type and the block id.
* Token verification, based on the token encoded in the command, is done inside HddsDispatcher.
* Remove the Grpc Client/Server CredentialInterceptor, as it cannot fit into Ratis commands.
* Added more unit test coverage on the TokenVerifier.

## What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-2321

## How was this patch tested?
Added unit test testBlockTokenVerifier().
Updated unit test in TestSecureContainerServer.java.
Ozone secure smoke test.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking
---
Worklog Id: (was: 337100)
Remaining Estimate: 0h
Time Spent: 10m
[jira] [Updated] (HDDS-2321) Ozone Block Token verify should not apply to all datanode cmd
[ https://issues.apache.org/jira/browse/HDDS-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2321: - Labels: pull-request-available (was: )
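The fix discussed in HDDS-2321 amounts to gating token verification on command type. A minimal sketch of that idea, with all names ours (hypothetical stand-ins, not the real `ContainerProtos` types or the actual TokenVerifier interface): client-path commands require an OM block token, while SCM/DN-issued commands pass through.

```java
import java.util.EnumSet;
import java.util.Set;

// Hedged sketch of command-type gating for block-token checks. The enum and
// the membership of CLIENT_CMDS are illustrative assumptions, not the real
// HDDS command taxonomy.
public class BlockTokenGateSketch {
    enum CmdType {
        PUT_BLOCK, GET_BLOCK, READ_CHUNK, WRITE_CHUNK,
        CLOSE_CONTAINER, DELETE_CONTAINER
    }

    // Only commands that originate from OM clients carry an OM block token.
    static final Set<CmdType> CLIENT_CMDS = EnumSet.of(
        CmdType.PUT_BLOCK, CmdType.GET_BLOCK,
        CmdType.READ_CHUNK, CmdType.WRITE_CHUNK);

    static boolean requiresBlockToken(CmdType cmd) {
        return CLIENT_CMDS.contains(cmd);
    }

    public static void main(String[] args) {
        System.out.println(requiresBlockToken(CmdType.WRITE_CHUNK));     // true
        System.out.println(requiresBlockToken(CmdType.CLOSE_CONTAINER)); // false
    }
}
```

A dispatcher-side verifier would consult such a predicate before demanding a token, which also leaves room for the follow-up SCM/DN tokens mentioned in the comment.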
[jira] [Work logged] (HDDS-1847) Datanode Kerberos principal and keytab config key looks inconsistent
[ https://issues.apache.org/jira/browse/HDDS-1847?focusedWorklogId=337099=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337099 ] ASF GitHub Bot logged work on HDDS-1847:
Author: ASF GitHub Bot
Created on: 31/Oct/19 23:45
Start Date: 31/Oct/19 23:45
Worklog Time Spent: 10m
Work Description: xiaoyuyao commented on issue #1678: HDDS-1847: Datanode Kerberos principal and keytab config key looks inconsistent
URL: https://github.com/apache/hadoop/pull/1678#issuecomment-548612304
The order of initialization in StorageContainerManagerHttpServer causes an NPE after this change, which failed the secure acceptance tests. We can use HDDS-2393 to track the fix.
Issue Time Tracking
---
Worklog Id: (was: 337099)
Time Spent: 1h 20m (was: 1h 10m)
> Datanode Kerberos principal and keytab config key looks inconsistent
>
> Key: HDDS-1847
> URL: https://issues.apache.org/jira/browse/HDDS-1847
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Affects Versions: 0.5.0
> Reporter: Eric Yang
> Assignee: Chris Teoh
> Priority: Major
> Labels: newbie, pull-request-available
> Fix For: 0.5.0
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Ozone Kerberos configuration can be very confusing:
> || config name || Description ||
> | hdds.scm.kerberos.principal | SCM service principal |
> | hdds.scm.kerberos.keytab.file | SCM service keytab file |
> | ozone.om.kerberos.principal | Ozone Manager service principal |
> | ozone.om.kerberos.keytab.file | Ozone Manager keytab file |
> | hdds.scm.http.kerberos.principal | SCM service spnego principal |
> | hdds.scm.http.kerberos.keytab.file | SCM service spnego keytab file |
> | ozone.om.http.kerberos.principal | Ozone Manager spnego principal |
> | ozone.om.http.kerberos.keytab.file | Ozone Manager spnego keytab file |
> | hdds.datanode.http.kerberos.keytab | Datanode spnego keytab file |
> | hdds.datanode.http.kerberos.principal | Datanode spnego principal |
> | dfs.datanode.kerberos.principal | Datanode service principal |
> | dfs.datanode.keytab.file | Datanode service keytab file |
> The prefixes are very different for each of the datanode configurations. It would be nice to have some consistency for the datanode.
[jira] [Commented] (HDFS-14794) [SBN read] reportBadBlock is rejected by Observer.
[ https://issues.apache.org/jira/browse/HDFS-14794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964455#comment-16964455 ] Wei-Chiu Chuang commented on HDFS-14794:
reportBadBlock is initiated by a client or DataNode when it detects that a block is bad. It is then up to the active NameNode to schedule replication or invalidation for the bad block. There would be a period of time during which the observer thinks a bad block is still good, but hopefully the duration is short; the client would retry other replicas, IIRC. It might be okay for the observer to process reportBadBlock and mark a block replica corrupt, but it should not try to schedule block replication/invalidation.
> [SBN read] reportBadBlock is rejected by Observer.
> --
>
> Key: HDFS-14794
> URL: https://issues.apache.org/jira/browse/HDFS-14794
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: 2.10.0
> Reporter: Konstantin Shvachko
> Priority: Major
>
> {{reportBadBlock}} is rejected by Observer via StandbyException
> {code}StandbyException: Operation category WRITE is not supported in state observer{code}
> We should investigate what the consequences of this are and whether we should treat {{reportBadBlock}} as IBRs. Note that {{reportBadBlock}} is a part of both {{ClientProtocol}} and {{DatanodeProtocol}}
[jira] [Comment Edited] (HDFS-14794) [SBN read] reportBadBlock is rejected by Observer.
[ https://issues.apache.org/jira/browse/HDFS-14794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964455#comment-16964455 ] Wei-Chiu Chuang edited comment on HDFS-14794 at 10/31/19 11:41 PM:
---
reportBadBlock is initiated by a client or DataNode when it detects that a block is bad. It is then up to the active NameNode to schedule replication or invalidation for the bad block. There would be a period of time during which the observer thinks a bad block is still good, but hopefully the duration is short; the client would retry other replicas, IIRC. It might be okay for the observer to process reportBadBlock and mark a block replica corrupt, but it should not try to schedule block replication/invalidation.
In summary, I don't think it affects correctness. Maybe a little drop in availability.
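The StandbyException quoted in HDFS-14794 comes from the NameNode's HA-state check of each RPC's operation category. A hedged sketch of that gating (the enum and method names below are ours, modeled loosely on the NameNode semantics, not the actual `NameNode.OperationCategory` code path): an observer serves READ operations but rejects WRITE, which is why reportBadBlock, currently categorized as WRITE, is bounced.

```java
// Hedged sketch of HA-state / operation-category gating (illustrative names).
public class OpCategorySketch {
    enum HAState { ACTIVE, STANDBY, OBSERVER }
    enum OpCategory { READ, WRITE }

    // Active serves everything; an observer serves only READ.
    static void checkOperation(HAState state, OpCategory op) {
        boolean allowed = (state == HAState.ACTIVE)
            || (state == HAState.OBSERVER && op == OpCategory.READ);
        if (!allowed) {
            throw new IllegalStateException("Operation category " + op
                + " is not supported in state " + state);
        }
    }

    public static void main(String[] args) {
        checkOperation(HAState.OBSERVER, OpCategory.READ);  // accepted
        try {
            checkOperation(HAState.OBSERVER, OpCategory.WRITE);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Treating reportBadBlock like an IBR, as the issue suggests, would amount to moving it out of the WRITE category (or special-casing it) in a check like this.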
[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool
[ https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964451#comment-16964451 ] Hadoop QA commented on HDFS-14927:
--
| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 54s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 20m 36s | trunk passed |
| +1 | compile | 0m 35s | trunk passed |
| +1 | checkstyle | 0m 23s | trunk passed |
| +1 | mvnsite | 0m 40s | trunk passed |
| +1 | shadedclient | 14m 26s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 56s | trunk passed |
| +1 | javadoc | 0m 47s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 28s | the patch passed |
| +1 | compile | 0m 24s | the patch passed |
| +1 | javac | 0m 24s | the patch passed |
| +1 | checkstyle | 0m 15s | the patch passed |
| +1 | mvnsite | 0m 29s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 54s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 2s | the patch passed |
| +1 | javadoc | 0m 44s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 7m 3s | hadoop-hdfs-rbf in the patch passed. |
| +1 | asflicense | 0m 28s | The patch does not generate ASF License warnings. |
| | | 64m 44s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14927 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12984556/HDFS-14927.009.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 68e07677bdb1 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f9b99d2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28216/testReport/ |
| Max. process+thread count | 2738 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28216/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
> RBF: Add metrics for async callers thread pool
> --
>
> Key: HDFS-14927
> URL:
[jira] [Commented] (HDFS-14720) DataNode shouldn't report block as bad block if the block length is Long.MAX_VALUE.
[ https://issues.apache.org/jira/browse/HDFS-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964450#comment-16964450 ] Wei-Chiu Chuang commented on HDFS-14720:
BTW I think the fix is correct. +1 from me.
> DataNode shouldn't report block as bad block if the block length is
> Long.MAX_VALUE.
> ---
>
> Key: HDFS-14720
> URL: https://issues.apache.org/jira/browse/HDFS-14720
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 3.1.1
> Reporter: Surendra Singh Lilhore
> Assignee: hemanthboyina
> Priority: Major
> Attachments: HDFS-14720.001.patch
>
> {noformat}
> 2019-08-11 09:15:58,092 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> Can't replicate block
> BP-725378529-10.0.0.8-1410027444173:blk_13276745777_1112363330268 because
> on-disk length 175085 is shorter than NameNode recorded length
> 9223372036854775807.{noformat}
> If the block length is Long.MAX_VALUE, the file this block belongs to has been
> deleted on the NameNode and the DN received the command after the file's
> deletion. In this case the command should be ignored.
[jira] [Commented] (HDFS-14720) DataNode shouldn't report block as bad block if the block length is Long.MAX_VALUE.
[ https://issues.apache.org/jira/browse/HDFS-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964443#comment-16964443 ] Wei-Chiu Chuang commented on HDFS-14720:
I don't think this fix is relevant to HDFS-14794. This one was meant to solve a corner case.
[jira] [Commented] (HDFS-14720) DataNode shouldn't report block as bad block if the block length is Long.MAX_VALUE.
[ https://issues.apache.org/jira/browse/HDFS-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964438#comment-16964438 ] Konstantin Shvachko commented on HDFS-14720:
Hey guys, could you explain how this fixes the {{reportBadBlock()}} issue from HDFS-14794?
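The guard HDFS-14720 describes can be sketched in a few lines. The class and method names below are ours (not the actual DataNode code): when the NameNode-recorded length is the Long.MAX_VALUE sentinel, the file was already deleted, so the stale replication command should be ignored rather than reported as a bad block.

```java
// Hedged sketch (hypothetical names) of the HDFS-14720 check: don't report a
// replica as bad when the recorded length is the Long.MAX_VALUE sentinel.
public class StaleReplicationCmdSketch {
    static boolean shouldReportBadBlock(long onDiskLen, long nnRecordedLen) {
        if (nnRecordedLen == Long.MAX_VALUE) {
            // File was deleted on the NameNode; the DN got the replication
            // command after deletion, so the command should be ignored.
            return false;
        }
        // Genuinely shorter-than-recorded replicas are still bad.
        return onDiskLen < nnRecordedLen;
    }

    public static void main(String[] args) {
        // Numbers from the WARN log in the issue description.
        System.out.println(shouldReportBadBlock(175085L, Long.MAX_VALUE)); // false
        System.out.println(shouldReportBadBlock(175085L, 1048576L));       // true
    }
}
```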
[jira] [Updated] (HDDS-2395) Handle Ozone S3 completeMPU to match with aws s3 behavior.
[ https://issues.apache.org/jira/browse/HDDS-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HDDS-2395: - Summary: Handle Ozone S3 completeMPU to match with aws s3 behavior. (was: Handle completeMPU scenarios to match with aws s3 behavior.)
> Handle Ozone S3 completeMPU to match with aws s3 behavior.
> --
>
> Key: HDDS-2395
> URL: https://issues.apache.org/jira/browse/HDDS-2395
> Project: Hadoop Distributed Data Store
> Issue Type: Task
> Reporter: Bharat Viswanadham
> Assignee: Bharat Viswanadham
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> # When 2 parts are uploaded and complete is called with 1 part: no error.
> # During complete multipart upload, when a part name/part number does not match an uploaded part and part number: InvalidPart error.
> # When parts are not specified in sorted order: InvalidPartOrder.
> # During complete multipart upload, when there are no uploaded parts and we specify some parts: also InvalidPart.
> # Uploaded parts 1,2,3; during complete we can complete with parts 1,3 (no error).
> # When part 3 is uploaded, complete with part 3 can be done.
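The scenarios above imply a small validation routine for completeMultipartUpload. A hedged sketch, with all names ours (not the actual Ozone S3 gateway code): every requested part must match an uploaded part's number and ETag (otherwise InvalidPart), part numbers must be strictly ascending (otherwise InvalidPartOrder), and completing with a subset of the uploaded parts is allowed.

```java
import java.util.List;
import java.util.Map;

// Hedged sketch (illustrative names) of S3-compatible completeMPU checks.
public class CompleteMpuSketch {
    // Returns an S3-style error code, or null on success.
    static String validate(Map<Integer, String> uploadedParts,
                           List<Map.Entry<Integer, String>> requested) {
        int prev = 0;
        for (Map.Entry<Integer, String> part : requested) {
            if (part.getKey() <= prev) {
                return "InvalidPartOrder";  // part numbers must ascend
            }
            prev = part.getKey();
            String etag = uploadedParts.get(part.getKey());
            if (etag == null || !etag.equals(part.getValue())) {
                return "InvalidPart";  // unknown part number or ETag mismatch
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<Integer, String> up = Map.of(1, "e1", 2, "e2", 3, "e3");
        // Scenario 5: complete with a subset (parts 1 and 3) -> no error.
        System.out.println(validate(up,
            List.of(Map.entry(1, "e1"), Map.entry(3, "e3"))));
        // Unsorted part list -> InvalidPartOrder.
        System.out.println(validate(up,
            List.of(Map.entry(2, "e2"), Map.entry(1, "e1"))));
        // Part never uploaded -> InvalidPart.
        System.out.println(validate(up, List.of(Map.entry(4, "e4"))));
    }
}
```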
[jira] [Commented] (HDDS-2388) Teragen test failure due to OM exception
[ https://issues.apache.org/jira/browse/HDDS-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964434#comment-16964434 ] Aravindan Vijayan commented on HDDS-2388: - [~shashikant] Is the OM crashing due to this error? This is on a different thread than the OM read/write path. > Teragen test failure due to OM exception > > > Key: HDDS-2388 > URL: https://issues.apache.org/jira/browse/HDDS-2388 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > > Ran into below exception while running teragen: > {code:java} > Unable to get delta updates since sequenceNumber 79932 > org.rocksdb.RocksDBException: Requested sequence not yet written in the db > at org.rocksdb.RocksDB.getUpdatesSince(Native Method) > at org.rocksdb.RocksDB.getUpdatesSince(RocksDB.java:3587) > at > org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(RDBStore.java:338) > at > org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(OzoneManager.java:3283) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(OzoneManagerRequestHandler.java:404) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handle(OzoneManagerRequestHandler.java:314) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:219) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:134) > at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:102) > at > 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:984) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:912) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882) > {code}
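The "Requested sequence not yet written in the db" failure in the trace above occurs when a caller asks {{getUpdatesSince()}} for a sequence number ahead of what RocksDB has written. A plain-Java sketch (no RocksDB dependency; names are hypothetical) of how a caller could guard the request and fall back instead of propagating the exception:

```java
import java.util.Optional;

/**
 * Illustrative model of the guard implied by the HDDS-2388 trace
 * (hypothetical names, not the actual OM/RDBStore code): delta updates can
 * only be served from a sequence number at or below the DB's latest; a
 * request ahead of that should fall back (e.g. to a full snapshot) rather
 * than surface a RocksDBException.
 */
public class DeltaUpdateGuard {
    /** Returns the number of entries to ship, or empty if the request is
     *  ahead of the DB and the caller should fall back. */
    static Optional<Long> updatesSince(long requestedSeq, long latestSeq) {
        if (requestedSeq > latestSeq) {
            // Mirrors "Requested sequence not yet written in the db".
            return Optional.empty();
        }
        return Optional.of(latestSeq - requestedSeq);
    }
}
```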
[jira] [Commented] (HDDS-2364) Add a OM metrics to find the false positive rate for the keyMayExist
[ https://issues.apache.org/jira/browse/HDDS-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964433#comment-16964433 ] Aravindan Vijayan commented on HDDS-2364: - [~msingh] Thanks for the review. I will raise follow up JIRAs for metrics that are not already exposed through RocksDB. > Add a OM metrics to find the false positive rate for the keyMayExist > > > Key: HDDS-2364 > URL: https://issues.apache.org/jira/browse/HDDS-2364 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.5.0 >Reporter: Mukul Kumar Singh >Assignee: Aravindan Vijayan >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Add a OM metrics to find the false positive rate for the keyMayExist.
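The metric requested above can be sketched with two counters. This is a hypothetical illustration, not the actual OM metrics code: {{keyMayExist}} is a Bloom-filter-style probe that may answer "maybe" for absent keys, so the false-positive rate is the fraction of "maybe" answers that the definitive lookup then contradicts.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch of a keyMayExist false-positive-rate metric (hypothetical names,
 * not the real HDDS-2364 implementation). Callers record each probe that
 * answered "maybe" together with whether the key actually existed.
 */
public class KeyMayExistMetrics {
    private final AtomicLong mayExistHits = new AtomicLong();
    private final AtomicLong falsePositives = new AtomicLong();

    /** Record one keyMayExist probe that answered "maybe". */
    void recordProbe(boolean keyActuallyExists) {
        mayExistHits.incrementAndGet();
        if (!keyActuallyExists) {
            falsePositives.incrementAndGet(); // "maybe", but the key was absent
        }
    }

    double falsePositiveRate() {
        long hits = mayExistHits.get();
        return hits == 0 ? 0.0 : (double) falsePositives.get() / hits;
    }
}
```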
[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool
[ https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964431#comment-16964431 ] Hadoop QA commented on HDFS-14927: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 52s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 38s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 19s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 36s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 64m 6s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14927 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12984553/HDFS-14927.008.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 5950634d174e 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f9b99d2 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28215/testReport/ | | Max. process+thread count | 2751 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28215/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > RBF: Add metrics for async callers thread pool > -- > > Key: HDFS-14927 > URL:
[jira] [Work logged] (HDDS-2395) Handle completeMPU scenarios to match with aws s3 behavior.
[ https://issues.apache.org/jira/browse/HDDS-2395?focusedWorklogId=337084=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337084 ] ASF GitHub Bot logged work on HDDS-2395: Author: ASF GitHub Bot Created on: 31/Oct/19 22:45 Start Date: 31/Oct/19 22:45 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #109: HDDS-2395. Handle completeMPU scenarios to match with aws s3 behavior. URL: https://github.com/apache/hadoop-ozone/pull/109 ## What changes were proposed in this pull request? Fix few cases which were missed during complete Multipart upload. When uploaded 2 parts, and when complete upload 1 part no error During complete multipart upload name/part number not matching with uploaded part and part number then InvalidPart error When parts are not specified in sorted order InvalidPartOrder During complete multipart upload when no uploaded parts, and we specify some parts then also InvalidPart Uploaded parts 1,2,3 and during complete we can do upload 1,3 (No error) When part 3 uploaded, complete with part 3 can be done ## What is the link to the Apache JIRA https://issues.apache.org/jira/browse/HDDS-2395 ## How was this patch tested? Ran S3 smoke tests and also added smoke tests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337084) Remaining Estimate: 0h Time Spent: 10m > Handle completeMPU scenarios to match with aws s3 behavior. 
> --- > > Key: HDDS-2395 > URL: https://issues.apache.org/jira/browse/HDDS-2395 > Project: Hadoop Distributed Data Store > Issue Type: Task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > # When uploaded 2 parts, and when complete upload 1 part no error > # During complete multipart upload name/part number not matching with > uploaded part and part number then InvalidPart error > # When parts are not specified in sorted order InvalidPartOrder > # During complete multipart upload when no uploaded parts, and we specify > some parts then also InvalidPart > # Uploaded parts 1,2,3 and during complete we can do upload 1,3 (No error) > # When part 3 uploaded, complete with part 3 can be done
[jira] [Updated] (HDDS-2395) Handle completeMPU scenarios to match with aws s3 behavior.
[ https://issues.apache.org/jira/browse/HDDS-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2395: - Labels: pull-request-available (was: ) > Handle completeMPU scenarios to match with aws s3 behavior. > --- > > Key: HDDS-2395 > URL: https://issues.apache.org/jira/browse/HDDS-2395 > Project: Hadoop Distributed Data Store > Issue Type: Task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > Labels: pull-request-available > > # When uploaded 2 parts, and when complete upload 1 part no error > # During complete multipart upload name/part number not matching with > uploaded part and part number then InvalidPart error > # When parts are not specified in sorted order InvalidPartOrder > # During complete multipart upload when no uploaded parts, and we specify > some parts then also InvalidPart > # Uploaded parts 1,2,3 and during complete we can do upload 1,3 (No error) > # When part 3 uploaded, complete with part 3 can be done
[jira] [Created] (HDDS-2395) Handle completeMPU scenarios to match with aws s3 behavior.
Bharat Viswanadham created HDDS-2395: Summary: Handle completeMPU scenarios to match with aws s3 behavior. Key: HDDS-2395 URL: https://issues.apache.org/jira/browse/HDDS-2395 Project: Hadoop Distributed Data Store Issue Type: Task Reporter: Bharat Viswanadham # When uploaded 2 parts, and when complete upload 1 part no error # During complete multipart upload name/part number not matching with uploaded part and part number then InvalidPart error # When parts are not specified in sorted order InvalidPartOrder # During complete multipart upload when no uploaded parts, and we specify some parts then also InvalidPart # Uploaded parts 1,2,3 and during complete we can do upload 1,3 (No error) # When part 3 uploaded, complete with part 3 can be done
[jira] [Assigned] (HDDS-2395) Handle completeMPU scenarios to match with aws s3 behavior.
[ https://issues.apache.org/jira/browse/HDDS-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham reassigned HDDS-2395: Assignee: Bharat Viswanadham > Handle completeMPU scenarios to match with aws s3 behavior. > --- > > Key: HDDS-2395 > URL: https://issues.apache.org/jira/browse/HDDS-2395 > Project: Hadoop Distributed Data Store > Issue Type: Task >Reporter: Bharat Viswanadham >Assignee: Bharat Viswanadham >Priority: Major > > # When uploaded 2 parts, and when complete upload 1 part no error > # During complete multipart upload name/part number not matching with > uploaded part and part number then InvalidPart error > # When parts are not specified in sorted order InvalidPartOrder > # During complete multipart upload when no uploaded parts, and we specify > some parts then also InvalidPart > # Uploaded parts 1,2,3 and during complete we can do upload 1,3 (No error) > # When part 3 uploaded, complete with part 3 can be done
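The completeMPU rules enumerated in HDDS-2395 above can be condensed into one validation routine. This is a sketch under stated assumptions (the method, enum, and check ordering are hypothetical, not the Ozone S3 gateway code): each requested part must match an uploaded part number/ETag ({{InvalidPart}}), parts must be strictly ascending ({{InvalidPartOrder}}), and a subset of uploaded parts is acceptable.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the completeMPU validation rules from HDDS-2395
 * (illustrative names; not the actual Ozone implementation).
 */
public class CompleteMpuValidator {
    enum Result { OK, INVALID_PART, INVALID_PART_ORDER }

    /**
     * @param uploaded  map of uploaded part number -> ETag
     * @param requested part numbers listed in the complete request
     */
    static Result validate(Map<Integer, String> uploaded, List<Integer> requested) {
        int prev = 0;
        for (int partNumber : requested) {
            if (partNumber <= prev) {
                return Result.INVALID_PART_ORDER; // parts must be strictly ascending
            }
            if (!uploaded.containsKey(partNumber)) {
                return Result.INVALID_PART; // part was never uploaded
            }
            prev = partNumber;
        }
        // A subset such as [1, 3] of uploaded parts [1, 2, 3] is fine.
        return Result.OK;
    }
}
```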
[jira] [Updated] (HDFS-14927) RBF: Add metrics for async callers thread pool
[ https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leon Gao updated HDFS-14927: Attachment: HDFS-14927.009.patch > RBF: Add metrics for async callers thread pool > -- > > Key: HDFS-14927 > URL: https://issues.apache.org/jira/browse/HDFS-14927 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > Attachments: HDFS-14927.001.patch, HDFS-14927.002.patch, > HDFS-14927.003.patch, HDFS-14927.004.patch, HDFS-14927.005.patch, > HDFS-14927.006.patch, HDFS-14927.007.patch, HDFS-14927.008.patch, > HDFS-14927.009.patch > > > It is good to add some monitoring on the async caller thread pool to handle > fan-out RPC client requests, so we know the utilization and when to bump up > dfs.federation.router.client.thread-size
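The kind of thread-pool monitoring HDFS-14927 describes can be sketched directly from {{java.util.concurrent.ThreadPoolExecutor}}'s built-in counters. The class and method names below are hypothetical (not the Router's actual metrics API); the point is which executor statistics tell an operator when to raise dfs.federation.router.client.thread-size.

```java
import java.util.concurrent.ThreadPoolExecutor;

/**
 * Illustrative sketch of async-caller pool metrics (hypothetical names):
 * samples a ThreadPoolExecutor so operators can see saturation.
 */
public class AsyncCallerPoolMetrics {
    final ThreadPoolExecutor pool;

    AsyncCallerPoolMetrics(ThreadPoolExecutor pool) {
        this.pool = pool;
    }

    int activeThreads()   { return pool.getActiveCount(); }        // busy now
    int queuedCalls()     { return pool.getQueue().size(); }       // waiting
    long completedCalls() { return pool.getCompletedTaskCount(); } // lifetime

    /** Fraction of the configured maximum pool size currently in use;
     *  sustained values near 1.0 suggest raising the thread-size config. */
    double utilization() {
        return (double) pool.getActiveCount() / pool.getMaximumPoolSize();
    }
}
```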
[jira] [Created] (HDDS-2394) Ozone allows bucket name with underscore to be created but throws an error during put key operation
Vivek Ratnavel Subramanian created HDDS-2394: Summary: Ozone allows bucket name with underscore to be created but throws an error during put key operation Key: HDDS-2394 URL: https://issues.apache.org/jira/browse/HDDS-2394 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Manager Affects Versions: 0.4.1 Reporter: Vivek Ratnavel Subramanian Assignee: Vivek Ratnavel Subramanian Steps to reproduce: aws s3api --endpoint http://localhost:9878 create-bucket --bucket ozone_test aws s3api --endpoint http://localhost:9878 put-object --bucket ozone_test --key ozone-site.xml --body /etc/hadoop/conf/ozone-site.xml S3 gateway throws a warning: {code:java} javax.servlet.ServletException: javax.servlet.ServletException: java.lang.IllegalArgumentException: Bucket or Volume name has an unsupported character : _ at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:139) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.eclipse.jetty.server.Server.handle(Server.java:539) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) at java.lang.Thread.run(Thread.java:748) Caused 
by: javax.servlet.ServletException: java.lang.IllegalArgumentException: Bucket or Volume name has an unsupported character : _ at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:432) at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:370) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:389) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:342) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:229) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:840) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1780) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1628) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:513) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) ... 
13 more {code}
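The HDDS-2394 report above is an inconsistency: create-bucket accepts "ozone_test" but put-object later rejects the underscore. A minimal sketch of the character check that should run at creation time as well (hypothetical names; S3-style bucket names allow lowercase letters, digits, '.' and '-', which matches the "unsupported character : _" error in the trace):

```java
/**
 * Hypothetical sketch of the bucket-name validation behind HDDS-2394
 * (not the actual Ozone validator): reject any character outside
 * lowercase letters, digits, '.' and '-', so '_' fails at create time
 * instead of only at put-object time.
 */
public class BucketNameCheck {
    static boolean hasUnsupportedChar(String name) {
        for (char c : name.toCharArray()) {
            boolean ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                      || c == '.' || c == '-';
            if (!ok) {
                return true; // e.g. the '_' in "ozone_test"
            }
        }
        return false;
    }
}
```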
[jira] [Assigned] (HDDS-2393) HDDS-1847 broke some unit tests
[ https://issues.apache.org/jira/browse/HDDS-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Teoh reassigned HDDS-2393: Assignee: Chris Teoh Description: Siyao Meng commented on HDDS-1847: -- Looks like this commit breaks {{TestKeyManagerImpl}} in {{setUp()}} and {{cleanup()}}. Run {{TestKeyManagerImpl#testListStatus()}} to steadily repro. I believe there could be other tests that are broken by this. {code} java.lang.NullPointerException at org.apache.hadoop.hdds.scm.server.StorageContainerManagerHttpServer.getSpnegoPrincipal(StorageContainerManagerHttpServer.java:74) at org.apache.hadoop.hdds.server.BaseHttpServer.(BaseHttpServer.java:81) at org.apache.hadoop.hdds.scm.server.StorageContainerManagerHttpServer.(StorageContainerManagerHttpServer.java:36) at org.apache.hadoop.hdds.scm.server.StorageContainerManager.(StorageContainerManager.java:330) at org.apache.hadoop.hdds.scm.TestUtils.getScm(TestUtils.java:544) at org.apache.hadoop.ozone.om.TestKeyManagerImpl.setUp(TestKeyManagerImpl.java:150) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runner.JUnitCore.run(JUnitCore.java:160) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) {code} {code} java.lang.NullPointerException at org.apache.hadoop.ozone.om.TestKeyManagerImpl.cleanup(TestKeyManagerImpl.java:176) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runner.JUnitCore.run(JUnitCore.java:160) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) {code} was: Looks like this commit breaks {{TestKeyManagerImpl}} in {{setUp()}} and {{cleanup()}}. Run {{TestKeyManagerImpl#testListStatus()}} to steadily repro. I believe there could be other tests that are broken by this. 
{code} java.lang.NullPointerException at org.apache.hadoop.hdds.scm.server.StorageContainerManagerHttpServer.getSpnegoPrincipal(StorageContainerManagerHttpServer.java:74) at org.apache.hadoop.hdds.server.BaseHttpServer.(BaseHttpServer.java:81) at org.apache.hadoop.hdds.scm.server.StorageContainerManagerHttpServer.(StorageContainerManagerHttpServer.java:36) at org.apache.hadoop.hdds.scm.server.StorageContainerManager.(StorageContainerManager.java:330) at org.apache.hadoop.hdds.scm.TestUtils.getScm(TestUtils.java:544) at org.apache.hadoop.ozone.om.TestKeyManagerImpl.setUp(TestKeyManagerImpl.java:150) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at
[jira] [Assigned] (HDFS-13689) NameNodeRpcServer getEditsFromTxid assumes it is run on active NameNode
[ https://issues.apache.org/jira/browse/HDFS-13689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen reassigned HDFS-13689: -- Assignee: (was: Erik Krogen) > NameNodeRpcServer getEditsFromTxid assumes it is run on active NameNode > --- > > Key: HDFS-13689 > URL: https://issues.apache.org/jira/browse/HDFS-13689 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, namenode >Reporter: Erik Krogen >Priority: Major > > {{NameNodeRpcServer#getEditsFromTxid}} currently decides which transactions > are able to be served, i.e. which transactions are durable, using the > following logic: > {code} > long syncTxid = log.getSyncTxId(); > // If we haven't synced anything yet, we can only read finalized > // segments since we can't reliably determine which txns in in-progress > // segments have actually been committed (e.g. written to a quorum of > JNs). > // If we have synced txns, we can definitely read up to syncTxid since > // syncTxid is only updated after a transaction is committed to all > // journals. (In-progress segments written by old writers are already > // discarded for us, so if we read any in-progress segments they are > // guaranteed to have been written by this NameNode.) > boolean readInProgress = syncTxid > 0; > {code} > This assumes that the NameNode serving this request is the current > writer/active NameNode, which may not be true in the ObserverNode situation. > Since {{selectInputStreams}} now has a {{onlyDurableTxns}} flag, which, if > enabled, will only return durable/committed transactions, we can instead > leverage this to provide the same functionality. We should utilize this to > avoid consistency issues when serving this request from the ObserverNode.
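The logic quoted in the HDFS-13689 description above reduces to a two-input decision. The following is a toy model, not NameNode code: it only captures the observation that {{syncTxid > 0}} justifies reading in-progress segments solely on the NameNode that is the current writer, which is exactly why an ObserverNode should rely on {{selectInputStreams(onlyDurableTxns=true)}} instead.

```java
/**
 * Toy model of the getEditsFromTxid decision from HDFS-13689
 * (hypothetical names): in-progress edit segments are only safe to read
 * up to syncTxid, and only on the current writer/active NameNode.
 */
public class EditsReadPolicy {
    static boolean canReadInProgress(long syncTxid, boolean isCurrentWriter) {
        // syncTxid > 0 means at least one txn was committed to all journals,
        // but that bound is only trustworthy on the node that wrote it.
        return isCurrentWriter && syncTxid > 0;
    }
}
```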
[jira] [Created] (HDDS-2393) HDDS-1847 broke some unit tests
Chris Teoh created HDDS-2393: Summary: HDDS-1847 broke some unit tests Key: HDDS-2393 URL: https://issues.apache.org/jira/browse/HDDS-2393 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Chris Teoh Looks like this commit breaks {{TestKeyManagerImpl}} in {{setUp()}} and {{cleanup()}}. Run {{TestKeyManagerImpl#testListStatus()}} to steadily repro. I believe there could be other tests that are broken by this. {code} java.lang.NullPointerException at org.apache.hadoop.hdds.scm.server.StorageContainerManagerHttpServer.getSpnegoPrincipal(StorageContainerManagerHttpServer.java:74) at org.apache.hadoop.hdds.server.BaseHttpServer.(BaseHttpServer.java:81) at org.apache.hadoop.hdds.scm.server.StorageContainerManagerHttpServer.(StorageContainerManagerHttpServer.java:36) at org.apache.hadoop.hdds.scm.server.StorageContainerManager.(StorageContainerManager.java:330) at org.apache.hadoop.hdds.scm.TestUtils.getScm(TestUtils.java:544) at org.apache.hadoop.ozone.om.TestKeyManagerImpl.setUp(TestKeyManagerImpl.java:150) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runner.JUnitCore.run(JUnitCore.java:160) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) {code} {code} java.lang.NullPointerException at org.apache.hadoop.ozone.om.TestKeyManagerImpl.cleanup(TestKeyManagerImpl.java:176) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runner.JUnitCore.run(JUnitCore.java:160) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) {code}
[jira] [Commented] (HDFS-14927) RBF: Add metrics for async callers thread pool
[ https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964407#comment-16964407 ] Íñigo Goiri commented on HDFS-14927: I don't think you need to catch the exception and then throw it. Just having the finally should be enough. The exception will just surface. > RBF: Add metrics for async callers thread pool > -- > > Key: HDFS-14927 > URL: https://issues.apache.org/jira/browse/HDFS-14927 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > Attachments: HDFS-14927.001.patch, HDFS-14927.002.patch, > HDFS-14927.003.patch, HDFS-14927.004.patch, HDFS-14927.005.patch, > HDFS-14927.006.patch, HDFS-14927.007.patch, HDFS-14927.008.patch > > > It is good to add some monitoring on the async caller thread pool to handle > fan-out RPC client requests, so we know the utilization and when to bump up > dfs.federation.router.client.thread-size
[jira] [Resolved] (HDFS-14443) Throwing RemoteException in the time of Read Operation
[ https://issues.apache.org/jira/browse/HDFS-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-14443. Resolution: Not A Problem Resolving as not a problem. Please reopen if it is still a problem. > Throwing RemoteException in the time of Read Operation > -- > > Key: HDFS-14443 > URL: https://issues.apache.org/jira/browse/HDFS-14443 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ranith Sardar >Priority: Major > > 2019-04-19 20:54:59,178 DEBUG > org.apache.hadoop.io.retry.RetryInvocationHandler: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): > Operation category WRITE is not supported in state observer. Visit > [https://s.apache.org/sbnn-error] > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1990) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1443) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.msync(NameNodeRpcServer.java:1372) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.msync(ClientNamenodeProtocolServerSideTranslatorPB.java:1929) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:531) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at 
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2791) > , while invoking $Proxy5.getFileInfo over > [host-*-*-*-*/*.*.*.*:6*5,host-*-*-*-*/*.*.*.*:**,host-*-*-*-*/*.*.*.*:6**5]. > Trying to failover immediately. > > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): > Operation category WRITE is not supported in state observer. Visit > [https://s.apache.org/sbnn-error] > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-14020) Emulate Observer node falling far behind the Active
[ https://issues.apache.org/jira/browse/HDFS-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HDFS-14020. Resolution: Duplicate Resolving as duplicate since HDFS-13873 introduced {{testObserverFallBehind()}} in {{TestMultiObserverNode}}, which serves the purpose. This has also already been tested on live clusters. > Emulate Observer node falling far behind the Active > --- > > Key: HDFS-14020 > URL: https://issues.apache.org/jira/browse/HDFS-14020 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Sherwood Zheng >Assignee: Sherwood Zheng >Priority: Major > > Emulate Observer node falling far behind the Active. Ensure readers switch > over > to another Observer instead of waiting for the lagging Observer to catch up. > If > there is only a single Observer, it should fall back to the Active. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14927) RBF: Add metrics for async callers thread pool
[ https://issues.apache.org/jira/browse/HDFS-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leon Gao updated HDFS-14927: Attachment: HDFS-14927.008.patch > RBF: Add metrics for async callers thread pool > -- > > Key: HDFS-14927 > URL: https://issues.apache.org/jira/browse/HDFS-14927 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > Attachments: HDFS-14927.001.patch, HDFS-14927.002.patch, > HDFS-14927.003.patch, HDFS-14927.004.patch, HDFS-14927.005.patch, > HDFS-14927.006.patch, HDFS-14927.007.patch, HDFS-14927.008.patch > > > It is good to add some monitoring on the async caller thread pool to handle > fan-out RPC client requests, so we know the utilization and when to bump up > dfs.federation.router.client.thread-size -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1847) Datanode Kerberos principal and keytab config key looks inconsistent
[ https://issues.apache.org/jira/browse/HDDS-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964390#comment-16964390 ] Anu Engineer commented on HDDS-1847: Interesting, [~chris.t...@gmail.com] can you please take a look when you get a chance? > Datanode Kerberos principal and keytab config key looks inconsistent > > > Key: HDDS-1847 > URL: https://issues.apache.org/jira/browse/HDDS-1847 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Eric Yang >Assignee: Chris Teoh >Priority: Major > Labels: newbie, pull-request-available > Fix For: 0.5.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Ozone Kerberos configuration can be very confusing: > | config name | Description | > | hdds.scm.kerberos.principal | SCM service principal | > | hdds.scm.kerberos.keytab.file | SCM service keytab file | > | ozone.om.kerberos.principal | Ozone Manager service principal | > | ozone.om.kerberos.keytab.file | Ozone Manager keytab file | > | hdds.scm.http.kerberos.principal | SCM service spnego principal | > | hdds.scm.http.kerberos.keytab.file | SCM service spnego keytab file | > | ozone.om.http.kerberos.principal | Ozone Manager spnego principal | > | ozone.om.http.kerberos.keytab.file | Ozone Manager spnego keytab file | > | hdds.datanode.http.kerberos.keytab | Datanode spnego keytab file | > | hdds.datanode.http.kerberos.principal | Datanode spnego principal | > | dfs.datanode.kerberos.principal | Datanode service principal | > | dfs.datanode.keytab.file | Datanode service keytab file | > The prefixes are very different for each of the datanode configurations. It > would be nice to have some consistency for the datanode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-1847) Datanode Kerberos principal and keytab config key looks inconsistent
[ https://issues.apache.org/jira/browse/HDDS-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964374#comment-16964374 ] Siyao Meng commented on HDDS-1847: -- Looks like this commit breaks {{TestKeyManagerImpl}} in {{setUp()}} and {{cleanup()}}. Run {{TestKeyManagerImpl#testListStatus()}} to steadily repro. I believe there could be other tests that are broken by this. {code} java.lang.NullPointerException at org.apache.hadoop.hdds.scm.server.StorageContainerManagerHttpServer.getSpnegoPrincipal(StorageContainerManagerHttpServer.java:74) at org.apache.hadoop.hdds.server.BaseHttpServer.(BaseHttpServer.java:81) at org.apache.hadoop.hdds.scm.server.StorageContainerManagerHttpServer.(StorageContainerManagerHttpServer.java:36) at org.apache.hadoop.hdds.scm.server.StorageContainerManager.(StorageContainerManager.java:330) at org.apache.hadoop.hdds.scm.TestUtils.getScm(TestUtils.java:544) at org.apache.hadoop.ozone.om.TestKeyManagerImpl.setUp(TestKeyManagerImpl.java:150) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runner.JUnitCore.run(JUnitCore.java:160) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) {code} {code} java.lang.NullPointerException at org.apache.hadoop.ozone.om.TestKeyManagerImpl.cleanup(TestKeyManagerImpl.java:176) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runner.JUnitCore.run(JUnitCore.java:160) at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) {code} > Datanode Kerberos principal and keytab config key looks inconsistent > > > Key: HDDS-1847 > URL: https://issues.apache.org/jira/browse/HDDS-1847 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Eric Yang >Assignee: Chris Teoh >Priority: Major > Labels: newbie, pull-request-available > Fix For: 0.5.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Ozone Kerberos configuration can be very confusing: > | config name | Description | > 
| hdds.scm.kerberos.principal | SCM service principal | > | hdds.scm.kerberos.keytab.file | SCM service keytab file | > | ozone.om.kerberos.principal | Ozone Manager service principal | > | ozone.om.kerberos.keytab.file | Ozone Manager keytab file | > | hdds.scm.http.kerberos.principal | SCM service spnego principal | > | hdds.scm.http.kerberos.keytab.file | SCM service spnego keytab file | > | ozone.om.http.kerberos.principal | Ozone Manager spnego principal | > | ozone.om.http.kerberos.keytab.file | Ozone
[jira] [Updated] (HDDS-2392) Fix TestScmSafeMode#testSCMSafeModeRestrictedOp
[ https://issues.apache.org/jira/browse/HDDS-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDDS-2392: - Description: After ratis upgrade (HDDS-2340), TestScmSafeMode#testSCMSafeModeRestrictedOp fails as the DNs fail to restart XceiverServerRatis. RaftServer#start() fails with following exception: {code:java} java.io.IOException: java.lang.IllegalStateException: Not started at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54) at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61) at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70) at org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:284) at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:296) at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:421) at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:215) at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:110) at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.IllegalStateException: Not started at org.apache.ratis.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:504) at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.getPort(ServerImpl.java:176) at org.apache.ratis.grpc.server.GrpcService.lambda$new$2(GrpcService.java:143) at 
org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62) at org.apache.ratis.grpc.server.GrpcService.getInetSocketAddress(GrpcService.java:182) at org.apache.ratis.server.impl.RaftServerImpl.lambda$new$0(RaftServerImpl.java:84) at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62) at org.apache.ratis.server.impl.RaftServerImpl.getPeer(RaftServerImpl.java:136) at org.apache.ratis.server.impl.RaftServerMetrics.(RaftServerMetrics.java:70) at org.apache.ratis.server.impl.RaftServerMetrics.getRaftServerMetrics(RaftServerMetrics.java:62) at org.apache.ratis.server.impl.RaftServerImpl.(RaftServerImpl.java:119) at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) {code} was: After ratis upgrade (HDDS-2340), TestScmSafeMode#testSCMSafeModeRestrictedOp fails as the DNs fail to restart XceiverServerRatis. RaftServer#start() fails with following exception: {code:java} java.io.IOException: java.lang.IllegalStateException: Not startedjava.io.IOException: java.lang.IllegalStateException: Not started at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54) at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61) at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70) at org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:284) at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:296) at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:421) at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:215) at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:110) at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42) at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)Caused by: java.lang.IllegalStateException: Not started at org.apache.ratis.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:504) at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.getPort(ServerImpl.java:176) at org.apache.ratis.grpc.server.GrpcService.lambda$new$2(GrpcService.java:143) at
[jira] [Created] (HDDS-2392) Fix TestScmSafeMode#testSCMSafeModeRestrictedOp
Hanisha Koneru created HDDS-2392: Summary: Fix TestScmSafeMode#testSCMSafeModeRestrictedOp Key: HDDS-2392 URL: https://issues.apache.org/jira/browse/HDDS-2392 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Hanisha Koneru After ratis upgrade (HDDS-2340), TestScmSafeMode#testSCMSafeModeRestrictedOp fails as the DNs fail to restart XceiverServerRatis. RaftServer#start() fails with following exception: {code:java} java.io.IOException: java.lang.IllegalStateException: Not startedjava.io.IOException: java.lang.IllegalStateException: Not started at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54) at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61) at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:70) at org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:284) at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:296) at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:421) at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:215) at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:110) at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:42) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)Caused by: java.lang.IllegalStateException: Not started at org.apache.ratis.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:504) at 
org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.getPort(ServerImpl.java:176) at org.apache.ratis.grpc.server.GrpcService.lambda$new$2(GrpcService.java:143) at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62) at org.apache.ratis.grpc.server.GrpcService.getInetSocketAddress(GrpcService.java:182) at org.apache.ratis.server.impl.RaftServerImpl.lambda$new$0(RaftServerImpl.java:84) at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62) at org.apache.ratis.server.impl.RaftServerImpl.getPeer(RaftServerImpl.java:136) at org.apache.ratis.server.impl.RaftServerMetrics.(RaftServerMetrics.java:70) at org.apache.ratis.server.impl.RaftServerMetrics.getRaftServerMetrics(RaftServerMetrics.java:62) at org.apache.ratis.server.impl.RaftServerImpl.(RaftServerImpl.java:119) at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$2(RaftServerProxy.java:208) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14948) Improve HttpFS Server
[ https://issues.apache.org/jira/browse/HDFS-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964354#comment-16964354 ] Wei-Chiu Chuang commented on HDFS-14948: [~smeng] and other contributors added a number of REST APIs that were missing compared to WebHDFS. But I'm pretty sure they're mostly in trunk or 3.x only. > Improve HttpFS Server > - > > Key: HDFS-14948 > URL: https://issues.apache.org/jira/browse/HDFS-14948 > Project: Hadoop HDFS > Issue Type: Improvement > Components: httpfs >Reporter: Kihwal Lee >Assignee: Ahmed Hussein >Priority: Major > > We see increasing use of HttpFS as a compatibility bridge and also as a > bridge between different security domains. As it gains more users, people are > finding missing pieces and bugs in it. There already are efforts to tackle > some of these issues and this jira aims to make ongoing works and future > works more coherent. I do not really want to make it an umbrella jira, but > a place for recording all related works. That way, one can easily figure out > what is missing in their version of HttpFS and what to backport, if necessary. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2384) Large chunks during write can have memory pressure on DN with multiple clients
[ https://issues.apache.org/jira/browse/HDDS-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964348#comment-16964348 ] Anu Engineer commented on HDDS-2384: Thank you for flagging this issue. I think this is a hard problem to solve in the current architecture. I would like to explore some possibilities on how we can solve this issue. 1. We add support for a buffer pool inside the data node. A buffer pool would be a large chunk of memory that the data node pins and internally treats as a set of buffers that can be used for I/O. When we read or write data, we will also use this buffer pool. That way, we can limit the maximum committed memory that we will end up using for the data path. 2. In order to do that, we will now need the ability to read data not in 16 MB chunks, but perhaps in smaller sizes, say 8KB (assuming the page size is going to be 8KB in the buffer pool). 3. The advantage of such an approach is that we will read only as much data as we have memory for, but the network layer still might have to buffer this data. 4. This also allows us to push back against a client that is sending or trying to read too much data from the data node at any given time. Question: Do you think such a change would address this issue? If you have other suggestions, I would love to hear them. Once more, thank you for flagging this issue. > Large chunks during write can have memory pressure on DN with multiple clients > -- > > Key: HDDS-2384 > URL: https://issues.apache.org/jira/browse/HDDS-2384 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Anu Engineer >Priority: Major > Labels: performance > > During large file writes, it ends up writing {{16 MB}} chunks. > https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java#L691 > In large clusters, 100s of clients may connect to DN. 
In such cases, > depending on the incoming write workload mem load on DN can increase > significantly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
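The buffer-pool idea in point 1 above can be sketched in a few lines (hypothetical class and sizes, not Ozone code): a fixed number of pages is allocated and pinned up front, borrowers block when the pool is exhausted, so total committed I/O memory is capped no matter how many clients connect — which is also the back-pressure mechanism from point 4.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch only (hypothetical class, not Ozone code): a pinned pool of
// fixed-size pages. Committed memory is at most poolSize * pageSize
// regardless of client count; when the pool is empty, borrow() blocks,
// pushing back on clients that read or write too much at once.
class BufferPoolSketch {
    private final BlockingQueue<byte[]> free;
    private final int pageSize;

    BufferPoolSketch(int poolSize, int pageSize) {
        this.pageSize = pageSize;
        this.free = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            free.add(new byte[pageSize]);   // allocated (pinned) up front
        }
    }

    byte[] borrow() throws InterruptedException {
        return free.take();                 // blocks when pool is exhausted
    }

    void release(byte[] page) {
        if (page.length != pageSize) {
            throw new IllegalArgumentException("foreign buffer");
        }
        free.offer(page);
    }

    int available() {
        return free.size();
    }
}
```

A chunk write would then stream through borrowed pages of, say, 8KB instead of materializing a whole 16 MB chunk per client.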
[jira] [Commented] (HDFS-14884) Add sanity check that zone key equals feinfo key while setting Xattrs
[ https://issues.apache.org/jira/browse/HDFS-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964345#comment-16964345 ] Hadoop QA commented on HDFS-14884: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 22m 34s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 6s{color} | {color:green} branch-2 passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 15s{color} | {color:red} hadoop-hdfs in branch-2 failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s{color} | {color:green} branch-2 passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 16s{color} | {color:red} hadoop-hdfs in branch-2 failed with JDK v1.7.0_95. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 12s{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 12s{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 13s{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 32s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}116m 23s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerRPCDelay | | | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:f555aa740b5 | | JIRA Issue | HDFS-14884 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12984512/HDFS-14884-branch-2.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 4ca513ad11a9 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-2 / a36dbe6 | | maven |
[jira] [Assigned] (HDDS-2384) Large chunks during write can have memory pressure on DN with multiple clients
[ https://issues.apache.org/jira/browse/HDDS-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer reassigned HDDS-2384: -- Assignee: Anu Engineer > Large chunks during write can have memory pressure on DN with multiple clients > -- > > Key: HDDS-2384 > URL: https://issues.apache.org/jira/browse/HDDS-2384 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Anu Engineer >Priority: Major > Labels: performance > > During large file writes, it ends up writing {{16 MB}} chunks. > https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java#L691 > In large clusters, 100s of clients may connect to DN. In such cases, > depending on the incoming write workload mem load on DN can increase > significantly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14920) Erasure Coding: Decommission may hang If one or more datanodes are out of service during decommission
[ https://issues.apache.org/jira/browse/HDFS-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964315#comment-16964315 ] Hudson commented on HDFS-14920: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17590 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17590/]) HDFS-14920. Erasure Coding: Decommission may hang If one or more (ayushsaxena: rev 9d25ae7669eed1a047578b574f42bd121b445a3c) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingReconstructionBlocks.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/NumberReplicas.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommissionWithStriped.java > Erasure Coding: Decommission may hang If one or more datanodes are out of > service during decommission > --- > > Key: HDFS-14920 > URL: https://issues.apache.org/jira/browse/HDFS-14920 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec >Affects Versions: 3.0.3, 3.2.1, 3.1.3 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2 > > Attachments: HDFS-14920.001.patch, HDFS-14920.002.patch, > HDFS-14920.003.patch, HDFS-14920.004.patch, HDFS-14920.005.patch > > > Decommission test hangs in our clusters. 
> We have seen messages like the following > {quote} > 2019-10-22 15:58:51,514 TRACE > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Block > blk_-9223372035600425840_372987973 numExpected=9, numLive=5 > 2019-10-22 15:58:51,514 INFO BlockStateChange: Block: > blk_-9223372035600425840_372987973, Expected Replicas: 9, live replicas: 5, > corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 4, > maintenance replicas: 0, live entering maintenance replicas: 0, excess > replicas: 0, Is Open File: false, Datanodes having this block: > 10.255.43.57:50010 10.255.53.12:50010 10.255.63.12:50010 10.255.62.39:50010 > 10.255.37.36:50010 10.255.33.15:50010 10.255.69.29:50010 10.255.51.13:50010 > 10.255.64.15:50010 , Current Datanode: 10.255.69.29:50010, Is current > datanode decommissioning: true, Is current datanode entering maintenance: > false > 2019-10-22 15:58:51,514 DEBUG > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Node > 10.255.69.29:50010 still has 1 blocks to replicate before it is a candidate > to finish Decommission In Progress > {quote} > After digging into the source code and cluster logs, we guess it happens via the > following steps. > # Storage strategy is RS-6-3-1024k. > # EC block b consists of b0, b1, b2, b3, b4, b5, b6, b7, b8; b0 is from > datanode dn0, b1 is from datanode dn1, ...etc > # At the beginning dn0 is in decommission progress, b0 is replicated > successfully, and dn0 is still in decommission progress. > # Later b1, b2, b3 are in decommission progress, and dn4 containing b4 is out of > service, so it needs to be reconstructed, and an ErasureCodingWork is created to do it; in > the ErasureCodingWork, additionalReplRequired is 4 > # Because hasAllInternalBlocks is false, it will call > ErasureCodingWork#addTaskToDatanode -> > DatanodeDescriptor#addBlockToBeErasureCoded, and send a > BlockECReconstructionInfo task to the Datanode > # The DataNode cannot reconstruct the block because targets is 4, greater > than 3 (the parity number). 
> There is a problem as follows, from BlockManager.java#scheduleReconstruction
> {code}
> // should reconstruct all the internal blocks before scheduling
> // replication task for decommissioning node(s).
> if (additionalReplRequired - numReplicas.decommissioning() -
>     numReplicas.liveEnteringMaintenanceReplicas() > 0) {
>   additionalReplRequired = additionalReplRequired -
>       numReplicas.decommissioning() -
>       numReplicas.liveEnteringMaintenanceReplicas();
> }
> {code}
> Reconstruction should happen first, and then replication for the decommissioning node(s). Because numReplicas.decommissioning() is 4 and additionalReplRequired is 4, that's wrong; numReplicas.decommissioning() should be 3, since it should exclude the live replica. If so, additionalReplRequired will be 1 and reconstruction will be scheduled as expected. After that, decommission goes on.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
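The subtraction quoted in the issue description can be sketched in isolation. `ReplMath` is an illustrative name, not HDFS code; it only mirrors the guarded subtraction from BlockManager#scheduleReconstruction:

```java
// Sketch of the guarded subtraction described above (not actual HDFS code).
public final class ReplMath {
    public static int additionalReplRequired(int required, int decommissioning,
                                             int liveEnteringMaintenance) {
        // The real code only applies the subtraction when the result stays positive;
        // otherwise additionalReplRequired keeps its original value.
        int remaining = required - decommissioning - liveEnteringMaintenance;
        return remaining > 0 ? remaining : required;
    }
}
```

With decommissioning counted as 4 (including the replica that is also live), the demand stays at 4, so the DataNode receives 4 targets, more than the 3 parity blocks can yield; counting only the 3 truly missing replicas produces 1, and reconstruction can proceed.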
[jira] [Updated] (HDFS-14936) Add getNumOfChildren() for interface InnerNode
[ https://issues.apache.org/jira/browse/HDFS-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-14936: Fix Version/s: 3.3.0 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) > Add getNumOfChildren() for interface InnerNode > -- > > Key: HDFS-14936 > URL: https://issues.apache.org/jira/browse/HDFS-14936 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Fix For: 3.3.0 > > Attachments: HDFS-14936.001.patch, HDFS-14936.002.patch, > HDFS-14936.003.patch > > > In the current code, the InnerNode subclasses InnerNodeImpl and DFSTopologyNodeImpl both > have getNumOfChildren(), > so add getNumOfChildren() to the interface InnerNode and remove the unnecessary > getNumOfChildren() in DFSTopologyNodeImpl. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
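The shape of the change can be sketched as follows. These are simplified stand-ins for Hadoop's topology types, and the `default` method is an illustration of sharing one implementation through the interface (the actual patch declares the method on the interface and implements it in InnerNodeImpl):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for Hadoop's topology types: once the accessor is
// declared on the interface, subclasses no longer need duplicate declarations.
interface InnerNode {
    List<String> getChildren();

    /** Promoted to the interface; shared by all implementations. */
    default int getNumOfChildren() {
        return getChildren().size();
    }
}

class InnerNodeImpl implements InnerNode {
    private final List<String> children = new ArrayList<>();

    public List<String> getChildren() {
        return children;
    }

    public void add(String child) {
        children.add(child);
    }
}
```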
[jira] [Resolved] (HDDS-1847) Datanode Kerberos principal and keytab config key looks inconsistent
[ https://issues.apache.org/jira/browse/HDDS-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer resolved HDDS-1847. Fix Version/s: 0.5.0 Resolution: Fixed I have committed this patch to the master branch. > Datanode Kerberos principal and keytab config key looks inconsistent > > > Key: HDDS-1847 > URL: https://issues.apache.org/jira/browse/HDDS-1847 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Eric Yang >Assignee: Chris Teoh >Priority: Major > Labels: newbie, pull-request-available > Fix For: 0.5.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Ozone Kerberos configuration can be very confusing: > | config name | Description | > | hdds.scm.kerberos.principal | SCM service principal | > | hdds.scm.kerberos.keytab.file | SCM service keytab file | > | ozone.om.kerberos.principal | Ozone Manager service principal | > | ozone.om.kerberos.keytab.file | Ozone Manager keytab file | > | hdds.scm.http.kerberos.principal | SCM service spnego principal | > | hdds.scm.http.kerberos.keytab.file | SCM service spnego keytab file | > | ozone.om.http.kerberos.principal | Ozone Manager spnego principal | > | ozone.om.http.kerberos.keytab.file | Ozone Manager spnego keytab file | > | hdds.datanode.http.kerberos.keytab | Datanode spnego keytab file | > | hdds.datanode.http.kerberos.principal | Datanode spnego principal | > | dfs.datanode.kerberos.principal | Datanode service principal | > | dfs.datanode.keytab.file | Datanode service keytab file | > The prefixes are very different for each of the datanode configurations. It > would be nice to have some consistency for the datanode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
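One way to converge on consistent key names without breaking existing configs is a legacy-to-canonical mapping, similar in spirit to Hadoop's Configuration key-deprecation support. The unified `hdds.datanode.*` target names below are assumptions for illustration only, not committed names:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: resolve legacy datanode security keys (from the table
// above) to a single consistent prefix. The "hdds.datanode." targets are
// assumed names, not actual Ozone configuration keys.
public final class KeyCompat {
    private static final Map<String, String> LEGACY = new HashMap<>();
    static {
        LEGACY.put("dfs.datanode.kerberos.principal",
                   "hdds.datanode.kerberos.principal");
        LEGACY.put("dfs.datanode.keytab.file",
                   "hdds.datanode.kerberos.keytab.file");
    }

    /** Returns the canonical key name, mapping any known legacy name. */
    public static String canonical(String key) {
        return LEGACY.getOrDefault(key, key);
    }
}
```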
[jira] [Work logged] (HDDS-1847) Datanode Kerberos principal and keytab config key looks inconsistent
[ https://issues.apache.org/jira/browse/HDDS-1847?focusedWorklogId=336990=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336990 ] ASF GitHub Bot logged work on HDDS-1847: Author: ASF GitHub Bot Created on: 31/Oct/19 18:32 Start Date: 31/Oct/19 18:32 Worklog Time Spent: 10m Work Description: anuengineer commented on issue #1678: HDDS-1847: Datanode Kerberos principal and keytab config key looks inconsistent URL: https://github.com/apache/hadoop/pull/1678#issuecomment-548510969 I have committed this patch into hadoop-ozone branch. Not apache:trunk. Here is the commit info. commit 8527a9d9ceb0e1b2ba3bfc8ebc06e7589135f7f3 (HEAD -> master, origin/master, origin/HEAD) Author: Anu Engineer Date: Thu Oct 31 11:19:54 2019 -0700 HDDS-1847: Datanode Kerberos principal and keytab config key looks inconsistent Contributed by christeoh. @christeoh Thank you for the contribution. @macroadster Thanks for comments and filing the JIRA. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336990) Time Spent: 1h (was: 50m) > Datanode Kerberos principal and keytab config key looks inconsistent > > > Key: HDDS-1847 > URL: https://issues.apache.org/jira/browse/HDDS-1847 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Eric Yang >Assignee: Chris Teoh >Priority: Major > Labels: newbie, pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Ozone Kerberos configuration can be very confusing: > | config name | Description | > | hdds.scm.kerberos.principal | SCM service principal | > | hdds.scm.kerberos.keytab.file | SCM service keytab file | > | ozone.om.kerberos.principal | Ozone Manager service principal | > | ozone.om.kerberos.keytab.file | Ozone Manager keytab file | > | hdds.scm.http.kerberos.principal | SCM service spnego principal | > | hdds.scm.http.kerberos.keytab.file | SCM service spnego keytab file | > | ozone.om.http.kerberos.principal | Ozone Manager spnego principal | > | ozone.om.http.kerberos.keytab.file | Ozone Manager spnego keytab file | > | hdds.datanode.http.kerberos.keytab | Datanode spnego keytab file | > | hdds.datanode.http.kerberos.principal | Datanode spnego principal | > | dfs.datanode.kerberos.principal | Datanode service principal | > | dfs.datanode.keytab.file | Datanode service keytab file | > The prefixes are very different for each of the datanode configurations. It > would be nice to have some consistency for the datanode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-1847) Datanode Kerberos principal and keytab config key looks inconsistent
[ https://issues.apache.org/jira/browse/HDDS-1847?focusedWorklogId=336991=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336991 ] ASF GitHub Bot logged work on HDDS-1847: Author: ASF GitHub Bot Created on: 31/Oct/19 18:32 Start Date: 31/Oct/19 18:32 Worklog Time Spent: 10m Work Description: anuengineer commented on pull request #1678: HDDS-1847: Datanode Kerberos principal and keytab config key looks inconsistent URL: https://github.com/apache/hadoop/pull/1678 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336991) Time Spent: 1h 10m (was: 1h) > Datanode Kerberos principal and keytab config key looks inconsistent > > > Key: HDDS-1847 > URL: https://issues.apache.org/jira/browse/HDDS-1847 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Eric Yang >Assignee: Chris Teoh >Priority: Major > Labels: newbie, pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > Ozone Kerberos configuration can be very confusing: > | config name | Description | > | hdds.scm.kerberos.principal | SCM service principal | > | hdds.scm.kerberos.keytab.file | SCM service keytab file | > | ozone.om.kerberos.principal | Ozone Manager service principal | > | ozone.om.kerberos.keytab.file | Ozone Manager keytab file | > | hdds.scm.http.kerberos.principal | SCM service spnego principal | > | hdds.scm.http.kerberos.keytab.file | SCM service spnego keytab file | > | ozone.om.http.kerberos.principal | Ozone Manager spnego principal | > | ozone.om.http.kerberos.keytab.file | Ozone Manager spnego keytab file | > | hdds.datanode.http.kerberos.keytab | Datanode spnego keytab file | > | hdds.datanode.http.kerberos.principal | Datanode spnego principal | > | 
dfs.datanode.kerberos.principal | Datanode service principal | > | dfs.datanode.keytab.file | Datanode service keytab file | > The prefixes are very different for each of the datanode configurations. It > would be nice to have some consistency for the datanode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14920) Erasure Coding: Decommission may hang If one or more datanodes are out of service during decommission
[ https://issues.apache.org/jira/browse/HDFS-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-14920: Fix Version/s: 3.2.2 3.1.4 3.3.0 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanx [~ferhui] for the contribution!!! > Erasure Coding: Decommission may hang If one or more datanodes are out of > service during decommission > --- > > Key: HDFS-14920 > URL: https://issues.apache.org/jira/browse/HDFS-14920 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec >Affects Versions: 3.0.3, 3.2.1, 3.1.3 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2 > > Attachments: HDFS-14920.001.patch, HDFS-14920.002.patch, > HDFS-14920.003.patch, HDFS-14920.004.patch, HDFS-14920.005.patch > > > Decommission test hangs in our clusters. > Have seen the messages as follow > {quote} > 2019-10-22 15:58:51,514 TRACE > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Block > blk_-9223372035600425840_372987973 numExpected=9, numLive=5 > 2019-10-22 15:58:51,514 INFO BlockStateChange: Block: > blk_-9223372035600425840_372987973, Expected Replicas: 9, live replicas: 5, > corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 4, > maintenance replicas: 0, live entering maintenance replicas: 0, excess > replicas: 0, Is Open File: false, Datanodes having this block: > 10.255.43.57:50010 10.255.53.12:50010 10.255.63.12:50010 10.255.62.39:50010 > 10.255.37.36:50010 10.255.33.15:50010 10.255.69.29:50010 10.255.51.13:50010 > 10.255.64.15:50010 , Current Datanode: 10.255.69.29:50010, Is current > datanode decommissioning: true, Is current datanode entering maintenance: > false > 2019-10-22 15:58:51,514 DEBUG > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Node > 10.255.69.29:50010 still has 1 blocks to replicate before it is a candidate > to finish Decommission In Progress > {quote} > After digging 
into the source code and cluster logs, we guess it happens in the following steps.
> # Storage strategy is RS-6-3-1024k.
> # EC block b consists of b0, b1, b2, b3, b4, b5, b6, b7, b8; b0 is from datanode dn0, b1 is from datanode dn1, ...etc
> # At the beginning dn0 is in decommission progress, b0 is replicated successfully, and dn0 is still in decommission progress.
> # Later b1, b2, b3 are in decommission progress, and dn4 containing b4 is out of service, so b4 needs to be reconstructed; an ErasureCodingWork is created to do it, and in the ErasureCodingWork, additionalReplRequired is 4.
> # Because hasAllInternalBlocks is false, it will call ErasureCodingWork#addTaskToDatanode -> DatanodeDescriptor#addBlockToBeErasureCoded, and send a BlockECReconstructionInfo task to the Datanode.
> # The DataNode cannot reconstruct the block because targets is 4, greater than 3 (the parity number).
> There is a problem as follows, from BlockManager.java#scheduleReconstruction
> {code}
> // should reconstruct all the internal blocks before scheduling
> // replication task for decommissioning node(s).
> if (additionalReplRequired - numReplicas.decommissioning() -
>     numReplicas.liveEnteringMaintenanceReplicas() > 0) {
>   additionalReplRequired = additionalReplRequired -
>       numReplicas.decommissioning() -
>       numReplicas.liveEnteringMaintenanceReplicas();
> }
> {code}
> Reconstruction should happen first, and then replication for the decommissioning node(s). Because numReplicas.decommissioning() is 4 and additionalReplRequired is 4, that's wrong; numReplicas.decommissioning() should be 3, since it should exclude the live replica. If so, additionalReplRequired will be 1 and reconstruction will be scheduled as expected. After that, decommission goes on.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2375) Refactor BlockOutputStream to allow flexible buffering
[ https://issues.apache.org/jira/browse/HDDS-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz-wo Sze updated HDDS-2375: - Description: In HDDS-2331, we found that Ozone client allocates a ByteBuffer with chunk size (e.g. 16MB) to store data, regardless of the actual data size. The ByteBuffer will create a byte[] with chunk size. When the ByteBuffer is wrapped into a ByteString, the byte[] remains in the ByteString. As a result, when the actual data size is small (e.g. 1MB), a lot of memory space (15MB) is wasted. In this JIRA, we refactor BlockOutputStream so that the buffering becomes more flexible. In a later JIRA (HDDS-2386), we implement a chunk buffer using a list of smaller buffers which are allocated only if needed. was: In HDDS-2331, we found that Ozone client allocates a ByteBuffer with chunk size (e.g. 16MB) to store data, regardless of the actual data size. The ByteBuffer will create a byte[] with chunk size. When the ByteBuffer is wrapped into a ByteString, the byte[] remains in the ByteString. As a result, when the actual data size is small (e.g. 1MB), a lot of memory space (15MB) is wasted. In this JIRA, we refactor BlockOutputStream so that the buffering becomes more flexible. In a later JIRA, we could implement a chunk buffer using a list of smaller buffers which are allocated only if needed. > Refactor BlockOutputStream to allow flexible buffering > -- > > Key: HDDS-2375 > URL: https://issues.apache.org/jira/browse/HDDS-2375 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Client >Reporter: Tsz-wo Sze >Assignee: Tsz-wo Sze >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In HDDS-2331, we found that Ozone client allocates a ByteBuffer with chunk > size (e.g. 16MB) to store data, regardless of the actual data size. The > ByteBuffer will create a byte[] with chunk size.
When the ByteBuffer is > wrapped into a ByteString, the byte[] remains in the ByteString. > As a result, when the actual data size is small (e.g. 1MB), a lot of memory > space (15MB) is wasted. > In this JIRA, we refactor BlockOutputStream so that the buffering becomes > more flexible. In a later JIRA (HDDS-2386), we implement a chunk buffer using > a list of smaller buffers which are allocated only if needed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
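The buffering idea in the description above can be sketched as follows: allocate fixed-size increments lazily instead of one chunk-sized buffer up front. Names and the increment size are illustrative; this is not Ozone's actual ChunkBuffer API:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch of incremental buffering: instead of one chunk-sized ByteBuffer
// (e.g. 16 MB) allocated up front, fixed "increment" buffers are allocated
// only as data arrives. Illustrative only, not Ozone's ChunkBuffer.
public final class IncrementalBuffer {
    private final int increment;
    private final List<ByteBuffer> buffers = new ArrayList<>();

    public IncrementalBuffer(int increment) {
        this.increment = increment;
    }

    public void write(byte[] data) {
        int off = 0;
        while (off < data.length) {
            ByteBuffer last = buffers.isEmpty()
                ? null : buffers.get(buffers.size() - 1);
            if (last == null || !last.hasRemaining()) {
                last = ByteBuffer.allocate(increment); // allocate lazily
                buffers.add(last);
            }
            int n = Math.min(last.remaining(), data.length - off);
            last.put(data, off, n);
            off += n;
        }
    }

    /** Memory actually reserved, rather than one full chunk up front. */
    public int allocatedBytes() {
        return buffers.size() * increment;
    }
}
```

Writing 1 MB of data with a 1 MB increment reserves 1 MB instead of the full 16 MB chunk, which is exactly the waste the JIRA describes.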
[jira] [Commented] (HDFS-14936) Add getNumOfChildren() for interface InnerNode
[ https://issues.apache.org/jira/browse/HDFS-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964301#comment-16964301 ] Hudson commented on HDFS-14936: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17589 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17589/]) HDFS-14936. Add getNumOfChildren() for interface InnerNode. Contributed (ayushsaxena: rev d9fbedc4ae41d3dc688cf6b697f0fb46a28b47c5) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopology.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/InnerNodeImpl.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/InnerNode.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/DFSTopologyNodeImpl.java > Add getNumOfChildren() for interface InnerNode > -- > > Key: HDFS-14936 > URL: https://issues.apache.org/jira/browse/HDFS-14936 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Attachments: HDFS-14936.001.patch, HDFS-14936.002.patch, > HDFS-14936.003.patch > > > In the current code, the InnerNode subclasses InnerNodeImpl and DFSTopologyNodeImpl both > have getNumOfChildren(), > so add getNumOfChildren() to the interface InnerNode and remove the unnecessary > getNumOfChildren() in DFSTopologyNodeImpl. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2386) Implement incremental ChunkBuffer
[ https://issues.apache.org/jira/browse/HDDS-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964298#comment-16964298 ] Tsz-wo Sze commented on HDDS-2386: -- o2386_20191031b.patch: fixes test failures. > Implement incremental ChunkBuffer > - > > Key: HDDS-2386 > URL: https://issues.apache.org/jira/browse/HDDS-2386 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Client >Reporter: Tsz-wo Sze >Assignee: Tsz-wo Sze >Priority: Major > Attachments: o2386_20191030.patch, o2386_20191031b.patch > > > HDDS-2375 introduces a ChunkBuffer for flexible buffering. In this JIRA, we > implement ChunkBuffer with an incremental buffering so that the memory spaces > are allocated incrementally. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2388) Teragen test failure due to OM exception
[ https://issues.apache.org/jira/browse/HDDS-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964297#comment-16964297 ] Bharat Viswanadham commented on HDDS-2388: -- If the WAL has been cleared, then we get this error. Not sure whether or not it is a critical error for Recon. Tagging [~avijayan]. Shashi, if this is causing a test failure, can you provide more info? (As I see it, this API is only used by Recon.) > Teragen test failure due to OM exception > > > Key: HDDS-2388 > URL: https://issues.apache.org/jira/browse/HDDS-2388 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > > Ran into below exception while running teragen: > {code:java} > Unable to get delta updates since sequenceNumber 79932 > org.rocksdb.RocksDBException: Requested sequence not yet written in the db > at org.rocksdb.RocksDB.getUpdatesSince(Native Method) > at org.rocksdb.RocksDB.getUpdatesSince(RocksDB.java:3587) > at > org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(RDBStore.java:338) > at > org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(OzoneManager.java:3283) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(OzoneManagerRequestHandler.java:404) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handle(OzoneManagerRequestHandler.java:314) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:219) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:134) > at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > at > 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:102) > at > org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:984) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:912) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
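One way a delta-updates consumer (such as Recon) could cope with the failure above is to fall back to a full resync when the incremental path fails because the WAL no longer holds the requested sequence. This is a hedged sketch; `DeltaSource` and its methods are hypothetical, not an Ozone API:

```java
import java.util.List;

// Hedged sketch of a fallback strategy for "Requested sequence not yet
// written in the db": try incremental delta updates first, and resync
// from a full snapshot if the WAL has been truncated past the sequence.
// DeltaSource is a hypothetical interface, not Ozone's API.
public final class DeltaFetcher {

    public interface DeltaSource {
        /** Incremental updates; may fail if the WAL no longer holds seq. */
        List<String> updatesSince(long seq);

        /** Full resync path: always available, but more expensive. */
        List<String> fullSnapshot();
    }

    public static List<String> fetch(DeltaSource src, long seq) {
        try {
            return src.updatesSince(seq);
        } catch (IllegalStateException e) {
            // e.g. the WAL was cleared past the requested sequence number
            return src.fullSnapshot();
        }
    }
}
```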
[jira] [Commented] (HDDS-2370) Remove classpath in RunningWithHDFS.md ozone-hdfs/docker-compose as dir 'ozoneplugin' is not exist anymore
[ https://issues.apache.org/jira/browse/HDDS-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964296#comment-16964296 ] Anu Engineer commented on HDDS-2370: Yeap, I am fine with removing it. I don't think we are testing or running the plugin inside HDFS any more. > Remove classpath in RunningWithHDFS.md ozone-hdfs/docker-compose as dir > 'ozoneplugin' is not exist anymore > -- > > Key: HDDS-2370 > URL: https://issues.apache.org/jira/browse/HDDS-2370 > Project: Hadoop Distributed Data Store > Issue Type: Task > Components: documentation >Reporter: luhuachao >Priority: Major > Labels: pull-request-available > Attachments: HDDS-2370.1.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In RunningWithHDFS.md > {code:java} > export > HADOOP_CLASSPATH=/opt/ozone/share/hadoop/ozoneplugin/hadoop-ozone-datanode-plugin.jar{code} > ozone-hdfs/docker-compose.yaml > > {code:java} > environment: > HADOOP_CLASSPATH: /opt/ozone/share/hadoop/ozoneplugin/*.jar > {code} > When I run HddsDatanodeService as a plugin in the HDFS datanode, it fails with > the error below; there is no constructor without parameters. > > > {code:java} > 2019-10-21 21:38:56,391 ERROR datanode.DataNode > (DataNode.java:startPlugins(972)) - Unable to load DataNode plugins. > Specified list of plugins: org.apache.hadoop.ozone.HddsDatanodeService > java.lang.RuntimeException: java.lang.NoSuchMethodException: > org.apache.hadoop.ozone.HddsDatanodeService.() > {code} > What I suspect is that ozone-0.5 does not support running as a plugin in the HDFS > datanode now? If so, > why don't we remove the doc RunningWithHDFS.md? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2386) Implement incremental ChunkBuffer
[ https://issues.apache.org/jira/browse/HDDS-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz-wo Sze updated HDDS-2386: - Attachment: o2386_20191031b.patch > Implement incremental ChunkBuffer > - > > Key: HDDS-2386 > URL: https://issues.apache.org/jira/browse/HDDS-2386 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Client >Reporter: Tsz-wo Sze >Assignee: Tsz-wo Sze >Priority: Major > Attachments: o2386_20191030.patch, o2386_20191031b.patch > > > HDDS-2375 introduces a ChunkBuffer for flexible buffering. In this JIRA, we > implement ChunkBuffer with an incremental buffering so that the memory spaces > are allocated incrementally. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14920) Erasure Coding: Decommission may hang If one or more datanodes are out of service during decommission
[ https://issues.apache.org/jira/browse/HDFS-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964286#comment-16964286 ] Ayush Saxena commented on HDFS-14920: - Thanx [~ferhui] for the patch. v005 LGTM +1 > Erasure Coding: Decommission may hang If one or more datanodes are out of > service during decommission > --- > > Key: HDFS-14920 > URL: https://issues.apache.org/jira/browse/HDFS-14920 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec >Affects Versions: 3.0.3, 3.2.1, 3.1.3 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > Attachments: HDFS-14920.001.patch, HDFS-14920.002.patch, > HDFS-14920.003.patch, HDFS-14920.004.patch, HDFS-14920.005.patch > > > Decommission test hangs in our clusters. > Have seen the messages as follow > {quote} > 2019-10-22 15:58:51,514 TRACE > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Block > blk_-9223372035600425840_372987973 numExpected=9, numLive=5 > 2019-10-22 15:58:51,514 INFO BlockStateChange: Block: > blk_-9223372035600425840_372987973, Expected Replicas: 9, live replicas: 5, > corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 4, > maintenance replicas: 0, live entering maintenance replicas: 0, excess > replicas: 0, Is Open File: false, Datanodes having this block: > 10.255.43.57:50010 10.255.53.12:50010 10.255.63.12:50010 10.255.62.39:50010 > 10.255.37.36:50010 10.255.33.15:50010 10.255.69.29:50010 10.255.51.13:50010 > 10.255.64.15:50010 , Current Datanode: 10.255.69.29:50010, Is current > datanode decommissioning: true, Is current datanode entering maintenance: > false > 2019-10-22 15:58:51,514 DEBUG > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager: Node > 10.255.69.29:50010 still has 1 blocks to replicate before it is a candidate > to finish Decommission In Progress > {quote} > After digging the source code and cluster log, guess it happens as follow > steps. > # Storage strategy is RS-6-3-1024k. 
> # EC block b consists of b0, b1, b2, b3, b4, b5, b6, b7, b8; b0 is from datanode dn0, b1 is from datanode dn1, ...etc
> # At the beginning dn0 is in decommission progress, b0 is replicated successfully, and dn0 is still in decommission progress.
> # Later b1, b2, b3 are in decommission progress, and dn4 containing b4 is out of service, so b4 needs to be reconstructed; an ErasureCodingWork is created to do it, and in the ErasureCodingWork, additionalReplRequired is 4.
> # Because hasAllInternalBlocks is false, it will call ErasureCodingWork#addTaskToDatanode -> DatanodeDescriptor#addBlockToBeErasureCoded, and send a BlockECReconstructionInfo task to the Datanode.
> # The DataNode cannot reconstruct the block because targets is 4, greater than 3 (the parity number).
> There is a problem as follows, from BlockManager.java#scheduleReconstruction
> {code}
> // should reconstruct all the internal blocks before scheduling
> // replication task for decommissioning node(s).
> if (additionalReplRequired - numReplicas.decommissioning() -
>     numReplicas.liveEnteringMaintenanceReplicas() > 0) {
>   additionalReplRequired = additionalReplRequired -
>       numReplicas.decommissioning() -
>       numReplicas.liveEnteringMaintenanceReplicas();
> }
> {code}
> Reconstruction should happen first, and then replication for the decommissioning node(s). Because numReplicas.decommissioning() is 4 and additionalReplRequired is 4, that's wrong; numReplicas.decommissioning() should be 3, since it should exclude the live replica. If so, additionalReplRequired will be 1 and reconstruction will be scheduled as expected. After that, decommission goes on.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14936) Add getNumOfChildren() for interface InnerNode
[ https://issues.apache.org/jira/browse/HDFS-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964283#comment-16964283 ] Ayush Saxena commented on HDFS-14936: - Committed to trunk. Thanx [~leosun08] for the contribution [~elgoiri] and [~smeng] for the reviews!!! > Add getNumOfChildren() for interface InnerNode > -- > > Key: HDFS-14936 > URL: https://issues.apache.org/jira/browse/HDFS-14936 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Attachments: HDFS-14936.001.patch, HDFS-14936.002.patch, > HDFS-14936.003.patch > > > current code InnerNode subclass InnerNodeImpl and DFSTopologyNodeImpl both > have getNumOfChildren(). > so Add getNumOfChildren() for interface InnerNode and remove unnessary > getNumOfChildren() in DFSTopologyNodeImpl. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14284) RBF: Log Router identifier when reporting exceptions
[ https://issues.apache.org/jira/browse/HDFS-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964271#comment-16964271 ] Ayush Saxena commented on HDFS-14284: - Thanx [~hemanthboyina] for the patch. As I said before, please check the RouterIOException by extracting it from the RemoteException. You can get the RemoteException as:
{code:java}
RemoteException re = LambdaTestUtils.intercept(RemoteException.class,
    "Cannot locate a registered namenode for ns0 from "
        + routerContext.getRouter().getRouterId(),
    () -> routerProtocol.addBlock(testPath, clientName, newBlock, null,
        1, null, null));
RouterIOException rioe =
    (RouterIOException) re.unwrapRemoteException(RouterIOException.class);
rioe.getMessage(); // Have assertion checks for this and similarly for routerID
{code}
You can do something like this. To manually unwrap, you need to have a constructor with just {{String}} as a param, else it shall throw {{NoSuchMethodException}}. You can create one, set the message and RouterID into it, and then try. I had a quick rough try and it worked. Give it a try; if you face issues, let me know and I will try to help write it. > RBF: Log Router identifier when reporting exceptions > > > Key: HDFS-14284 > URL: https://issues.apache.org/jira/browse/HDFS-14284 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: hemanthboyina >Priority: Major > Attachments: HDFS-14284.001.patch, HDFS-14284.002.patch, > HDFS-14284.003.patch, HDFS-14284.004.patch, HDFS-14284.005.patch, > HDFS-14284.006.patch, HDFS-14284.007.patch, HDFS-14284.008.patch > > > The typical setup is to use multiple Routers through > ConfiguredFailoverProxyProvider. > In a regular HA Namenode setup, it is easy to know which NN was used. > However, in RBF, any Router can be the one reporting the exception and it is > hard to know which was the one. > We should have a way to identify which Router/Namenode was the one triggering > the exception. 
> This would also apply with Observer Namenodes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
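The constructor requirement discussed in the comment above can be sketched as follows: unwrapping rebuilds the original exception reflectively from its class name and serialized message, so a class without a single-String constructor cannot be reconstructed. This is a simplified illustration, not Hadoop's actual unwrap code:

```java
// Sketch of why unwrapping a RemoteException needs a (String) constructor:
// the exception is rebuilt reflectively from class name + message.
// Simplified illustration, not Hadoop's actual implementation.
public final class Unwrap {
    public static Exception unwrap(String className, String message) {
        try {
            Class<?> cls = Class.forName(className);
            // getConstructor fails with NoSuchMethodException when the class
            // lacks a (String) constructor, the failure mode described above.
            return (Exception) cls.getConstructor(String.class)
                                  .newInstance(message);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                "no usable (String) constructor on " + className, e);
        }
    }
}
```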