[jira] Commented: (HDFS-1583) Improve backup-node sync performance by wrapping RPC parameters
[ https://issues.apache.org/jira/browse/HDFS-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981176#action_12981176 ] Liyin Liang commented on HDFS-1583: --- Hi Todd, This is mainly caused by the serialization of the array. The job is done by: {code} ObjectWritable::writeObject(DataOutput out, Object instance, Class declaredClass, Configuration conf) {code} This function traverses the array and serializes each element as an object. According to my test, a byte array with 8000 elements grows to 56008 bytes after serialization (2.4 ms), whereas the wrapped object is 8094 bytes after serialization (0.03 ms). By the way, there is already an array wrapper class: {code} public class ArrayWritable implements Writable {code} This class is used in FSEditLog to log operations, e.g. FSEditLog::logMkDir(String path, INode newNode). I'll update the patch to use ArrayWritable. Improve backup-node sync performance by wrapping RPC parameters --- Key: HDFS-1583 URL: https://issues.apache.org/jira/browse/HDFS-1583 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Liyin Liang Fix For: 0.23.0 Attachments: HDFS-1583-1.patch The journal edit records are sent by the active name-node to the backup-node via RPC: {code} public void journal(NamenodeRegistration registration, int jAction, int length, byte[] records) throws IOException; {code} During the name-node throughput benchmark, the size of the byte array _records_ is around *8000*, which makes serialization and deserialization time-consuming. I wrote a simple application to test RPC with a byte array parameter: when the size reaches 8000, each RPC call needs about 6 ms, while the name-node syncs 8 KB to local disk in only 0.3~0.4 ms. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
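The 56008 figure is consistent with ObjectWritable tagging every element: each of the 8000 bytes carries the declared class name ("byte", 6 bytes as a length-prefixed UTF string) plus the 1-byte value, i.e. 7 * 8000 = 56000 bytes, with the remaining 8 bytes being the array header. Below is a minimal sketch of the wrapper approach Liyin describes (write the length once, then the raw bytes); the class and field names are illustrative, not the actual patch:

{code:java}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Hypothetical wrapper: serializes a byte[] as <length><raw bytes>,
// avoiding ObjectWritable's per-element type tag.
public class JournalRecordsWritable implements Writable {
  private byte[] records;

  public JournalRecordsWritable() {}              // needed for deserialization

  public JournalRecordsWritable(byte[] records) { this.records = records; }

  public void write(DataOutput out) throws IOException {
    out.writeInt(records.length);                 // length written exactly once
    out.write(records);                           // raw bytes, no per-element overhead
  }

  public void readFields(DataInput in) throws IOException {
    records = new byte[in.readInt()];
    in.readFully(records);
  }

  public byte[] get() { return records; }
}
{code}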
[jira] Commented: (HDFS-884) DataNode makeInstance should report the directory list when failing to start up
[ https://issues.apache.org/jira/browse/HDFS-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981212#action_12981212 ] Steve Loughran commented on HDFS-884: - no need to thank me, all I provided was a deployment where none of the directories were valid :) DataNode makeInstance should report the directory list when failing to start up --- Key: HDFS-884 URL: https://issues.apache.org/jira/browse/HDFS-884 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 0.22.0 Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Fix For: 0.22.0 Attachments: HDFS-884.patch, HDFS-884.patch, InvalidDirs.patch, InvalidDirs.patch, InvalidDirs.patch When {{Datanode.makeInstance()}} cannot work with one of the directories in dfs.data.dir, it logs this at warn level (while losing the stack trace). It should include the nested exception for better troubleshooting. Then, when all dirs in the list fail, an exception is thrown, but this exception does not include the list of directories. It should list the absolute path of every missing/failing directory, so that whoever sees the exception can see where to start looking for problems: either the filesystem or the configuration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
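A sketch of the shape of the fix being described, with illustrative helper names (the actual patch is in the attachments; this is not it):

{code:java}
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import org.apache.commons.logging.Log;

// Illustrative only: log each invalid directory with its cause (keeping
// the stack trace), and name every failed directory if all of them fail.
class DataDirChecker {
  static void checkDirs(Collection<File> dirs, Log log) throws IOException {
    List<String> failed = new ArrayList<String>();
    for (File dir : dirs) {
      try {
        checkDir(dir);                           // stand-in for the existing validation
      } catch (IOException e) {
        log.warn("Invalid directory " + dir.getAbsolutePath(), e); // nested exception kept
        failed.add(dir.getAbsolutePath());
      }
    }
    if (failed.size() == dirs.size()) {
      throw new IOException("All directories in dfs.data.dir are invalid: " + failed);
    }
  }

  static void checkDir(File dir) throws IOException {
    if (!dir.isDirectory() || !dir.canWrite()) {
      throw new IOException("Cannot use " + dir.getAbsolutePath());
    }
  }
}
{code}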
[jira] Commented: (HDFS-1583) Improve backup-node sync performance by wrapping RPC parameters
[ https://issues.apache.org/jira/browse/HDFS-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981382#action_12981382 ] Hadoop QA commented on HDFS-1583: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12468257/HDFS-1583-2.patch against trunk revision 1058402. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery org.apache.hadoop.hdfs.server.namenode.TestStorageRestore -1 contrib tests. The patch failed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/105//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/105//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/105//console This message is automatically generated. Improve backup-node sync performance by wrapping RPC parameters --- Key: HDFS-1583 URL: https://issues.apache.org/jira/browse/HDFS-1583 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Liyin Liang Fix For: 0.23.0 Attachments: HDFS-1583-1.patch, HDFS-1583-2.patch The journal edit records are sent by the active name-node to the backup-node via RPC: {code} public void journal(NamenodeRegistration registration, int jAction, int length, byte[] records) throws IOException; {code} During the name-node throughput benchmark, the size of the byte array _records_ is around *8000*, which makes serialization and deserialization time-consuming. I wrote a simple application to test RPC with a byte array parameter: when the size reaches 8000, each RPC call needs about 6 ms, while the name-node syncs 8 KB to local disk in only 0.3~0.4 ms. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1583) Improve backup-node sync performance by wrapping RPC parameters
[ https://issues.apache.org/jira/browse/HDFS-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981404#action_12981404 ] Konstantin Shvachko commented on HDFS-1583: --- Liyin, This is a nice optimization, and thanks for measuring the performance. I think this is critical for the 0.22 release. Writing edits to the BN should not be slower than writing to disk. bq. a byte array with 8000 elements grows to 56008 bytes after serialization This makes RPC very inefficient. Java arrays cannot hold instances of different types (see ArrayStoreException for references), so serializing the type name for each element does not make sense. The type should be stored only once for the entire array. Does anybody remember discussions or jiras opened for this? We need to decide whether it should be fixed in RPC or locally for the BackupNode only. It seems that an RPC-level fix would optimize communication in general, but would be massively backward incompatible. It could be a good time to do that now, before the major release. Improve backup-node sync performance by wrapping RPC parameters --- Key: HDFS-1583 URL: https://issues.apache.org/jira/browse/HDFS-1583 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Liyin Liang Fix For: 0.23.0 Attachments: HDFS-1583-1.patch, HDFS-1583-2.patch The journal edit records are sent by the active name-node to the backup-node via RPC: {code} public void journal(NamenodeRegistration registration, int jAction, int length, byte[] records) throws IOException; {code} During the name-node throughput benchmark, the size of the byte array _records_ is around *8000*, which makes serialization and deserialization time-consuming. I wrote a simple application to test RPC with a byte array parameter: when the size reaches 8000, each RPC call needs about 6 ms, while the name-node syncs 8 KB to local disk in only 0.3~0.4 ms. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
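To make the point concrete, a sketch (illustrative names, not a proposed patch) of array serialization that records the component type once for the whole array:

{code:java}
import java.io.DataOutput;
import java.io.IOException;

class CompactArrayWriter {
  // Records the element type once, rather than once per element as
  // ObjectWritable does today for arrays.
  static void writeIntArray(DataOutput out, int[] arr) throws IOException {
    out.writeUTF("int");        // component type, stored exactly once
    out.writeInt(arr.length);   // element count
    for (int v : arr) {
      out.writeInt(v);          // raw elements, no per-element type tag
    }
  }
}
{code}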
[jira] Commented: (HDFS-1583) Improve backup-node sync performance by wrapping RPC parameters
[ https://issues.apache.org/jira/browse/HDFS-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981415#action_12981415 ] Todd Lipcon commented on HDFS-1583: --- We did this optimization for the RPC layer in HBase long ago (HBASE-82). Here's the current code: https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java#L387 Is there a way to make the change to ObjectWritable such that the new version can still read old data? Improve backup-node sync performance by wrapping RPC parameters --- Key: HDFS-1583 URL: https://issues.apache.org/jira/browse/HDFS-1583 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Liyin Liang Fix For: 0.23.0 Attachments: HDFS-1583-1.patch, HDFS-1583-2.patch The journal edit records are sent by the active name-node to the backup-node via RPC: {code} public void journal(NamenodeRegistration registration, int jAction, int length, byte[] records) throws IOException; {code} During the name-node throughput benchmark, the size of the byte array _records_ is around *8000*, which makes serialization and deserialization time-consuming. I wrote a simple application to test RPC with a byte array parameter: when the size reaches 8000, each RPC call needs about 6 ms, while the name-node syncs 8 KB to local disk in only 0.3~0.4 ms. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
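One possible shape of the compatibility shim Todd is asking about, purely as a sketch (the marker string and helper are hypothetical): the writer tags new-format arrays with a name the old writer never emitted, and the reader dispatches on it, falling back to the legacy per-element path otherwise.

{code:java}
import java.io.DataInput;
import java.io.IOException;

class CompatArrayReader {
  static Object readMaybeCompactArray(DataInput in) throws IOException {
    String declared = in.readUTF();       // class name, as in the old format
    if ("#bytes".equals(declared)) {      // hypothetical new-format marker
      byte[] arr = new byte[in.readInt()];
      in.readFully(arr);                  // compact path: raw bytes
      return arr;
    }
    return readLegacy(declared, in);      // old data decodes as before
  }

  // Placeholder for ObjectWritable's existing decoding path.
  static Object readLegacy(String declaredClass, DataInput in) throws IOException {
    throw new IOException("legacy decoding elided in this sketch: " + declaredClass);
  }
}
{code}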
[jira] Assigned: (HDFS-1583) Improve backup-node sync performance by wrapping RPC parameters
[ https://issues.apache.org/jira/browse/HDFS-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reassigned HDFS-1583: - Assignee: Liyin Liang Improve backup-node sync performance by wrapping RPC parameters --- Key: HDFS-1583 URL: https://issues.apache.org/jira/browse/HDFS-1583 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Liyin Liang Assignee: Liyin Liang Fix For: 0.23.0 Attachments: HDFS-1583-1.patch, HDFS-1583-2.patch The journal edit records are sent by the active name-node to the backup-node via RPC: {code} public void journal(NamenodeRegistration registration, int jAction, int length, byte[] records) throws IOException; {code} During the name-node throughput benchmark, the size of the byte array _records_ is around *8000*, which makes serialization and deserialization time-consuming. I wrote a simple application to test RPC with a byte array parameter: when the size reaches 8000, each RPC call needs about 6 ms, while the name-node syncs 8 KB to local disk in only 0.3~0.4 ms. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1583) Improve backup-node sync performance by wrapping RPC parameters
[ https://issues.apache.org/jira/browse/HDFS-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981477#action_12981477 ] Konstantin Shvachko commented on HDFS-1583: --- Looks like there is a jira for that: HADOOP-6949. I'll reopen it, since we have more benchmarks now, and let's move the RPC discussion there. Improve backup-node sync performance by wrapping RPC parameters --- Key: HDFS-1583 URL: https://issues.apache.org/jira/browse/HDFS-1583 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Liyin Liang Assignee: Liyin Liang Fix For: 0.23.0 Attachments: HDFS-1583-1.patch, HDFS-1583-2.patch The journal edit records are sent by the active name-node to the backup-node via RPC: {code} public void journal(NamenodeRegistration registration, int jAction, int length, byte[] records) throws IOException; {code} During the name-node throughput benchmark, the size of the byte array _records_ is around *8000*, which makes serialization and deserialization time-consuming. I wrote a simple application to test RPC with a byte array parameter: when the size reaches 8000, each RPC call needs about 6 ms, while the name-node syncs 8 KB to local disk in only 0.3~0.4 ms. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1572) Checkpointer should trigger checkpoint with specified period.
[ https://issues.apache.org/jira/browse/HDFS-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1572: -- Status: Open (was: Patch Available) Checkpointer should trigger checkpoint with specified period. - Key: HDFS-1572 URL: https://issues.apache.org/jira/browse/HDFS-1572 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0 Reporter: Liyin Liang Priority: Blocker Fix For: 0.21.0 Attachments: 1527-1.diff, 1572-2.diff, HDFS-1572-2.patch, HDFS-1572-3.patch, HDFS-1572.patch {code} long now = now(); boolean shouldCheckpoint = false; if(now >= lastCheckpointTime + periodMSec) { shouldCheckpoint = true; } else { long size = getJournalSize(); if(size >= checkpointSize) shouldCheckpoint = true; } {code} {{dfs.namenode.checkpoint.period}} in the configuration determines the checkpoint period. However, with the above code, the Checkpointer triggers a checkpoint every 5 minutes (periodMSec=5*60*1000). According to SecondaryNameNode.java, the first *if* statement should be: {code} if(now >= lastCheckpointTime + 1000 * checkpointPeriod) { {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1572) Checkpointer should trigger checkpoint with specified period.
[ https://issues.apache.org/jira/browse/HDFS-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1572: -- Attachment: HDFS-1572-3.patch Good catch, Konstantin. Changed the sleep time to be the minimum of the time-based and size-based intervals; this way, whichever is lower, we'll sleep only as long as needed. Checkpointer should trigger checkpoint with specified period. - Key: HDFS-1572 URL: https://issues.apache.org/jira/browse/HDFS-1572 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0 Reporter: Liyin Liang Priority: Blocker Fix For: 0.21.0 Attachments: 1527-1.diff, 1572-2.diff, HDFS-1572-2.patch, HDFS-1572-3.patch, HDFS-1572.patch {code} long now = now(); boolean shouldCheckpoint = false; if(now >= lastCheckpointTime + periodMSec) { shouldCheckpoint = true; } else { long size = getJournalSize(); if(size >= checkpointSize) shouldCheckpoint = true; } {code} {{dfs.namenode.checkpoint.period}} in the configuration determines the checkpoint period. However, with the above code, the Checkpointer triggers a checkpoint every 5 minutes (periodMSec=5*60*1000). According to SecondaryNameNode.java, the first *if* statement should be: {code} if(now >= lastCheckpointTime + 1000 * checkpointPeriod) { {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
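A sketch of the corrected trigger loop after both fixes, assuming the Checkpointer's existing members (now(), getJournalSize(), checkpointSize, doCheckpoint()) and a hypothetical checkpointCheckPeriod knob for how often the journal size is polled:

{code:java}
long periodMSec = 1000 * checkpointPeriod;          // time-based trigger interval
long sleepMSec = Math.min(periodMSec, 1000 * checkpointCheckPeriod);

long lastCheckpointTime = now();
while (true) {
  try {
    Thread.sleep(sleepMSec);                        // wake often enough for both triggers
  } catch (InterruptedException e) {
    break;
  }
  long now = now();
  if (now >= lastCheckpointTime + periodMSec        // period elapsed
      || getJournalSize() >= checkpointSize) {      // or journal grew too large
    doCheckpoint();
    lastCheckpointTime = now;
  }
}
{code}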
[jira] Updated: (HDFS-1572) Checkpointer should trigger checkpoint with specified period.
[ https://issues.apache.org/jira/browse/HDFS-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-1572: -- Status: Patch Available (was: Open) Hudson! Checkpointer should trigger checkpoint with specified period. - Key: HDFS-1572 URL: https://issues.apache.org/jira/browse/HDFS-1572 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.22.0 Reporter: Liyin Liang Priority: Blocker Fix For: 0.21.0 Attachments: 1527-1.diff, 1572-2.diff, HDFS-1572-2.patch, HDFS-1572-3.patch, HDFS-1572.patch {code} long now = now(); boolean shouldCheckpoint = false; if(now >= lastCheckpointTime + periodMSec) { shouldCheckpoint = true; } else { long size = getJournalSize(); if(size >= checkpointSize) shouldCheckpoint = true; } {code} {{dfs.namenode.checkpoint.period}} in the configuration determines the checkpoint period. However, with the above code, the Checkpointer triggers a checkpoint every 5 minutes (periodMSec=5*60*1000). According to SecondaryNameNode.java, the first *if* statement should be: {code} if(now >= lastCheckpointTime + 1000 * checkpointPeriod) { {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1582) Remove auto-generated native build files
[ https://issues.apache.org/jira/browse/HDFS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981498#action_12981498 ] Eli Collins commented on HDFS-1582: --- Hey Roman, My bad: looks like this error was due to a partially updated libtool package on my system. After rebooting, the build with your patch compiles as expected. Thanks, Eli Remove auto-generated native build files Key: HDFS-1582 URL: https://issues.apache.org/jira/browse/HDFS-1582 Project: Hadoop HDFS Issue Type: Improvement Components: contrib/libhdfs Reporter: Roman Shaposhnik Fix For: 0.23.0 Attachments: HADOOP-6436.patch Original Estimate: 24h Remaining Estimate: 24h The repo currently includes the automake- and autoconf-generated files for the native build. Per discussion on HADOOP-6421, let's remove them and use the host's automake and autoconf. We should also do this for libhdfs and fuse-dfs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1582) Remove auto-generated native build files
[ https://issues.apache.org/jira/browse/HDFS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981506#action_12981506 ] Hadoop QA commented on HDFS-1582: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12468203/HADOOP-6436.patch against trunk revision 1057414. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.fs.permission.TestStickyBit org.apache.hadoop.hdfs.security.TestDelegationToken org.apache.hadoop.hdfs.server.common.TestDistributedUpgrade org.apache.hadoop.hdfs.server.datanode.TestBlockReport org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics org.apache.hadoop.hdfs.server.namenode.TestBackupNode org.apache.hadoop.hdfs.server.namenode.TestBlocksWithNotEnoughRacks org.apache.hadoop.hdfs.server.namenode.TestBlockTokenWithDFS org.apache.hadoop.hdfs.server.namenode.TestCheckpoint org.apache.hadoop.hdfs.server.namenode.TestFsck org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs org.apache.hadoop.hdfs.server.namenode.TestStorageRestore org.apache.hadoop.hdfs.TestCrcCorruption org.apache.hadoop.hdfs.TestDatanodeBlockScanner org.apache.hadoop.hdfs.TestDatanodeDeath org.apache.hadoop.hdfs.TestDFSClientRetries org.apache.hadoop.hdfs.TestDFSFinalize org.apache.hadoop.hdfs.TestDFSRollback org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestDFSStartupVersions org.apache.hadoop.hdfs.TestDFSStorageStateRecovery org.apache.hadoop.hdfs.TestDFSUpgradeFromImage org.apache.hadoop.hdfs.TestDFSUpgrade org.apache.hadoop.hdfs.TestDistributedFileSystem org.apache.hadoop.hdfs.TestFileAppend2 org.apache.hadoop.hdfs.TestFileAppend3 org.apache.hadoop.hdfs.TestFileAppend4 org.apache.hadoop.hdfs.TestFileAppend org.apache.hadoop.hdfs.TestFileConcurrentReader org.apache.hadoop.hdfs.TestFileCreationNamenodeRestart org.apache.hadoop.hdfs.TestFileCreation org.apache.hadoop.hdfs.TestHDFSFileSystemContract org.apache.hadoop.hdfs.TestHDFSTrash org.apache.hadoop.hdfs.TestPread org.apache.hadoop.hdfs.TestQuota org.apache.hadoop.hdfs.TestReplication org.apache.hadoop.hdfs.TestRestartDFS org.apache.hadoop.hdfs.TestSetrepDecreasing org.apache.hadoop.hdfs.TestSetrepIncreasing org.apache.hadoop.hdfs.TestWriteConfigurationToDFS -1 contrib tests. The patch failed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/101//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/101//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/101//console This message is automatically generated.
Remove auto-generated native build files Key: HDFS-1582 URL: https://issues.apache.org/jira/browse/HDFS-1582 Project: Hadoop HDFS Issue Type: Improvement Components: contrib/libhdfs Reporter: Roman Shaposhnik Fix For: 0.23.0 Attachments: HADOOP-6436.patch Original Estimate: 24h Remaining Estimate: 24h The repo currently includes the automake- and autoconf-generated files for the native build. Per discussion on HADOOP-6421, let's remove them and use the host's automake and autoconf. We should also do this for libhdfs and fuse-dfs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1547) Improve decommission mechanism
[ https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-1547: -- Status: Open (was: Patch Available) Improve decommission mechanism -- Key: HDFS-1547 URL: https://issues.apache.org/jira/browse/HDFS-1547 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 0.23.0 Attachments: HDFS-1547.1.patch, HDFS-1547.patch, show-stats-broken.txt The current decommission mechanism, driven by an exclude file, has several issues. This bug proposes some changes to the mechanism for better manageability. See the proposal in the next comment for more details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1547) Improve decommission mechanism
[ https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-1547: -- Attachment: HDFS-1547.2.patch The attached patch fixes the following: # Cluster stats updates had a bug. Thanks, Todd, for pointing it out. I have fixed it and added tests to check this. # A node removed from the include file was not being shut down. I added this functionality back and added a test for it. # Decommissioning/decommissioned nodes update the stats as given below (a sketch follows this message): #* The node's used capacity is counted towards cluster used capacity. #* The node's total capacity is not counted towards cluster capacity; only its used capacity is. #* The node's remaining capacity is not counted towards cluster remaining capacity. # Cleaned up TestDecommission and moved it to JUnit 4. Improve decommission mechanism -- Key: HDFS-1547 URL: https://issues.apache.org/jira/browse/HDFS-1547 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 0.23.0 Attachments: HDFS-1547.1.patch, HDFS-1547.2.patch, HDFS-1547.patch, show-stats-broken.txt The current decommission mechanism, driven by an exclude file, has several issues. This bug proposes some changes to the mechanism for better manageability. See the proposal in the next comment for more details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
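A sketch of the aggregation rules listed above, assuming DatanodeInfo-style accessors (getDfsUsed(), getCapacity(), getRemaining()); this is illustrative, not the patch's code:

{code:java}
// Illustrative cluster-stats aggregation: a decommissioning or
// decommissioned node contributes only its used space to cluster
// capacity, and nothing to cluster remaining space.
long clusterCapacity = 0, clusterUsed = 0, clusterRemaining = 0;
for (DatanodeDescriptor node : datanodes) {
  clusterUsed += node.getDfsUsed();            // used space always counts
  if (node.isDecommissionInProgress() || node.isDecommissioned()) {
    clusterCapacity += node.getDfsUsed();      // only used space counts as capacity
  } else {
    clusterCapacity += node.getCapacity();
    clusterRemaining += node.getRemaining();
  }
}
{code}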
[jira] Updated: (HDFS-1547) Improve decommission mechanism
[ https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-1547: -- Status: Patch Available (was: Open) Improve decommission mechanism -- Key: HDFS-1547 URL: https://issues.apache.org/jira/browse/HDFS-1547 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 0.23.0 Attachments: HDFS-1547.1.patch, HDFS-1547.2.patch, HDFS-1547.patch, show-stats-broken.txt The current decommission mechanism, driven by an exclude file, has several issues. This bug proposes some changes to the mechanism for better manageability. See the proposal in the next comment for more details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1547) Improve decommission mechanism
[ https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-1547: -- Attachment: HDFS-1547.3.patch Additional change to test cluster stats when decommissioning of a datanode is stopped. Improve decommission mechanism -- Key: HDFS-1547 URL: https://issues.apache.org/jira/browse/HDFS-1547 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 0.23.0 Attachments: HDFS-1547.1.patch, HDFS-1547.2.patch, HDFS-1547.3.patch, HDFS-1547.patch, show-stats-broken.txt The current decommission mechanism, driven by an exclude file, has several issues. This bug proposes some changes to the mechanism for better manageability. See the proposal in the next comment for more details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1547) Improve decommission mechanism
[ https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981557#action_12981557 ] Hadoop QA commented on HDFS-1547: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12468302/HDFS-1547.2.patch against trunk revision 1058402. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 13 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.hdfs.server.namenode.TestStorageRestore -1 contrib tests. The patch failed contrib unit tests. -1 system test framework. The patch failed system test framework compile. Test results: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/107//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/107//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/107//console This message is automatically generated. Improve decommission mechanism -- Key: HDFS-1547 URL: https://issues.apache.org/jira/browse/HDFS-1547 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 0.23.0 Attachments: HDFS-1547.1.patch, HDFS-1547.2.patch, HDFS-1547.3.patch, HDFS-1547.patch, show-stats-broken.txt The current decommission mechanism, driven by an exclude file, has several issues. This bug proposes some changes to the mechanism for better manageability. See the proposal in the next comment for more details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1547) Improve decommission mechanism
[ https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-1547: -- Status: Open (was: Patch Available) Improve decommission mechanism -- Key: HDFS-1547 URL: https://issues.apache.org/jira/browse/HDFS-1547 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 0.23.0 Attachments: HDFS-1547.1.patch, HDFS-1547.2.patch, HDFS-1547.3.patch, HDFS-1547.patch, show-stats-broken.txt The current decommission mechanism, driven by an exclude file, has several issues. This bug proposes some changes to the mechanism for better manageability. See the proposal in the next comment for more details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1547) Improve decommission mechanism
[ https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-1547: -- Status: Patch Available (was: Open) Improve decommission mechanism -- Key: HDFS-1547 URL: https://issues.apache.org/jira/browse/HDFS-1547 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.23.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 0.23.0 Attachments: HDFS-1547.1.patch, HDFS-1547.2.patch, HDFS-1547.3.patch, HDFS-1547.patch, show-stats-broken.txt The current decommission mechanism, driven by an exclude file, has several issues. This bug proposes some changes to the mechanism for better manageability. See the proposal in the next comment for more details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1578) First step towards data transfer protocol compatibility: support DatanodeProtocol#getDataTransferProtocolVersion
[ https://issues.apache.org/jira/browse/HDFS-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981647#action_12981647 ] Todd Lipcon commented on HDFS-1578: --- bq. Todd, I really like your proposal of including the data transfer version # in the DN's descriptor. Is there a simple way of making this work without breaking protocol compatibility? Hmm, you're trying to add a compatibility layer that is itself compatible with previous versions? Seems tricky, since we'd have to add a field to DatanodeInfo... so, no, no good ideas if that is a goal. First step towards data transfer protocol compatibility: support DatanodeProtocol#getDataTransferProtocolVersion Key: HDFS-1578 URL: https://issues.apache.org/jira/browse/HDFS-1578 Project: Hadoop HDFS Issue Type: New Feature Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.23.0 HADOOP-6904 allows us to handle RPC changes in a compatible way. However, we have one more protocol to take care of: the data transfer protocol, which a dfs client uses to read data from or write data to a datanode. My proposal is to add a new RPC, getDataTransferProtocolVersion, to DatanodeProtocol that returns the data transfer protocol version running on the datanode. A dfs client gets the datanode's version number before it reads from/writes to a datanode. With this, the dfs client can behave differently according to the datanode's data transfer version. This provides a base for us to make data transfer protocol changes in a compatible way. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
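A sketch of the client-side handshake the description proposes; the proxy type (DatanodeProxy), the version constant, and the caching are assumptions for illustration only:

{code:java}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class DataTransferVersionCache {
  // Hypothetical negotiation: ask each datanode for its data transfer
  // version once, cache it, and pick the wire behavior accordingly.
  private final Map<String, Integer> dnVersions = new HashMap<String, Integer>();

  interface DatanodeProxy {                         // stand-in for the real RPC proxy
    int getDataTransferProtocolVersion() throws IOException;  // the proposed new RPC
  }

  int versionFor(String dnName, DatanodeProxy proxy) throws IOException {
    Integer v = dnVersions.get(dnName);
    if (v == null) {
      v = proxy.getDataTransferProtocolVersion();   // one extra RPC per datanode
      dnVersions.put(dnName, v);
    }
    return v;
  }
}

// Usage, before reading or writing a block (NEW_FORMAT_VERSION assumed):
//   if (cache.versionFor(dn, proxy) >= NEW_FORMAT_VERSION) { /* new wire format */ }
//   else { /* fall back to the old wire format */ }
{code}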