[jira] Commented: (HDFS-1583) Improve backup-node sync performance by wrapping RPC parameters

2011-01-13 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981176#action_12981176
 ] 

Liyin Liang commented on HDFS-1583:
---

Hi Todd,
This is mainly caused by the serialization of the array, which is done by:
{code:}
ObjectWritable::writeObject(DataOutput out, Object instance,
 Class declaredClass, 
 Configuration conf)
{code}

This function traverses the array and serializes each element as a separate object. 
According to my test, a byte array with 8000 elements grows to 56008 bytes after 
serialization (2.4 ms), while a wrapped object is only 8094 bytes after 
serialization (0.03 ms).

By the way, there is already an array wrapper class: 
{code:}
public class ArrayWritable implements Writable
{code}
This class is used in FSEditLog to log operations, e.g. 
FSEditLog::logMkDir(String path, INode newNode).
I'll update the patch to use ArrayWritable.
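A hedged sketch of the wrapping idea (illustrative only; the class name and exact 
layout are not taken from the attached patch): a Writable that carries the journal 
records and writes them as one length-prefixed blob, so the RPC layer serializes a 
single object instead of one object per byte.
{code:}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class JournalRecords implements Writable {
  private byte[] records;

  public JournalRecords() {}                        // required for deserialization
  public JournalRecords(byte[] records) { this.records = records; }

  public byte[] get() { return records; }

  public void write(DataOutput out) throws IOException {
    out.writeInt(records.length);                   // length once
    out.write(records);                             // then the raw bytes
  }

  public void readFields(DataInput in) throws IOException {
    records = new byte[in.readInt()];
    in.readFully(records);
  }
}
{code}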

 Improve backup-node sync performance by wrapping RPC parameters
 ---

 Key: HDFS-1583
 URL: https://issues.apache.org/jira/browse/HDFS-1583
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Liyin Liang
 Fix For: 0.23.0

 Attachments: HDFS-1583-1.patch


 The journal edit records are sent by the active name-node to the backup-node 
 with RPC:
 {code:}
   public void journal(NamenodeRegistration registration,
   int jAction,
   int length,
   byte[] records) throws IOException;
 {code}
 During the name-node throughput benchmark, the size of the byte array _records_ 
 is around *8000*. Serialization and deserialization then become time-consuming. 
 I wrote a simple application to test RPC with a byte-array parameter. When the 
 size reached 8000, each RPC call took about 6 ms, while the name-node needs only 
 0.3~0.4 ms to sync 8 KB to local disk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-884) DataNode makeInstance should report the directory list when failing to start up

2011-01-13 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981212#action_12981212
 ] 

Steve Loughran commented on HDFS-884:
-

No need to thank me; all I provided was a deployment where none of the 
directories were valid :)

 DataNode makeInstance should report the directory list when failing to start 
 up
 ---

 Key: HDFS-884
 URL: https://issues.apache.org/jira/browse/HDFS-884
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.22.0
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Fix For: 0.22.0

 Attachments: HDFS-884.patch, HDFS-884.patch, InvalidDirs.patch, 
 InvalidDirs.patch, InvalidDirs.patch


 When {{Datanode.makeInstance()}} cannot work with one of the directories in 
 dfs.data.dir, it logs this at warn level (while losing the stack trace). 
 It should include the nested exception for better troubleshooting. Then, when 
 all dirs in the list fail, an exception is thrown, but this exception does 
 not include the list of directories. It should list the absolute path of 
 every missing/failing directory, so that whoever sees the exception can see 
 where to start looking for problems: either the filesystem or the 
 configuration. 
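 A hedged sketch of the requested behaviour (illustrative only, not the attached 
 patch; class and variable names are made up): log each bad directory with its 
 nested exception, and if every directory fails, throw an exception that lists all 
 of their absolute paths.
 {code:}
 import java.io.File;
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.List;
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.util.DiskChecker;
 import org.apache.hadoop.util.DiskChecker.DiskErrorException;

 class DataDirValidator {
   static final Log LOG = LogFactory.getLog(DataDirValidator.class);

   static void validate(Collection<File> dirs) throws IOException {
     List<String> failed = new ArrayList<String>();
     for (File dir : dirs) {
       try {
         DiskChecker.checkDir(dir);                  // throws if the dir is unusable
       } catch (DiskErrorException e) {
         // keep the nested exception instead of swallowing the stack trace
         LOG.warn("Invalid directory in dfs.data.dir: " + dir.getAbsolutePath(), e);
         failed.add(dir.getAbsolutePath());
       }
     }
     if (!failed.isEmpty() && failed.size() == dirs.size()) {
       throw new IOException("All directories in dfs.data.dir are invalid: " + failed);
     }
   }
 }
 {code}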

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1583) Improve backup-node sync performance by wrapping RPC parameters

2011-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981382#action_12981382
 ] 

Hadoop QA commented on HDFS-1583:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12468257/HDFS-1583-2.patch
  against trunk revision 1058402.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI
  org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery
  org.apache.hadoop.hdfs.server.namenode.TestStorageRestore

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/105//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/105//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/105//console

This message is automatically generated.

 Improve backup-node sync performance by wrapping RPC parameters
 ---

 Key: HDFS-1583
 URL: https://issues.apache.org/jira/browse/HDFS-1583
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Liyin Liang
 Fix For: 0.23.0

 Attachments: HDFS-1583-1.patch, HDFS-1583-2.patch


 The journal edit records are sent by the active name-node to the backup-node 
 with RPC:
 {code:}
   public void journal(NamenodeRegistration registration,
   int jAction,
   int length,
   byte[] records) throws IOException;
 {code}
 During the name-node throughput benchmark, the size of the byte array _records_ 
 is around *8000*. Serialization and deserialization then become time-consuming. 
 I wrote a simple application to test RPC with a byte-array parameter. When the 
 size reached 8000, each RPC call took about 6 ms, while the name-node needs only 
 0.3~0.4 ms to sync 8 KB to local disk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1583) Improve backup-node sync performance by wrapping RPC parameters

2011-01-13 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981404#action_12981404
 ] 

Konstantin Shvachko commented on HDFS-1583:
---

Liyin, this is a nice optimization; thanks for measuring the performance.
I think this is critical for the 0.22 release. Writing edits to the BN should not 
be slower than writing to disk.

 a byte array with 8000 elements grows to 56008 bytes after 
 serialization

This makes RPC very inefficient. 
Java arrays cannot hold instances of different types (see ArrayStoreException for 
reference types), so serializing the type name for each element does not make 
sense. The type should be stored only once for the entire array.
Does anybody remember discussions or jiras opened for this?

We need to decide whether it should be fixed in RPC or locally for the BackupNode 
only. An RPC-level fix would optimize communication in general, but would be 
massively backward-incompatible. Now, before the major release, could be a good 
time to do that.
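A hedged sketch of the "type once per array" idea (illustrative only, not a 
drop-in change to ObjectWritable): the component type is recorded a single time, 
and byte arrays are then written as raw bytes.
{code:}
import java.io.DataOutput;
import java.io.IOException;
import java.lang.reflect.Array;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.ObjectWritable;

class CompactArrayWriter {
  static void writeArray(DataOutput out, Object array, Configuration conf)
      throws IOException {
    Class<?> componentType = array.getClass().getComponentType();
    out.writeUTF(componentType.getName());          // element type, written once
    int length = Array.getLength(array);
    out.writeInt(length);
    if (componentType == Byte.TYPE) {
      out.write((byte[]) array);                    // raw bytes, no per-element headers
    } else {
      for (int i = 0; i < length; i++) {            // other types kept simple here
        ObjectWritable.writeObject(out, Array.get(array, i), componentType, conf);
      }
    }
  }
}
{code}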

 Improve backup-node sync performance by wrapping RPC parameters
 ---

 Key: HDFS-1583
 URL: https://issues.apache.org/jira/browse/HDFS-1583
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Liyin Liang
 Fix For: 0.23.0

 Attachments: HDFS-1583-1.patch, HDFS-1583-2.patch


 The journal edit records are sent by the active name-node to the backup-node 
 with RPC:
 {code:}
   public void journal(NamenodeRegistration registration,
   int jAction,
   int length,
   byte[] records) throws IOException;
 {code}
 During the name-node throughput benchmark, the size of the byte array _records_ 
 is around *8000*. Serialization and deserialization then become time-consuming. 
 I wrote a simple application to test RPC with a byte-array parameter. When the 
 size reached 8000, each RPC call took about 6 ms, while the name-node needs only 
 0.3~0.4 ms to sync 8 KB to local disk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1583) Improve backup-node sync performance by wrapping RPC parameters

2011-01-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981415#action_12981415
 ] 

Todd Lipcon commented on HDFS-1583:
---

We did this optimization for the RPC layer in HBase long ago (HBASE-82). Here's 
the current code:

https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java#L387

Is there a way to make the change to ObjectWritable such that the new version 
can still read old data?
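One hedged answer to that question (a sketch only; the sentinel name and layout 
are made up): have new writers emit a type token that old writers never produce, 
so a new reader can distinguish the compact encoding from the legacy per-element 
one and fall back accordingly.
{code:}
import java.io.DataInput;
import java.io.IOException;

class CompactArrayReader {
  static final String COMPACT_BYTES = "#compact-bytes";  // illustrative sentinel

  static byte[] readBytes(DataInput in) throws IOException {
    String typeName = in.readUTF();
    if (COMPACT_BYTES.equals(typeName)) {                 // written by a new sender
      byte[] b = new byte[in.readInt()];
      in.readFully(b);
      return b;
    }
    // Anything else was written by an old sender; hand it to the existing
    // per-element ObjectWritable path (omitted from this sketch).
    throw new IOException("legacy decoding not shown in this sketch: " + typeName);
  }
}
{code}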

 Improve backup-node sync performance by wrapping RPC parameters
 ---

 Key: HDFS-1583
 URL: https://issues.apache.org/jira/browse/HDFS-1583
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Liyin Liang
 Fix For: 0.23.0

 Attachments: HDFS-1583-1.patch, HDFS-1583-2.patch


 The journal edit records are sent by the active name-node to the backup-node 
 with RPC:
 {code:}
   public void journal(NamenodeRegistration registration,
   int jAction,
   int length,
   byte[] records) throws IOException;
 {code}
 During the name-node throughput benchmark, the size of the byte array _records_ 
 is around *8000*. Serialization and deserialization then become time-consuming. 
 I wrote a simple application to test RPC with a byte-array parameter. When the 
 size reached 8000, each RPC call took about 6 ms, while the name-node needs only 
 0.3~0.4 ms to sync 8 KB to local disk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HDFS-1583) Improve backup-node sync performance by wrapping RPC parameters

2011-01-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned HDFS-1583:
-

Assignee: Liyin Liang

 Improve backup-node sync performance by wrapping RPC parameters
 ---

 Key: HDFS-1583
 URL: https://issues.apache.org/jira/browse/HDFS-1583
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Liyin Liang
Assignee: Liyin Liang
 Fix For: 0.23.0

 Attachments: HDFS-1583-1.patch, HDFS-1583-2.patch


 The journal edit records are sent by the active name-node to the backup-node 
 with RPC:
 {code:}
   public void journal(NamenodeRegistration registration,
   int jAction,
   int length,
   byte[] records) throws IOException;
 {code}
 During the name-node throughput benchmark, the size of the byte array _records_ 
 is around *8000*. Serialization and deserialization then become time-consuming. 
 I wrote a simple application to test RPC with a byte-array parameter. When the 
 size reached 8000, each RPC call took about 6 ms, while the name-node needs only 
 0.3~0.4 ms to sync 8 KB to local disk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1583) Improve backup-node sync performance by wrapping RPC parameters

2011-01-13 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981477#action_12981477
 ] 

Konstantin Shvachko commented on HDFS-1583:
---

Looks like there is a jira for that: HADOOP-6949. 
I'll reopen it, since we have more benchmarks now, and let's move the RPC 
discussion there.

 Improve backup-node sync performance by wrapping RPC parameters
 ---

 Key: HDFS-1583
 URL: https://issues.apache.org/jira/browse/HDFS-1583
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Liyin Liang
Assignee: Liyin Liang
 Fix For: 0.23.0

 Attachments: HDFS-1583-1.patch, HDFS-1583-2.patch


 The journal edit records are sent by the active name-node to the backup-node 
 with RPC:
 {code:}
   public void journal(NamenodeRegistration registration,
   int jAction,
   int length,
   byte[] records) throws IOException;
 {code}
 During the name-node throughput benchmark, the size of the byte array _records_ 
 is around *8000*. Serialization and deserialization then become time-consuming. 
 I wrote a simple application to test RPC with a byte-array parameter. When the 
 size reached 8000, each RPC call took about 6 ms, while the name-node needs only 
 0.3~0.4 ms to sync 8 KB to local disk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1572) Checkpointer should trigger checkpoint with specified period.

2011-01-13 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1572:
--

Status: Open  (was: Patch Available)

 Checkpointer should trigger checkpoint with specified period.
 -

 Key: HDFS-1572
 URL: https://issues.apache.org/jira/browse/HDFS-1572
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Liyin Liang
Priority: Blocker
 Fix For: 0.21.0

 Attachments: 1527-1.diff, 1572-2.diff, HDFS-1572-2.patch, 
 HDFS-1572-3.patch, HDFS-1572.patch


 {code:}
   long now = now();
   boolean shouldCheckpoint = false;
   if(now >= lastCheckpointTime + periodMSec) {
     shouldCheckpoint = true;
   } else {
     long size = getJournalSize();
     if(size >= checkpointSize)
       shouldCheckpoint = true;
   }
 {code}
 {dfs.namenode.checkpoint.period} in the configuration determines the checkpoint 
 period. However, with the above code, the Checkpointer triggers a checkpoint 
 every 5 minutes (periodMSec=5*60*1000). According to SecondaryNameNode.java, 
 the first *if* statement should be:
  {code:}
 if(now >= lastCheckpointTime + 1000 * checkpointPeriod) {
  {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1572) Checkpointer should trigger checkpoint with specified period.

2011-01-13 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1572:
--

Attachment: HDFS-1572-3.patch

Good catch, Konstantin.  Changed the sleep time to be the minimum of the 
time-based and size-based intervals.  This way, whichever interval is lower, 
we'll sleep only as long as needed.
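A hedged sketch of that sleep computation (names are illustrative, not taken from 
the patch): the Checkpointer sleeps for the smaller of the period-based interval 
and the size-check interval.
{code:}
class CheckpointSleep {
  /** Sleep no longer than the earlier of the next period- or size-based check. */
  static long sleepMillis(long checkpointPeriodSec, long sizeCheckIntervalMSec) {
    long periodMSec = 1000L * checkpointPeriodSec;  // dfs.namenode.checkpoint.period
    return Math.min(periodMSec, sizeCheckIntervalMSec);
  }
}
{code}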

 Checkpointer should trigger checkpoint with specified period.
 -

 Key: HDFS-1572
 URL: https://issues.apache.org/jira/browse/HDFS-1572
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Liyin Liang
Priority: Blocker
 Fix For: 0.21.0

 Attachments: 1527-1.diff, 1572-2.diff, HDFS-1572-2.patch, 
 HDFS-1572-3.patch, HDFS-1572.patch


 {code:}
   long now = now();
   boolean shouldCheckpoint = false;
   if(now >= lastCheckpointTime + periodMSec) {
     shouldCheckpoint = true;
   } else {
     long size = getJournalSize();
     if(size >= checkpointSize)
       shouldCheckpoint = true;
   }
 {code}
 {dfs.namenode.checkpoint.period} in the configuration determines the checkpoint 
 period. However, with the above code, the Checkpointer triggers a checkpoint 
 every 5 minutes (periodMSec=5*60*1000). According to SecondaryNameNode.java, 
 the first *if* statement should be:
  {code:}
 if(now >= lastCheckpointTime + 1000 * checkpointPeriod) {
  {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1572) Checkpointer should trigger checkpoint with specified period.

2011-01-13 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1572:
--

Status: Patch Available  (was: Open)

Hudson!

 Checkpointer should trigger checkpoint with specified period.
 -

 Key: HDFS-1572
 URL: https://issues.apache.org/jira/browse/HDFS-1572
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Liyin Liang
Priority: Blocker
 Fix For: 0.21.0

 Attachments: 1527-1.diff, 1572-2.diff, HDFS-1572-2.patch, 
 HDFS-1572-3.patch, HDFS-1572.patch


 {code:}
   long now = now();
   boolean shouldCheckpoint = false;
   if(now >= lastCheckpointTime + periodMSec) {
     shouldCheckpoint = true;
   } else {
     long size = getJournalSize();
     if(size >= checkpointSize)
       shouldCheckpoint = true;
   }
 {code}
 {dfs.namenode.checkpoint.period} in the configuration determines the checkpoint 
 period. However, with the above code, the Checkpointer triggers a checkpoint 
 every 5 minutes (periodMSec=5*60*1000). According to SecondaryNameNode.java, 
 the first *if* statement should be:
  {code:}
 if(now >= lastCheckpointTime + 1000 * checkpointPeriod) {
  {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1582) Remove auto-generated native build files

2011-01-13 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981498#action_12981498
 ] 

Eli Collins commented on HDFS-1582:
---

Hey Roman,

My bad; it looks like this error was due to a partially updated libtool package on 
my system. After rebooting, the build with your patch compiles as expected.
Thanks,
Eli

 Remove auto-generated native build files
 

 Key: HDFS-1582
 URL: https://issues.apache.org/jira/browse/HDFS-1582
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: contrib/libhdfs
Reporter: Roman Shaposhnik
 Fix For: 0.23.0

 Attachments: HADOOP-6436.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 The repo currently includes the automake- and autoconf-generated files for the 
 native build. Per the discussion on HADOOP-6421, let's remove them and use the 
 host's automake and autoconf. We should also do this for libhdfs and 
 fuse-dfs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1582) Remove auto-generated native build files

2011-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981506#action_12981506
 ] 

Hadoop QA commented on HDFS-1582:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12468203/HADOOP-6436.patch
  against trunk revision 1057414.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.fs.permission.TestStickyBit
  org.apache.hadoop.hdfs.security.TestDelegationToken
  org.apache.hadoop.hdfs.server.common.TestDistributedUpgrade
  org.apache.hadoop.hdfs.server.datanode.TestBlockReport
  org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
  org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
  org.apache.hadoop.hdfs.server.namenode.TestBackupNode
  org.apache.hadoop.hdfs.server.namenode.TestBlocksWithNotEnoughRacks
  org.apache.hadoop.hdfs.server.namenode.TestBlockTokenWithDFS
  org.apache.hadoop.hdfs.server.namenode.TestCheckpoint
  org.apache.hadoop.hdfs.server.namenode.TestFsck
  org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs
  org.apache.hadoop.hdfs.server.namenode.TestStorageRestore
  org.apache.hadoop.hdfs.TestCrcCorruption
  org.apache.hadoop.hdfs.TestDatanodeBlockScanner
  org.apache.hadoop.hdfs.TestDatanodeDeath
  org.apache.hadoop.hdfs.TestDFSClientRetries
  org.apache.hadoop.hdfs.TestDFSFinalize
  org.apache.hadoop.hdfs.TestDFSRollback
  org.apache.hadoop.hdfs.TestDFSShell
  org.apache.hadoop.hdfs.TestDFSStartupVersions
  org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
  org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
  org.apache.hadoop.hdfs.TestDFSUpgrade
  org.apache.hadoop.hdfs.TestDistributedFileSystem
  org.apache.hadoop.hdfs.TestFileAppend2
  org.apache.hadoop.hdfs.TestFileAppend3
  org.apache.hadoop.hdfs.TestFileAppend4
  org.apache.hadoop.hdfs.TestFileAppend
  org.apache.hadoop.hdfs.TestFileConcurrentReader
  org.apache.hadoop.hdfs.TestFileCreationNamenodeRestart
  org.apache.hadoop.hdfs.TestFileCreation
  org.apache.hadoop.hdfs.TestHDFSFileSystemContract
  org.apache.hadoop.hdfs.TestHDFSTrash
  org.apache.hadoop.hdfs.TestPread
  org.apache.hadoop.hdfs.TestQuota
  org.apache.hadoop.hdfs.TestReplication
  org.apache.hadoop.hdfs.TestRestartDFS
  org.apache.hadoop.hdfs.TestSetrepDecreasing
  org.apache.hadoop.hdfs.TestSetrepIncreasing
  org.apache.hadoop.hdfs.TestWriteConfigurationToDFS

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/101//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/101//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/101//console

This message is automatically generated.

 Remove auto-generated native build files
 

 Key: HDFS-1582
 URL: https://issues.apache.org/jira/browse/HDFS-1582
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: contrib/libhdfs
Reporter: Roman Shaposhnik
 Fix For: 0.23.0

 Attachments: HADOOP-6436.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 The repo currently includes the automake- and autoconf-generated files for the 
 native build. Per the discussion on HADOOP-6421, let's remove them and use the 
 host's automake and autoconf. We should also do this for libhdfs and 
 fuse-dfs. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HDFS-1547) Improve decommission mechanism

2011-01-13 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1547:
--

Status: Open  (was: Patch Available)

 Improve decommission mechanism
 --

 Key: HDFS-1547
 URL: https://issues.apache.org/jira/browse/HDFS-1547
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Fix For: 0.23.0

 Attachments: HDFS-1547.1.patch, HDFS-1547.patch, show-stats-broken.txt


 The current decommission mechanism, driven by the exclude file, has several 
 issues. This bug proposes some changes to the mechanism for better 
 manageability. See the proposal in the next comment for more details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1547) Improve decommission mechanism

2011-01-13 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1547:
--

Attachment: HDFS-1547.2.patch

The attached patch fixes the following:
# Cluster stats updates had a bug. Thanks, Todd, for pointing it out. I have 
fixed it and added tests to check this.
# A node removed from the include file was not being shut down. I added this 
functionality back and added a test for it.
# Decommissioning/decommissioned nodes update the stats as given below (see the 
sketch after this list):
#* Node used capacity is counted towards cluster used capacity.
#* Node capacity is not counted towards cluster capacity; only the node's used 
capacity is counted towards cluster capacity.
#* Node remaining capacity is not counted towards cluster remaining capacity.
# Cleaned up TestDecommission and moved it to JUnit 4.
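A hedged sketch of the accounting rules in item 3 (illustrative only; the actual 
patch integrates this into the namenode's stats code rather than a standalone 
class):
{code:}
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

class ClusterStatsSketch {
  long capacity, used, remaining;

  void add(DatanodeInfo node) {
    used += node.getDfsUsed();                      // used space always counts
    if (node.isDecommissionInProgress() || node.isDecommissioned()) {
      capacity += node.getDfsUsed();                // only the used portion counts as capacity
      // remaining space on a decommissioning/decommissioned node is NOT added
    } else {
      capacity  += node.getCapacity();
      remaining += node.getRemaining();
    }
  }
}
{code}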

 Improve decommission mechanism
 --

 Key: HDFS-1547
 URL: https://issues.apache.org/jira/browse/HDFS-1547
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Fix For: 0.23.0

 Attachments: HDFS-1547.1.patch, HDFS-1547.2.patch, HDFS-1547.patch, 
 show-stats-broken.txt


 The current decommission mechanism, driven by the exclude file, has several 
 issues. This bug proposes some changes to the mechanism for better 
 manageability. See the proposal in the next comment for more details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1547) Improve decommission mechanism

2011-01-13 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1547:
--

Status: Patch Available  (was: Open)

 Improve decommission mechanism
 --

 Key: HDFS-1547
 URL: https://issues.apache.org/jira/browse/HDFS-1547
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Fix For: 0.23.0

 Attachments: HDFS-1547.1.patch, HDFS-1547.2.patch, HDFS-1547.patch, 
 show-stats-broken.txt


 The current decommission mechanism, driven by the exclude file, has several 
 issues. This bug proposes some changes to the mechanism for better 
 manageability. See the proposal in the next comment for more details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1547) Improve decommission mechanism

2011-01-13 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1547:
--

Attachment: HDFS-1547.3.patch

Additional change to test cluster stats when decommissioning of a datanode is 
stopped.

 Improve decommission mechanism
 --

 Key: HDFS-1547
 URL: https://issues.apache.org/jira/browse/HDFS-1547
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Fix For: 0.23.0

 Attachments: HDFS-1547.1.patch, HDFS-1547.2.patch, HDFS-1547.3.patch, 
 HDFS-1547.patch, show-stats-broken.txt


 The current decommission mechanism, driven by the exclude file, has several 
 issues. This bug proposes some changes to the mechanism for better 
 manageability. See the proposal in the next comment for more details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1547) Improve decommission mechanism

2011-01-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981557#action_12981557
 ] 

Hadoop QA commented on HDFS-1547:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12468302/HDFS-1547.2.patch
  against trunk revision 1058402.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 13 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.hdfs.server.namenode.TestStorageRestore

-1 contrib tests.  The patch failed contrib unit tests.

-1 system test framework.  The patch failed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/107//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/107//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/107//console

This message is automatically generated.

 Improve decommission mechanism
 --

 Key: HDFS-1547
 URL: https://issues.apache.org/jira/browse/HDFS-1547
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Fix For: 0.23.0

 Attachments: HDFS-1547.1.patch, HDFS-1547.2.patch, HDFS-1547.3.patch, 
 HDFS-1547.patch, show-stats-broken.txt


 The current decommission mechanism, driven by the exclude file, has several 
 issues. This bug proposes some changes to the mechanism for better 
 manageability. See the proposal in the next comment for more details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1547) Improve decommission mechanism

2011-01-13 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1547:
--

Status: Open  (was: Patch Available)

 Improve decommission mechanism
 --

 Key: HDFS-1547
 URL: https://issues.apache.org/jira/browse/HDFS-1547
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Fix For: 0.23.0

 Attachments: HDFS-1547.1.patch, HDFS-1547.2.patch, HDFS-1547.3.patch, 
 HDFS-1547.patch, show-stats-broken.txt


 The current decommission mechanism, driven by the exclude file, has several 
 issues. This bug proposes some changes to the mechanism for better 
 manageability. See the proposal in the next comment for more details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1547) Improve decommission mechanism

2011-01-13 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1547:
--

Status: Patch Available  (was: Open)

 Improve decommission mechanism
 --

 Key: HDFS-1547
 URL: https://issues.apache.org/jira/browse/HDFS-1547
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.23.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Fix For: 0.23.0

 Attachments: HDFS-1547.1.patch, HDFS-1547.2.patch, HDFS-1547.3.patch, 
 HDFS-1547.patch, show-stats-broken.txt


 The current decommission mechanism, driven by the exclude file, has several 
 issues. This bug proposes some changes to the mechanism for better 
 manageability. See the proposal in the next comment for more details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1578) First step towards data transfer protocol compatibility: support DatanodeProtocol#getDataTransferProtocolVersion

2011-01-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981647#action_12981647
 ] 

Todd Lipcon commented on HDFS-1578:
---

bq. Todd, I really like your proposal of including the data transfer version # in 
the DN's descriptor. Is there a simple way of making this work without breaking 
protocol compatibility?

Hmm, you're trying to add a compatibility layer that is itself compatible with 
previous versions? That seems tricky, since we'd have to add a field to 
DatanodeInfo... so no, I have no good ideas if that is a goal.

 First step towards data transfer protocol compatibility: support 
 DatanodeProtocol#getDataTransferProtocolVersion
 

 Key: HDFS-1578
 URL: https://issues.apache.org/jira/browse/HDFS-1578
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.23.0


 HADOOP-6904 allows us to handle RPC changes in a compatible way. However, we 
 have one more protocol to take care of, the data transfer protocol, which a 
 dfs client uses to read data from or write data to a datanode.
  
 My proposal is to add a new RPC, getDataTransferVersion, to DatanodeProtocol 
 that returns the data transfer protocol version running on the datanode. A 
 dfs client gets the datanode's version number before it reads from or writes to 
 a datanode. With this, the dfs client could behave differently according to the 
 datanode's data transfer version. This provides a base for us to make data 
 transfer protocol changes in a compatible way.
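 A hedged sketch of the proposal (the method name comes from the issue title; the 
 surrounding interface and constant are illustrative, not the committed code):
 {code:}
 import java.io.IOException;

 interface DatanodeProtocolSketch {
   /** @return the data transfer protocol version the datanode is running */
   int getDataTransferProtocolVersion() throws IOException;
 }

 class DfsClientSketch {
   static final int CURRENT_DATA_TRANSFER_VERSION = 19;  // illustrative constant

   void connect(DatanodeProtocolSketch datanode) throws IOException {
     int dnVersion = datanode.getDataTransferProtocolVersion();
     if (dnVersion < CURRENT_DATA_TRANSFER_VERSION) {
       // speak the older wire format this datanode still understands
     } else {
       // use the current wire format
     }
   }
 }
 {code}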

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.