[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559648#comment-13559648
 ] 

Hudson commented on HDFS-4403:
--

Integrated in Hadoop-Mapreduce-trunk #1321 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1321/])
HDFS-4403. DFSClient can infer checksum type when not provided by reading 
first byte. Contributed by Todd Lipcon. (Revision 1436730)

 Result = SUCCESS
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436730
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileChecksumServlets.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto


> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: hdfs-4403.txt, hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559602#comment-13559602
 ] 

Hudson commented on HDFS-4403:
--

Integrated in Hadoop-Hdfs-trunk #1293 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1293/])
HDFS-4403. DFSClient can infer checksum type when not provided by reading 
first byte. Contributed by Todd Lipcon. (Revision 1436730)

 Result = FAILURE
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436730
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileChecksumServlets.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto


> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: hdfs-4403.txt, hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559541#comment-13559541
 ] 

Hudson commented on HDFS-4403:
--

Integrated in Hadoop-Yarn-trunk #104 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/104/])
HDFS-4403. DFSClient can infer checksum type when not provided by reading 
first byte. Contributed by Todd Lipcon. (Revision 1436730)

 Result = SUCCESS
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436730
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileChecksumServlets.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto


> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: hdfs-4403.txt, hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-21 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559397#comment-13559397
 ] 

Suresh Srinivas commented on HDFS-4403:
---

Todd, sorry got busy with other things. +1 for the change as well.

Consider adding a brief release note on the issue with prior branch in the 
release notes to help users understand the issue.

> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: hdfs-4403.txt, hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559329#comment-13559329
 ] 

Hudson commented on HDFS-4403:
--

Integrated in Hadoop-trunk-Commit #3265 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3265/])
HDFS-4403. DFSClient can infer checksum type when not provided by reading 
first byte. Contributed by Todd Lipcon. (Revision 1436730)

 Result = SUCCESS
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436730
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileChecksumServlets.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto


> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: hdfs-4403.txt, hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-17 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556944#comment-13556944
 ] 

Aaron T. Myers commented on HDFS-4403:
--

+1, the patch looks good to me.

> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hdfs-4403.txt, hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555671#comment-13555671
 ] 

Hadoop QA commented on HDFS-4403:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12565199/hdfs-4403.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3849//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3849//console

This message is automatically generated.

> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hdfs-4403.txt, hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1342#comment-1342
 ] 

Colin Patrick McCabe commented on HDFS-4403:


{code}
* @param src The file path
* @return The checksum 
*/
-  public static MD5MD5CRC32FileChecksum getFileChecksum(String src,
+  static MD5MD5CRC32FileChecksum getFileChecksum(String src,
+  String clientName,
{code}

Need some JavaDoc about the new parameter, {{clientName}}.

{code}
-  final DataChecksum.Type ct = PBHelper.convert(checksumData
-  .getCrcType());
+  DataChecksum.Type ct;
...
{code}

For what it's worth, {{ct}} can still be {{final}} here (it only gets set once).

I realize this isn't new with your change, but I find it kind of odd that the 
size of our *output* {{BufferedOutputStream}} is derived from the size of 
{{io.file.buffer.size}}.  Surely, these requests are small, and there's no 
reason to allocate a giant send buffer just because we wanted more file 
buffering?  Plus these are not "file buffers"-- we're not sending file data.

> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554527#comment-13554527
 ] 

Todd Lipcon commented on HDFS-4403:
---

bq. new client talks to old server: new clients infers the checksum type by 
reading first byte

If you mark it 'required', then the new client will fail to deserialize a 
response from an old server, due to the missing field. To verify this, I 
changed my test protobuf above to make the field required, and tried to 
deserialize a byte array which didn't include the field using 
MyTestProto.parseFrom. It threw "InvalidProtocolBufferException: Message 
missing required fields: testField"

BTW, the protobuf guide also talks about this specific case of changing the 
default:
{quote}
Changing a default value is generally OK, as long as you remember that default 
values are never sent over the wire. Thus, if a program receives a message in 
which a particular field isn't set, the program will see the default value as 
it was defined in that program's version of the protocol. It will NOT see the 
default value that was defined in the sender's code.
{quote}

> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-15 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554512#comment-13554512
 ] 

Suresh Srinivas commented on HDFS-4403:
---

bq. old client, new server: new server sets type. old client ignores it. Again, 
can't go back in time to change the old client, but nothing breaks.
If the new server sets crcType, the old client does not ignore it. It does use 
the crcType sent by the server. If we can ensure new server always sets the 
crcType, there is no issue.

Let me try to explain it better. May be I misunderstood the following comment: 
bq. Any new server always sets the checksum type explicitly, regardless of what 
type it is. It's only the old (pre-HDFS-3177) servers that wouldn't set one.
How do you ensure new server always sets the checksum type explicitly? What 
prevents a future (incorrect) change (say in 2.0.3) where server stops setting 
the crcType, because it is an optional field? In that case the old client (that 
had default value in proto definition) uses the default CRC type.

I believe, the correct change is to make the crcType field required. That way:
- old/new client talks to new server: new server always is forced to set the 
*required* field and old client/new clients use the crcType.
- new client talks to old server: new clients infers the checksum type by 
reading first byte
- old client and old server - cannot be solved as you described.

Making crcType required is not incompatible, given default value/explicitly set 
value was used by the old client.



> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554484#comment-13554484
 ] 

Todd Lipcon commented on HDFS-4403:
---

I don't really follow...

'optional' is generally used in protobuf for adding new features compatibly. 
Old clients don't know about it, so they don't care. New clients already have 
to deal with old servers which don't set the field, so they have back-compat 
paths (like the one I'm adding here).

If there's an optional field, and a client assumes it's always set, then it's a 
bug, I agree. That's why the patch is calling hasChecksumType() here and going 
to a fallback (compatibility) path.

In this case, the options are:
- old client, old server: there's a bug. We can't go back in time and change 
the code.
- new client, new server: new server sets checksum type. new client reads it. 
Everything works great. (HDFS-3177)
- new client, old server: old server doesn't set the type. HDFS-3177 set a 
default to CRC32, which wasn't correct. This patch instead fixes that to 
fallback to another (compatible) method to determine the type, rather than 
defaulting to some arbitrary choice.
- old client, new server: new server sets type. old client ignores it. Again, 
can't go back in time to change the old client, but nothing breaks.

If in the future we change the server to not set a type, then it's our own 
fault that old clients wouldn't know what to do. That's just life. This is how 
you do versioning with protobuf: see the Language Guide at 
https://developers.google.com/protocol-buffers/docs/proto#updating :
{quote}
Any new fields that you add should be optional or repeated. This means that any 
messages serialized by code using your "old" message format can be parsed by 
your new generated code, as they won't be missing any required elements. You 
should set up sensible default values for these elements so that new code can 
properly interact with messages generated by old code. Similarly, messages 
created by your new code can be parsed by your old code: old binaries simply 
ignore the new field when parsing. However, the unknown fields are not 
discarded, and if the message is later serialized, the unknown fields are 
serialized along with it – so if the message is passed on to new code, the new 
fields are still available. Note that preservation of unknown fields is 
currently not available for Python.
{quote}

Here there is no "sensible default value", so we use a "sensible fallback code 
path" instead.

> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-15 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554477#comment-13554477
 ] 

Suresh Srinivas commented on HDFS-4403:
---

bq. Any new server always sets the checksum type explicitly, regardless of what 
type it is. It's only the old (pre-HDFS-3177) servers that wouldn't set one.
Given that the field is optional, assuming that new server always sets the 
checksum type explicitly is an incorrect assumption. Only way to ensure that is 
by making the field required, right? One cannot prevent a future change where 
checksum type is not set (unless you or I review that change).

> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554467#comment-13554467
 ] 

Todd Lipcon commented on HDFS-4403:
---

bq. Is it possible that the server does not set the crcType and expects the 
client to infer the type. The checksum happens to be something other than 
CHECKSUM_CRC32. The old client treats it as CHECKSUM_CRC32 and runs into issues?

That's the bug this is trying to fix. If you currently run a newer 0.23-branch 
client against an earlier 0.23, or a 2.x branch (after making it IPC 
compatible) against 2.0.0, it will incorrectly report CHECKSUM_CRC32 back to 
the caller, even if the file is actually CHECKSUM_CRC32C. This results in 
distcp giving back errors about the copy failing due to checksum mismatch, etc.

Any _new_ server always sets the checksum type explicitly, regardless of what 
type it is. It's only the old (pre-HDFS-3177) servers that wouldn't set one.

> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-15 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554450#comment-13554450
 ] 

Suresh Srinivas commented on HDFS-4403:
---

bq. So, in your example, with the old clients talking to a new server which 
doesn't set crc type, the old clients would continue to use whatever default 
they'd defined locally.
Makes sense. So the old client when the field is not set uses CHECKSUM_CRC32.

Is it possible that the server does not set the crcType and expects the client 
to infer the type. The checksum happens to be something other than 
CHECKSUM_CRC32. The old client treats it as CHECKSUM_CRC32 and runs into issues?

> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554423#comment-13554423
 ] 

Todd Lipcon commented on HDFS-4403:
---

bq. The old client code just called getCrcType() without checking hasCrcType(). 
The old clients talking to new server with this change, get null pointer 
exception, if the new server does not set the crcType

That's not how defaults work in protobuf -- the protobuf maintains a separate 
flag for "hasFieldX", which is set to false upon construction even if the field 
has a default. So, on the server side, if an optional field isn't explicitly 
set, it won't serialize the default of that field to the wire when it's 
serialized. That is to say, given that the field has always been optional, the 
case of a server not setting it will be handled the same regardless of whether 
or not it has a default.

To illustrate, I made the following test proto:
{code}
message MyTestProto {
  optional string testField = 1 [ default = "hello world" ];
}
{code}

And ran the following code:

{code}
MyTestProto pb = MyTestProto.newBuilder().build();
System.err.println("has field? " + pb.hasTestField());
System.err.println("field value: " + pb.getTestField());
System.err.println("serialized: " + 
StringUtils.byteToHexString(pb.toByteArray()));
{code}

Output:

{code}
has field? false
field value: hello world
serialized: 
{code}

Note how the default value isn't put on the "wire" when the protobuf is 
serialized to a byte array.

So, in your example, with the old clients talking to a new server which doesn't 
set crc type, the old clients would continue to use whatever default they'd 
defined locally.

> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-15 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554396#comment-13554396
 ] 

Suresh Srinivas commented on HDFS-4403:
---

Prior to this change, the client always got default crcType in 
OpBlockChecksumResponseProto. The old client code just called getCrcType() 
without checking hasCrcType(). The old clients talking to new server with this 
change, get null pointer exception, if the new server does not set the crcType.


> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554175#comment-13554175
 ] 

Todd Lipcon commented on HDFS-4403:
---

Hi Suresh. I don't see how that would be incompatible. You can add or remove 
defaults and maintain compatibility as far as I can imagine. What client/server 
combination would not work given this change?

> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-15 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554083#comment-13554083
 ] 

Suresh Srinivas commented on HDFS-4403:
---

Given the protobuf change where optional field no longer carries the previous 
default value, this should be marked incompatible?

> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-14 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553500#comment-13553500
 ] 

Todd Lipcon commented on HDFS-4403:
---

Release audit is due to HADOOP-9097. The lack of unit tests is justified above.

> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553481#comment-13553481
 ] 

Hadoop QA commented on HDFS-4403:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12564863/hdfs-4403.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 2 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3841//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3841//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3841//console

This message is automatically generated.

> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira