[jira] Commented: (HADOOP-6298) BytesWritable#getBytes is a bad name that leads to programming mistakes

Tom White (JIRA) Thu, 08 Oct 2009 08:27:03 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763520#action_12763520 ]


Tom White commented on HADOOP-6298:
-----------------------------------

I don't think this proposal is about changing the API; it's about renaming the 
method to describe its contract more accurately. Text.getBytes() behaves 
differently from String.getBytes(): it returns the backing array, of which 
only the first getLength() bytes are valid. This trips up users; see, for 
example, 
http://www.nabble.com/can%27t-read-the-SequenceFile-correctly-td21866960.html.
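
To make the pitfall concrete, here is a minimal sketch that does not use
Hadoop itself. ReusableBuffer is a hypothetical stand-in that mimics how a
reused Text or BytesWritable keeps an over-allocated backing array; the class
name, the helper methods, and the growth policy are assumptions made for
illustration only.

```java
import java.nio.charset.StandardCharsets;

public class PaddedBytesDemo {

    /** Hypothetical stand-in for a reusable buffer such as Text or BytesWritable. */
    public static final class ReusableBuffer {
        private byte[] bytes = new byte[0];
        private int length = 0;

        /** Copies src into the backing array, growing it only when it is too small. */
        public void set(byte[] src) {
            if (src.length > bytes.length) {
                // Over-allocate so later values can reuse the array (policy assumed here).
                bytes = new byte[src.length * 3 / 2];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
        }

        /** Returns the backing array, which may be longer than the current value. */
        public byte[] getBytes() { return bytes; }

        /** Returns the number of valid bytes at the start of the backing array. */
        public int getLength() { return length; }
    }

    /** The common mistake: decodes the whole backing array, padding included. */
    public static String decodeWrong(ReusableBuffer b) {
        return new String(b.getBytes(), StandardCharsets.UTF_8);
    }

    /** Correct use: decodes only the first getLength() bytes. */
    public static String decodeRight(ReusableBuffer b) {
        return new String(b.getBytes(), 0, b.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusableBuffer b = new ReusableBuffer();
        b.set("hadoop".getBytes(StandardCharsets.UTF_8)); // backing array grows to 9 bytes
        b.set("pig".getBytes(StandardCharsets.UTF_8));    // array is reused; 3 bytes are valid

        System.out.println(decodeWrong(b).length()); // prints 9: stale bytes and padding leak in
        System.out.println(decodeRight(b));          // prints pig
    }
}
```

With String.getBytes() the returned array is exactly the value, so code written
by analogy tends to look like decodeWrong() above.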

We could deprecate getBytes() (on BinaryComparable and its subclasses 
BytesWritable and Text) in 0.22 and add getPaddedBytes(), as Nathan suggests, 
with identical functionality. We would then remove getBytes() in the following 
release. This change would have no impact on efficiency, since it is purely a 
rename.

Nathan, what's the use case for getNonPaddedValue()? Exposing it could make it 
easy to write an inefficient program, since copying in maps or reduces is 
normally expensive.
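
As a rough illustration of the trade-off, the sketch below contrasts a copying
helper with a no-copy view over the valid prefix. It works on a plain byte
array and a length rather than on Hadoop classes; copyValid() and viewValid()
are hypothetical names, and copyValid() is only an assumption about what
getNonPaddedValue() would do.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class NonPaddedAccessDemo {

    /** Copying access: allocates a new exact-size array on every call. */
    public static byte[] copyValid(byte[] backing, int length) {
        return Arrays.copyOf(backing, length);
    }

    /** No-copy access: a read-only view over the first length bytes of backing. */
    public static ByteBuffer viewValid(byte[] backing, int length) {
        return ByteBuffer.wrap(backing, 0, length).asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        byte[] backing = {1, 2, 3, 0, 0, 0}; // 3 valid bytes followed by padding
        int length = 3;

        byte[] copy = copyValid(backing, length);
        ByteBuffer view = viewValid(backing, length);

        System.out.println(copy.length);      // prints 3
        System.out.println(view.remaining()); // prints 3

        // The view shares storage with the backing array; the copy does not.
        backing[0] = 9;
        System.out.println(copy[0]);     // prints 1
        System.out.println(view.get(0)); // prints 9
    }
}
```

Called once per record inside a map or reduce, the copying variant allocates a
new array each time, which is the cost in question; the view avoids the
allocation but is only valid until the buffer is reused.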



> BytesWritable#getBytes is a bad name that leads to programming mistakes
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-6298
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6298
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 0.20.1
>            Reporter: Nathan Marz
>
> Pretty much everyone at Rapleaf who has worked with Hadoop has misused 
> BytesWritable#getBytes at some point, not expecting the byte array to be 
> padded. I think we can completely alleviate these programming mistakes by 
> deprecating and renaming this method (again) to be more descriptive. I 
> propose "getPaddedBytes()" or "getPaddedValue()". It would also be helpful to 
> have a helper method "getNonPaddedValue()" that makes a copy into a 
> non-padded byte array. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
