[ https://issues.apache.org/jira/browse/HADOOP-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763520#action_12763520 ]
Tom White commented on HADOOP-6298:
-----------------------------------

I don't think this proposal is about changing the API, it's about renaming the method to more accurately describe its contract. Text.getBytes() behaves differently to String.getBytes(). It is a problem that trips up users; see, for example, http://www.nabble.com/can%27t-read-the-SequenceFile-correctly-td21866960.html.

We could deprecate getBytes() (on BinaryComparable and its subclasses BytesWritable and Text) in 0.22 and create getPaddedBytes() as Nathan suggests, which is identical in functionality. Then in the next release we would remove getBytes(). This change would not have any impact on efficiency, since it is purely a rename.

Nathan, what's the use case for getNonPaddedValue()? It's possible that by exposing it, it becomes easy to write an inefficient program since copying in maps or reduces is normally expensive.

> BytesWritable#getBytes is a bad name that leads to programming mistakes
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-6298
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6298
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 0.20.1
>            Reporter: Nathan Marz
>
> Pretty much everyone at Rapleaf who has worked with Hadoop has misused
> BytesWritable#getBytes at some point, not expecting the byte array to be
> padded. I think we can completely alleviate these programming mistakes by
> deprecating and renaming this method (again) to be more descriptive. I
> propose "getPaddedBytes()" or "getPaddedValue()". It would also be helpful to
> have a helper method "getNonPaddedValue()" that makes a copy into a
> non-padded byte array.
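
To make the padding problem discussed in this issue concrete, here is a minimal Java sketch. PaddedBuffer is a hypothetical stand-in written for illustration, not Hadoop's BytesWritable, and its 1.5x growth policy and the copyValidBytes() helper are assumptions. The only point it demonstrates is that a reused backing array can be longer than the valid data, so callers have to bound their reads by getLength().

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for a reusable byte container; NOT Hadoop's BytesWritable. */
final class PaddedBuffer {
    private byte[] bytes = new byte[0];
    private int length = 0;

    /** Copies newData in, reusing the backing array when it is already large enough. */
    void set(byte[] newData) {
        if (newData.length > bytes.length) {
            // Assumed growth policy: over-allocate by 50% to reduce future reallocations.
            bytes = new byte[newData.length * 3 / 2];
        }
        System.arraycopy(newData, 0, bytes, 0, newData.length);
        length = newData.length;
    }

    /** Returns the backing array itself, which may be longer than getLength(). */
    byte[] getBytes() {
        return bytes;
    }

    /** Number of valid bytes at the start of the backing array. */
    int getLength() {
        return length;
    }

    /** Allocates a fresh array holding only the valid bytes (costs one copy per call). */
    byte[] copyValidBytes() {
        return Arrays.copyOf(bytes, length);
    }
}

final class PaddingDemo {
    public static void main(String[] args) {
        PaddedBuffer buf = new PaddedBuffer();
        buf.set("hello world".getBytes(StandardCharsets.UTF_8));
        buf.set("hi".getBytes(StandardCharsets.UTF_8)); // reuses the larger array

        // Mistake: treating the whole backing array as the value.
        String wrong = new String(buf.getBytes(), StandardCharsets.UTF_8);

        // Correct, no copy: read only the first getLength() bytes.
        String right = new String(buf.getBytes(), 0, buf.getLength(), StandardCharsets.UTF_8);

        System.out.println("backing array length: " + buf.getBytes().length);
        System.out.println("valid length: " + buf.getLength());
        System.out.println("wrong has stale data: " + !wrong.equals("hi"));
        System.out.println("right: " + right);
    }
}
```

In this sketch the second set() leaves stale bytes from the first value in the backing array, which is the kind of surprise the issue describes. The copyValidBytes() helper plays the role of the proposed getNonPaddedValue(): convenient, but it allocates on every call, which is the efficiency concern raised in the comment.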