[ http://issues.apache.org/jira/browse/HADOOP-550?page=all ]
Hairong Kuang updated HADOOP-550:
---------------------------------

    Attachment: text.patch

> Text constructure can throw exception
> -------------------------------------
>
>                 Key: HADOOP-550
>                 URL: http://issues.apache.org/jira/browse/HADOOP-550
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.6.2
>            Reporter: Bryan Pendleton
>         Assigned To: Hairong Kuang
>             Fix For: 0.7.0
>
>         Attachments: text.patch
>
>
> I finally got back around to moving my working code to using Text objects.
> And, once again, switching to Text (from UTF8) means my jobs are failing.
> This time, it's better defined - constructing a Text from a string extracted
> from Real World data makes the Text object constructor throw a
> CharacterCodingException. This may be legit - I don't actually understand UTF
> well enough to understand what's wrong with the supplied string. I'm
> assembling a series of strings, some of which are user-supplied, and
> something causes the Text constructor to barf.
> However, this is still completely unacceptable. If I need to stuff textual
> data someplace - I need the container to *do* it. If user-supplied inputs
> can't be stored as a "UTF" aware text value, then another container needs to
> be brought into existence. Sure, I can use a BytesWritable, but, as its name
> implies - Text should handle "text". If Text is supposed to ==
> "StringWritable", then, well, it doesn't, yet.
> I admit to being a few weeks behind the bleeding edge at this point, so
> maybe my particular Text bug has been fixed, though the only fixes to Text I
> see are adopting it into more of the internals of Hadoop. This argument goes
> double in that case - if we're using Text objects internally, it should
> really be a totally solid object - construct one from a String, get one back,
> but _never_ throw a content-related Exception.
> Or, if Text is not the right object because it's data-sensitive, then I
> argue we shouldn't use it in any case where data might kill it - internal,
> or anywhere else (by default).
> Please, don't remove UTF8, for now.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
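
The report does not say which input made the Text constructor fail, so the following is only a sketch of one plausible cause, not Hadoop's actual Text code. It assumes the string contains an unpaired UTF-16 surrogate and that the conversion uses a strict java.nio encoder. The class name StrictUtf8Demo and its two helper methods are hypothetical and exist only for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Hadoop.
public class StrictUtf8Demo {

    // Strict encoding: malformed input (such as a lone surrogate)
    // is reported as a CharacterCodingException.
    public static byte[] encodeStrict(String s) throws CharacterCodingException {
        return encode(s, CodingErrorAction.REPORT);
    }

    // Lenient encoding: malformed input is replaced with the encoder's
    // replacement bytes, so no content-related exception is expected.
    public static byte[] encodeLenient(String s) {
        try {
            return encode(s, CodingErrorAction.REPLACE);
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE; rethrow unchecked if it ever happens.
            throw new IllegalStateException(e);
        }
    }

    private static byte[] encode(String s, CodingErrorAction action)
            throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        // 0xD800 is a high surrogate with no low surrogate after it,
        // so this String is not well-formed UTF-16.
        String bad = new String(new char[] {'a', (char) 0xD800, 'b'});

        try {
            encodeStrict(bad);
            System.out.println("strict: encoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("strict: " + e);
        }

        System.out.println("lenient: " + encodeLenient(bad).length + " bytes");
    }
}
```

Under these assumptions, a container that must accept any String could use the replacing variant, at the cost of silently altering malformed input. Whether that trade-off is acceptable for Text is exactly the question the report raises.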