From: "Inger, Matthew" <[EMAIL PROTECTED]>
> Unicode has nothing to do with quote characters. Unicode
> is two byte representation of a character (rather than a
> single byte ASCII), and really only deals with character
> sets that have more than 127 characters.

Not quite true, actually. If you look at the top of the JDK Character
class, you will find constants for INITIAL_QUOTE_PUNCTUATION and
FINAL_QUOTE_PUNCTUATION, amongst others. The Unicode standard assigns
each character to exactly one of these general categories, so you can
call Character.getType(ch) and compare the value it returns against
those constants.
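A minimal sketch of that suggestion, assuming a hypothetical helper class (QuoteChars and isQuote are illustrative names, not JDK or [lang] API). One caveat: Unicode classifies the plain ASCII ' and " characters as OTHER_PUNCTUATION, not as initial or final quote punctuation, so the sketch tests them explicitly before checking the general category.

```java
// Hypothetical helper (not JDK or [lang] API): classifies a char as a
// "quote" using the Unicode general category from Character.getType(char).
public final class QuoteChars {

    private QuoteChars() {
    }

    public static boolean isQuote(char ch) {
        // ASCII ' and " are OTHER_PUNCTUATION in Unicode, so the category
        // check below would miss them; test them directly.
        if (ch == '\'' || ch == '"') {
            return true;
        }
        int type = Character.getType(ch);
        return type == Character.INITIAL_QUOTE_PUNCTUATION
            || type == Character.FINAL_QUOTE_PUNCTUATION;
    }

    public static void main(String[] args) {
        // Left/right double quotes, guillemets, a letter and a comma.
        char[] samples = {'"', '\'', '\u201C', '\u201D', '\u00AB', '\u00BB', 'a', ','};
        for (char ch : samples) {
            System.out.println(String.format("U+%04X", (int) ch) + " -> " + isQuote(ch));
        }
    }
}
```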
Stephen

> However, I will put in the ability to have a set for the
> quote and for the whitespace characters as well, since it
> seems not to have a performance impact as far as I can tell.
>
> -----Original Message-----
> From: Stephen Colebourne [mailto:[EMAIL PROTECTED]
> Sent: Friday, November 14, 2003 5:25 PM
> To: Jakarta Commons Developers List
> Subject: Re: [lang] [Bug 22692] - StringUtils.split ignores empty items
>
>
> (BTW: I've been meaning to and forgetting to change the subject line to
> include [lang]. Please use this for emails directed to lang)
>
> This interface approach should work OK. Perhaps if the interface was
> isMatch(char ch) then it could be used in two forms, one for delimiters,
> one for quotes. Thus:
>   setDelimiterMatcher(Tokenizer.Matcher m)
>   setQuoteMatcher(Tokenizer.Matcher m)
>   setWhitespaceMatcher(Tokenizer.Matcher m)
> The last might allow a simplification - I don't know for sure.
>
> (I believe that Unicode may have a group of 'quote' characters. Single quote,
> double quote, opening and closing quotes, etc... so I think that allowing
> for this is probably a good idea)
>
> Stephen
>
> From: "Inger, Matthew" <[EMAIL PROTECTED]>
> > What about an interface:
> >
> > public class DelimitedTokenizer {
> >
> >     public static interface DelimiterSet {
> >         public boolean isDelimiter(char c);
> >     }
> > }
> >
> > and having the ability to pass in this
> > interface. Of course, we'd still have a
> > single char version as well, so someone
> > might pass either a single char or an implementation
> > of this interface as the delimiter. I suppose I could
> > do the same thing for quotes, but I find that less useful.
> >
> >
> > -----Original Message-----
> > From: Stephen Colebourne [mailto:[EMAIL PROTECTED]
> > Sent: Friday, November 14, 2003 4:41 PM
> > To: Jakarta Commons Developers List
> > Subject: Re: [Bug 22692] - StringUtils.split ignores empty items
> >
> >
> > Could the check as to whether a char is a delimiter be made into a method?
> > The base class could just handle single characters, but people/us could then
> > write subclasses for multiple delimiters or a check such as
> > Character.isWhitespace(). I haven't checked, but is this feasible?
> >
> > Stephen
> >
> > From: "Inger, Matthew" <[EMAIL PROTECTED]>
> > > Checking for more than 1 delimiter or quote character would most likely
> > > slow the implementation down significantly, and I'm not
> > > sure we would want to do that.
> > >
> > > The rest of the suggestions could easily be implemented.
> > >
> > > -----Original Message-----
> > > From: Stephen Colebourne [mailto:[EMAIL PROTECTED]
> > > Sent: Friday, November 14, 2003 4:18 PM
> > > To: Jakarta Commons Developers List
> > > Subject: Re: [Bug 22692] - StringUtils.split ignores empty items
> > >
> > >
> > > Thank you for your submission. The implementation looks to have the basics
> > > of what is needed for a StringTokenizer replacement. My suggestions:
> > >
> > > 1) The implementation is perhaps a little too CSV focussed at present. For
> > > example, by default I would expect settings similar to StringTokenizer,
> > > splitting on whitespace.
> > >
> > > 2) There is no ability to support multiple delimiters or multiple quote
> > > tokens. Related to #2.
> > >
> > > 3) There seems to be no way to ignore null/empty strings (i.e. not return
> > > them)
> > >
> > > 4) The coding style doesn't match the rest of [lang], i.e. curly brackets
> > > most noticeably
> > >
> > > 5) Implement java.util.Iterator to give extra flexibility. (no need to
> > > implement remove()) Keep nextToken() of course!
> > >
> > > 6) Maybe add nextTokenAsBoolean(), nextTokenAsInt() to handle the most
> > > common conversions when reading a known format file like CSV.
> > >
> > > I definitely want to see a Tokenizer in [lang], and this looks like a good
> > > start. (I suggest Tokenizer is a sufficiently good name). We also need to
> > > ensure that it performs well!
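Purely as an illustration of how the Tokenizer.Matcher idea discussed above could combine with several of these suggestions (whitespace splitting by default, multiple delimiters, optionally dropping empty tokens, java.util.Iterator, nextTokenAsInt()/nextTokenAsBoolean()), here is a minimal sketch. Tokenizer, Matcher, isMatch, nextToken, nextTokenAsInt and nextTokenAsBoolean are names floated in this thread; everything else (charMatcher, charSetMatcher, WHITESPACE, the ignoreEmpty flag, eager splitting) is a hypothetical choice made for this sketch. It is not the DelimitedTokenizer attached to Bug 22692, it leaves quote handling out for brevity, and it uses Java 8+ syntax that postdates this 2003 thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: not the DelimitedTokenizer from Bug 22692 and not a
// committed [lang] API. Quote handling is deliberately omitted.
public class Tokenizer implements Iterator<String> {

    /** One-method test, usable for delimiters (or, by extension, quotes/whitespace). */
    public interface Matcher {
        boolean isMatch(char ch);
    }

    /** Hypothetical factory: matches a single character, the cheap common case. */
    public static Matcher charMatcher(char c) {
        return ch -> ch == c;
    }

    /** Hypothetical factory: matches any character contained in the given string. */
    public static Matcher charSetMatcher(String chars) {
        return ch -> chars.indexOf(ch) >= 0;
    }

    /** Splits on Character.isWhitespace, similar in spirit to StringTokenizer's default. */
    public static final Matcher WHITESPACE = ch -> Character.isWhitespace(ch);

    private final List<String> tokens = new ArrayList<>();
    private int index = 0;

    /** Default settings: split on whitespace and drop empty tokens. */
    public Tokenizer(String input) {
        this(input, WHITESPACE, true);
    }

    /** @param ignoreEmpty if true, empty tokens (e.g. between adjacent delimiters) are not returned */
    public Tokenizer(String input, Matcher delimiter, boolean ignoreEmpty) {
        // Splits eagerly in one pass for simplicity; a tuned version might scan lazily.
        int start = 0;
        for (int i = 0; i <= input.length(); i++) {
            // Reaching the end of the input terminates the final token.
            if (i == input.length() || delimiter.isMatch(input.charAt(i))) {
                String token = input.substring(start, i);
                if (!ignoreEmpty || !token.isEmpty()) {
                    tokens.add(token);
                }
                start = i + 1;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return index < tokens.size();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return tokens.get(index++);
    }
    // remove() is inherited from Iterator's default method, which throws
    // UnsupportedOperationException.

    /** StringTokenizer-style name for next(). */
    public String nextToken() {
        return next();
    }

    /** Convenience conversion; throws NumberFormatException on non-numeric tokens. */
    public int nextTokenAsInt() {
        return Integer.parseInt(nextToken());
    }

    /** Convenience conversion: "true" (any case) gives true, anything else false. */
    public boolean nextTokenAsBoolean() {
        return Boolean.parseBoolean(nextToken());
    }

    public static void main(String[] args) {
        // CSV-style: keep empty fields, including the trailing one.
        Tokenizer csv = new Tokenizer("a,,b,", charMatcher(','), false);
        while (csv.hasNext()) {
            System.out.println("[" + csv.nextToken() + "]");
        }
        // Default settings: whitespace-separated words, empties dropped.
        new Tokenizer("  split   on whitespace ").forEachRemaining(System.out::println);
    }
}
```

With this shape, a single delimiter, a set of delimiters and a whitespace test all go through the same isMatch call. Whether that indirection costs anything measurable compared with a hard-coded char comparison is something only a benchmark would settle.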
> > >
> > > Thanks
> > > Stephen
> > >
> > > ----- Original Message -----
> > > From: "Inger, Matthew" <[EMAIL PROTECTED]>
> > > > FYI: I have submitted the DelimitedTokenizer class.
> > > > Could one of the committers please review this defect,
> > > > and commit the new files I have uploaded? Or, I'd be
> > > > open to being a committer myself, and just checking it
> > > > in using cvs.
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> > > > Sent: Wednesday, November 12, 2003 10:00 AM
> > > > To: [EMAIL PROTECTED]
> > > > Subject: DO NOT REPLY [Bug 22692] - StringUtils.split ignores empty items
> > > >
> > > >
> > > > DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
> > > > RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
> > > > <http://nagoya.apache.org/bugzilla/show_bug.cgi?id=22692>.
> > > > ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
> > > > INSERTED IN THE BUG DATABASE.
> > > >
> > > > http://nagoya.apache.org/bugzilla/show_bug.cgi?id=22692
> > > >
> > > > StringUtils.split ignores empty items
> > > >
> > > > ------- Additional Comments From [EMAIL PROTECTED] 2003-11-12 14:59 -------
> > > > The attachment uploaded at 14:56 supersedes the one uploaded at 13:20
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > For additional commands, e-mail: [EMAIL PROTECTED]