From: "Inger, Matthew" <[EMAIL PROTECTED]>
> Unicode has nothing to do with quote characters.  Unicode
> is a two-byte representation of a character (rather than
> single-byte ASCII), and really only deals with character
> sets that have more than 127 characters.
Not quite true actually. If you look at the top of the JDK Character class,
you will find constants for INITIAL_QUOTE_PUNCTUATION and
FINAL_QUOTE_PUNCTUATION amongst others. The Unicode standard assigns each
character to exactly one of these general categories. Thus you can call
Character.getType(ch) and compare the type returned against the constants.
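
A minimal sketch of that check, assuming a hypothetical helper named
isQuotePunctuation (it is not part of [lang] or of the submitted tokenizer).
Only Character.getType and the two constants are real JDK API:

```java
class QuoteTypeDemo {

    /**
     * Hypothetical helper: true if the Unicode general category of ch is
     * "initial quote punctuation" (Pi) or "final quote punctuation" (Pf).
     */
    static boolean isQuotePunctuation(char ch) {
        int type = Character.getType(ch);
        return type == Character.INITIAL_QUOTE_PUNCTUATION
            || type == Character.FINAL_QUOTE_PUNCTUATION;
    }

    public static void main(String[] args) {
        // U+201C LEFT DOUBLE QUOTATION MARK is in category Pi
        System.out.println(isQuotePunctuation('\u201C'));
        // U+201D RIGHT DOUBLE QUOTATION MARK is in category Pf
        System.out.println(isQuotePunctuation('\u201D'));
        // The ASCII double quote is in category Po (OTHER_PUNCTUATION)
        System.out.println(isQuotePunctuation('"'));
    }
}
```

Note that the plain ASCII ' and " characters fall under OTHER_PUNCTUATION, so
a quote matcher would probably still need to list them explicitly.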

Stephen

> However, I will put in the ability to have a set for the
> quote and for the whitespace characters as well since it
> seems not to have a performance impact as far as I can tell.
>
> -----Original Message-----
> From: Stephen Colebourne [mailto:[EMAIL PROTECTED]
> Sent: Friday, November 14, 2003 5:25 PM
> To: Jakarta Commons Developers List
> Subject: Re: [lang] [Bug 22692] - StringUtils.split ignores empty items
>
>
> (BTW: I've been meaning to and forgetting to change the subject line to
> include [lang]. Please use this for emails directed to lang)
>
> This interface approach should work OK. Perhaps if the interface was
> isMatch(char ch)  then it could be used in two forms, one for delimiters,
> one for quotes. Thus:
>   setDelimiterMatcher(Tokenizer.Matcher m)
>   setQuoteMatcher(Tokenizer.Matcher m)
>   setWhitespaceMatcher(Tokenizer.Matcher m)
> The last might allow a simplification - I don't know for sure.
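
To make the proposal above concrete, here is a rough sketch. The Matcher name
and isMatch(char) come from the message; the MatcherSketch class, the factory
methods, and the split loop are illustrative assumptions, not the submitted
DelimitedTokenizer, and quote handling is left out entirely:

```java
import java.util.ArrayList;
import java.util.List;

class MatcherSketch {

    /** Single-method predicate, as proposed in the thread. */
    interface Matcher {
        boolean isMatch(char ch);
    }

    /** Hypothetical factory: matches exactly one character. */
    static Matcher charMatcher(char c) {
        return ch -> ch == c;
    }

    /** Hypothetical factory: matches any character contained in chars. */
    static Matcher charSetMatcher(String chars) {
        return ch -> chars.indexOf(ch) >= 0;
    }

    /** Matches whatever Character.isWhitespace accepts. */
    static final Matcher WHITESPACE = Character::isWhitespace;

    /**
     * Splits input at every character the delimiter matcher accepts.
     * Empty tokens are kept, so adjacent delimiters yield "".
     */
    static List<String> split(String input, Matcher delimiter) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < input.length(); i++) {
            if (delimiter.isMatch(input.charAt(i))) {
                tokens.add(input.substring(start, i));
                start = i + 1;
            }
        }
        tokens.add(input.substring(start)); // trailing token, possibly empty
        return tokens;
    }
}
```

With this shape, the same interface could serve as the delimiter, quote, or
whitespace matcher, and the single-char case is just one implementation of it.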
>
> (I believe that Unicode may have a group of 'quote' characters: single
> quote, double quote, opening and closing quotes, etc., so I think that
> allowing for this is probably a good idea.)
>
> Stephen
>
> From: "Inger, Matthew" <[EMAIL PROTECTED]>
> > What about an interface:
> >
> > public class DelimitedTokenizer {
> >
> >    public static interface DelimiterSet {
> >        public boolean isDelimiter(char c);
> >    }
> > }
> >
> > and having the ability to pass in this
> > interface.  Of course, we'd still have a
> > single char version as well, so someone
> > might pass either a single char or an implementation
> > of this interface as the delimiter.  I suppose I could
> > do the same thing for quotes, but I find that less useful.
> >
> >
> > -----Original Message-----
> > From: Stephen Colebourne [mailto:[EMAIL PROTECTED]
> > Sent: Friday, November 14, 2003 4:41 PM
> > To: Jakarta Commons Developers List
> > Subject: Re: [Bug 22692] - StringUtils.split ignores empty items
> >
> >
> > Could the check as to whether a char is a delimiter be made into a
> > method? The base class could just handle single characters, but
> > people/us could then write subclasses for multiple delimiters or a
> > check such as Character.isWhitespace(). I haven't checked, but is this
> > feasible?
> >
> > Stephen
> >
> > From: "Inger, Matthew" <[EMAIL PROTECTED]>
> > > Checking for more than one delimiter or quote character would most
> > > likely slow the implementation down significantly, and I'm not
> > > sure we would want to do that.
> > >
> > > The rest of the suggestions could easily be implemented.
> > >
> > > -----Original Message-----
> > > From: Stephen Colebourne [mailto:[EMAIL PROTECTED]
> > > Sent: Friday, November 14, 2003 4:18 PM
> > > To: Jakarta Commons Developers List
> > > Subject: Re: [Bug 22692] - StringUtils.split ignores empty items
> > >
> > >
> > > Thank you for your submission. The implementation looks to have the
> > > basics of what is needed for a StringTokenizer replacement. My
> > > suggestions:
> > >
> > > 1) The implementation is perhaps a little too CSV focussed at present.
> > > For example, by default I would expect settings similar to
> > > StringTokenizer, splitting on whitespace.
> > >
> > > 2) There is no ability to support multiple delimiters or multiple quote
> > > tokens. Related to #2.
> > >
> > > 3) There seems to be no way to ignore null/empty strings (i.e. not
> > > return them).
> > >
> > > 4) The coding style doesn't match the rest of [lang], most noticeably
> > > the curly brackets.
> > >
> > > 5) Implement java.util.Iterator to give extra flexibility (no need to
> > > implement remove()). Keep nextToken() of course!
> > >
> > > 6) Maybe add nextTokenAsBoolean(), nextTokenAsInt() to handle the most
> > > common conversions when reading a known format file like CSV.
> > >
> > > I definitely want to see a Tokenizer in [lang], and this looks like a
> > > good start. (I suggest Tokenizer is a sufficiently good name). We also
> > > need to ensure that it performs well!
> > > Thanks
> > > Stephen
> > >
> > > ----- Original Message -----
> > > From: "Inger, Matthew" <[EMAIL PROTECTED]>
> > > > FYI:  I have submitted the DelimitedTokenizer class.
> > > > Could one of the committers please review this defect,
> > > > and commit the new files I have uploaded?  Or, I'd be
> > > > open to being a committer myself, and just checking it
> > > > in using CVS.
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> > > > Sent: Wednesday, November 12, 2003 10:00 AM
> > > > To: [EMAIL PROTECTED]
> > > > Subject: DO NOT REPLY [Bug 22692] - StringUtils.split ignores empty
> > > > items
> > > >
> > > >
> > > > http://nagoya.apache.org/bugzilla/show_bug.cgi?id=22692
> > > >
> > > > StringUtils.split ignores empty items
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > ------- Additional Comments From [EMAIL PROTECTED]  2003-11-12 14:59 -------
> > > > The attachment uploaded at 14:56 supersedes the one uploaded at 13:20
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > For additional commands, e-mail: [EMAIL PROTECTED]