Based on a previous suggestion, I've decided to reuse
lang.CharSet for this.

I don't think Unicode changes much here. Unicode is a character
set, not a two-byte format: Java stores each char as a 16-bit
UTF-16 code unit rather than a single-byte ASCII value, so
non-ASCII quote characters (for example the curly quotes U+201C
and U+201D) are simply more chars to compare against.

However, I will also add the ability to specify a set for the
quote and whitespace characters, since as far as I can tell it
has no noticeable performance impact.

-----Original Message-----
From: Stephen Colebourne [mailto:[EMAIL PROTECTED]
Sent: Friday, November 14, 2003 5:25 PM
To: Jakarta Commons Developers List
Subject: Re: [lang] [Bug 22692] - StringUtils.split ignores empty items


(BTW: I keep meaning to change the subject line to include [lang] and
then forgetting. Please use this prefix for emails directed to [lang].)

This interface approach should work OK. If the interface method were
isMatch(char ch), the same interface could be used in several roles:
delimiters, quotes and whitespace. Thus:
  setDelimiterMatcher(Tokenizer.Matcher m)
  setQuoteMatcher(Tokenizer.Matcher m)
  setWhitespaceMatcher(Tokenizer.Matcher m)
The last might allow a simplification; I don't know for sure.
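A minimal Java sketch of the Matcher idea, assuming Java 8+ lambdas. Tokenizer.Matcher, isMatch(char) and the three setters come from the message; the helpers charMatcher, charSetMatcher and NONE and the split() method are hypothetical, and quote handling is left out to keep it short. It shows one interface serving both the delimiter role and the whitespace-trimming role.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: one Matcher interface reused for delimiters, quotes and
// whitespace. Helper names and split() are hypothetical; quotes are not
// actually processed here, the setter just shows the shape of the API.
class Tokenizer {

    /** Decides whether a single character belongs to some category. */
    interface Matcher {
        boolean isMatch(char ch);
    }

    /** Hypothetical helper: matches exactly one character. */
    static Matcher charMatcher(final char c) {
        return ch -> ch == c;
    }

    /** Hypothetical helper: matches any character in the given string. */
    static Matcher charSetMatcher(final String chars) {
        return ch -> chars.indexOf(ch) >= 0;
    }

    /** Hypothetical helper: matches nothing. */
    static final Matcher NONE = ch -> false;

    private Matcher delimiterMatcher = charMatcher(',');
    private Matcher quoteMatcher = charMatcher('"');
    private Matcher whitespaceMatcher = NONE;

    void setDelimiterMatcher(Matcher m) { this.delimiterMatcher = m; }
    void setQuoteMatcher(Matcher m) { this.quoteMatcher = m; }
    void setWhitespaceMatcher(Matcher m) { this.whitespaceMatcher = m; }

    /** Splits on delimiter chars, then trims whitespace-matcher chars from each end. */
    List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= text.length(); i++) {
            if (i == text.length() || delimiterMatcher.isMatch(text.charAt(i))) {
                tokens.add(trim(text, start, i));
                start = i + 1;
            }
        }
        return tokens;
    }

    private String trim(String text, int from, int to) {
        while (from < to && whitespaceMatcher.isMatch(text.charAt(from))) from++;
        while (to > from && whitespaceMatcher.isMatch(text.charAt(to - 1))) to--;
        return text.substring(from, to);
    }
}
```

For example, with a whitespace matcher for space and tab, splitting "a, b ,,c" yields a, b, an empty item and c.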

(I believe Unicode has a group of 'quote' characters: single quote, double
quote, opening and closing quotes, etc. So I think allowing for this is
probably a good idea.)
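One way a quote set could draw on Unicode data, sketched with the standard java.lang.Character API: the general categories Pi (initial quote punctuation) and Pf (final quote punctuation) cover curly quotes and guillemets, while the ASCII ' and " are ordinary punctuation and must be listed explicitly. The QuoteChars class is hypothetical, and some quotation marks (e.g. the low-9 mark U+201E, category Ps) would not be matched by this test.

```java
// Sketch: a quote test built on Unicode general categories via
// java.lang.Character. Pi/Pf cover curly quotes and guillemets; ASCII ' and "
// are "other punctuation" (Po), so they are checked explicitly.
final class QuoteChars {
    private QuoteChars() {}

    static boolean isQuote(char ch) {
        if (ch == '\'' || ch == '"') {
            return true;
        }
        int type = Character.getType(ch);
        return type == Character.INITIAL_QUOTE_PUNCTUATION
            || type == Character.FINAL_QUOTE_PUNCTUATION;
    }
}
```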

Stephen

From: "Inger, Matthew" <[EMAIL PROTECTED]>
> What about an interface:
>
> public class DelimitedTokenizer {
>
>    public static interface DelimiterSet {
>        public boolean isDelimiter(char c);
>    }
> }
>
> and the ability to pass in an implementation of
> this interface. Of course, we'd still have a
> single-char version as well, so someone could
> pass either a single char or an implementation
> of this interface as the delimiter. I suppose I could
> do the same thing for quotes, but I find that less useful.
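A sketch of how the single-char version and the DelimiterSet version could share one code path, assuming the single char is simply wrapped in a DelimiterSet. The DelimiterSet interface is from the message; the constructors, the single() helper and split() are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the char constructor adapts its argument to a DelimiterSet, so the
// tokenizing loop only ever calls isDelimiter(). Empty items are kept.
class DelimitedTokenizer {

    public static interface DelimiterSet {
        public boolean isDelimiter(char c);
    }

    private final DelimiterSet delimiters;

    DelimitedTokenizer(char delimiter) {
        this(single(delimiter));
    }

    DelimitedTokenizer(DelimiterSet delimiters) {
        this.delimiters = delimiters;
    }

    /** Hypothetical helper: wraps one char as a DelimiterSet. */
    private static DelimiterSet single(final char delimiter) {
        return new DelimiterSet() {
            public boolean isDelimiter(char c) {
                return c == delimiter;
            }
        };
    }

    /** Splits text, keeping empty items between adjacent delimiters. */
    List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (delimiters.isDelimiter(text.charAt(i))) {
                tokens.add(text.substring(start, i));
                start = i + 1;
            }
        }
        tokens.add(text.substring(start));
        return tokens;
    }
}
```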
>
>
> -----Original Message-----
> From: Stephen Colebourne [mailto:[EMAIL PROTECTED]
> Sent: Friday, November 14, 2003 4:41 PM
> To: Jakarta Commons Developers List
> Subject: Re: [Bug 22692] - StringUtils.split ignores empty items
>
>
> Could the check for whether a char is a delimiter be made into a method?
> The base class could handle just single characters, but we (or users)
> could then write subclasses for multiple delimiters or for a check such as
> Character.isWhitespace(). I haven't checked, but is this feasible?
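A sketch of the subclassing alternative, with hypothetical names: the base class compares against one char in an overridable isDelimiter(char) method, and a subclass widens the test to Character.isWhitespace().

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the delimiter test is a protected method that subclasses override.
class CharTokenizer {
    private final char delimiter;

    CharTokenizer(char delimiter) {
        this.delimiter = delimiter;
    }

    /** Base behaviour: exactly one delimiter char. */
    protected boolean isDelimiter(char ch) {
        return ch == delimiter;
    }

    /** Splits text; consecutive delimiters produce empty items. */
    List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (isDelimiter(text.charAt(i))) {
                tokens.add(text.substring(start, i));
                start = i + 1;
            }
        }
        tokens.add(text.substring(start));
        return tokens;
    }
}

// Hypothetical subclass: any Java whitespace char is a delimiter.
class WhitespaceTokenizer extends CharTokenizer {
    WhitespaceTokenizer() {
        super(' ');
    }

    @Override
    protected boolean isDelimiter(char ch) {
        return Character.isWhitespace(ch);
    }
}
```

Compared with passing in an interface, this ties each delimiter rule to a subclass, so rules cannot be swapped on an existing instance.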
>
> Stephen
>
> From: "Inger, Matthew" <[EMAIL PROTECTED]>
> > Checking for more than one delimiter or quote character would most likely
> > slow the implementation down significantly, and I'm not
> > sure we would want to do that.
> >
> > The rest of the suggestions could easily be implemented.
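Whether a multi-character check really costs much is something to measure rather than assume. One common way to keep the per-character cost flat, sketched here with java.util.BitSet and a hypothetical CharLookup class, is to precompute the set once so each test is a single bit lookup, however many delimiters are configured.

```java
import java.util.BitSet;

// Sketch: precompute a delimiter set into a BitSet indexed by char value,
// so contains() costs one bit lookup regardless of the set's size.
final class CharLookup {
    private final BitSet bits = new BitSet();

    CharLookup(String chars) {
        for (int i = 0; i < chars.length(); i++) {
            bits.set(chars.charAt(i));
        }
    }

    boolean contains(char ch) {
        return bits.get(ch);
    }
}
```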
> >
> > -----Original Message-----
> > From: Stephen Colebourne [mailto:[EMAIL PROTECTED]
> > Sent: Friday, November 14, 2003 4:18 PM
> > To: Jakarta Commons Developers List
> > Subject: Re: [Bug 22692] - StringUtils.split ignores empty items
> >
> >
> > Thank you for your submission. The implementation looks to have the basics
> > of what is needed for a StringTokenizer replacement. My suggestions:
> >
> > 1) The implementation is perhaps a little too CSV-focused at present. For
> > example, by default I would expect settings similar to StringTokenizer,
> > i.e. splitting on whitespace.
> >
> > 2) There is no ability to support multiple delimiters or multiple quote
> > characters. Related to #1.
> >
> > 3) There seems to be no way to ignore null/empty strings (i.e. not return
> > them).
> >
> > 4) The coding style doesn't match the rest of [lang], most noticeably the
> > placement of curly brackets.
> >
> > 5) Implement java.util.Iterator to give extra flexibility (no need to
> > implement remove()). Keep nextToken() of course!
> >
> > 6) Maybe add nextTokenAsBoolean() and nextTokenAsInt() to handle the most
> > common conversions when reading a file in a known format such as CSV.
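A rough sketch of suggestions 1, 3, 5 and 6 together, under stated assumptions: SimpleTokenizer, its constructors and the ignoreEmptyTokens flag are hypothetical; the default delimiter string " \t\n\r\f" is the same default java.util.StringTokenizer uses; and remove() relies on Iterator's Java 8+ default method. It tokenizes eagerly for brevity, which a performance-minded version would probably avoid.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Sketch: whitespace delimiters by default, optional dropping of empty
// tokens, java.util.Iterator support and typed accessors.
class SimpleTokenizer implements Iterator<String> {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    /** Default: StringTokenizer's whitespace set, empty tokens ignored. */
    SimpleTokenizer(String text) {
        this(text, " \t\n\r\f", true);
    }

    SimpleTokenizer(String text, String delimiters, boolean ignoreEmptyTokens) {
        int start = 0;
        for (int i = 0; i <= text.length(); i++) {
            if (i == text.length() || delimiters.indexOf(text.charAt(i)) >= 0) {
                String token = text.substring(start, i);
                if (!(ignoreEmptyTokens && token.isEmpty())) {
                    tokens.add(token);
                }
                start = i + 1;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return pos < tokens.size();
    }

    @Override
    public String next() {
        return nextToken();
    }

    // remove() is not overridden: Iterator's default throws UnsupportedOperationException.

    String nextToken() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return tokens.get(pos++);
    }

    int nextTokenAsInt() {
        return Integer.parseInt(nextToken());
    }

    boolean nextTokenAsBoolean() {
        return Boolean.parseBoolean(nextToken());
    }
}
```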
> >
> > I definitely want to see a Tokenizer in [lang], and this looks like a good
> > start. (I think Tokenizer is a good enough name.) We also need to ensure
> > that it performs well!
> > Thanks
> > Stephen
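For reference, the empty-item behaviour in the bug title can be reproduced with the JDK alone: java.util.StringTokenizer, which the proposed Tokenizer is meant to replace, silently drops the empty item between two adjacent delimiters, while String.split with a negative limit keeps it. The EmptyItems helper class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Compares empty-item handling of two JDK splitting approaches.
final class EmptyItems {
    private EmptyItems() {}

    /** StringTokenizer treats adjacent delimiters as one, so empty items vanish. */
    static List<String> withStringTokenizer(String text, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    /** A negative limit keeps every empty item, including trailing ones. */
    static List<String> withSplit(String text, String delimRegex) {
        return Arrays.asList(text.split(delimRegex, -1));
    }
}
```

For "a,,b," with "," as the delimiter, the first returns [a, b] and the second [a, , b, ].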
> >
> > ----- Original Message -----
> > From: "Inger, Matthew" <[EMAIL PROTECTED]>
> > > FYI: I have submitted the DelimitedTokenizer class.
> > > Could one of the committers please review this defect
> > > and commit the new files I have uploaded? Alternatively,
> > > I'd be open to becoming a committer myself and checking
> > > it in via CVS.
> > >
> > >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> > > Sent: Wednesday, November 12, 2003 10:00 AM
> > > To: [EMAIL PROTECTED]
> > > Subject: DO NOT REPLY [Bug 22692] - StringUtils.split ignores empty
> > > items
> > >
> > >
> > > http://nagoya.apache.org/bugzilla/show_bug.cgi?id=22692
> > >
> > > StringUtils.split ignores empty items
> > >
> > > ------- Additional Comments From [EMAIL PROTECTED]  2003-11-12 14:59
> > > -------
> > > The attachment uploaded at 14:56 supersedes the one uploaded at 13:20.
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
> > >
> >
> >
>
>
