I've uploaded the source and test files. The inner class is named FastCharSet, since it's no longer strictly for delimiters. It's pretty fast, though it's very simple and does only what we need it to do. Please review and commit the source code. It might be worthwhile to break out the FastCharSet class into its own top-level class, but I'll leave that up to whoever commits it.
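For reference, a minimal Java sketch of an immutable character set along the lines described in the quoted design (sort once in the constructor, answer contains() with Arrays.binarySearch). This is a reconstruction under assumptions, not the uploaded FastCharSet source: the WHITESPACE constant name and its exact characters are hypothetical (the thread only mentions a "WHITESPACE_DELIMITERSET" for the earlier DelimiterSet), and the defensive copies are an assumed detail to keep the object truly immutable.

```java
import java.util.Arrays;

/**
 * Sketch of an immutable char set: the characters are sorted once in the
 * constructor so that contains() can use a binary search.
 * Hypothetical reconstruction for illustration only.
 */
public final class FastCharSet {

    /** Hypothetical predefined whitespace set; name and contents are assumptions. */
    public static final FastCharSet WHITESPACE = new FastCharSet(" \t\n\r\f");

    // Sorted private copy of the characters; never exposed directly.
    private final char[] chars;

    public FastCharSet(char[] chars) {
        // Copy first so later changes to the caller's array cannot affect this set.
        this.chars = chars.clone();
        Arrays.sort(this.chars);
    }

    public FastCharSet(String chars) {
        this(chars.toCharArray());
    }

    public FastCharSet(char c) {
        this(new char[] { c });
    }

    /** Membership test: O(log n) binary search over the sorted array. */
    public boolean contains(char c) {
        return Arrays.binarySearch(chars, c) >= 0;
    }

    /** Returns a copy of the characters, in sorted order. */
    public char[] getChars() {
        return chars.clone();
    }
}
```

Note that getChars() returns the characters sorted, not in the order they were passed in; duplicates are kept but do not affect contains().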
-----Original Message-----
From: Inger, Matthew [mailto:[EMAIL PROTECTED]
Sent: Friday, November 14, 2003 5:46 PM
To: 'Jakarta Commons Developers List'
Subject: RE: [lang] [Bug 22692] - StringUtils.split ignores empty items

I see what you mean. It appears that, as robust as CharSet is, it does far too much and is too slow for what we need it for. I'm going back to DelimiterSet, but rather than an interface, it will be an inner class with several constructors:

public DelimiterSet(char[]);
public DelimiterSet(String);
public DelimiterSet(char);

and two useful methods:

public boolean contains(char);
public char[] getChars();

This will be an immutable object. The constructor sorts the character array using Arrays.sort, and the contains method uses Arrays.binarySearch, so each lookup is O(log n) in the number of characters in the set. There's also a predefined whitespace delimiter set, "WHITESPACE_DELIMITERSET", so people don't have to construct their own all the time.

-----Original Message-----
From: Stephen Colebourne [mailto:[EMAIL PROTECTED]
Sent: Friday, November 14, 2003 5:26 PM
To: Jakarta Commons Developers List
Subject: Re: [lang] [Bug 22692] - StringUtils.split ignores empty items

An interesting idea, although the performance would be very poor without some effort in the CharSet class.

Stephen

From: "Todd V. Jonker" <[EMAIL PROTECTED]>
> Or just use lang.CharSet
>
>
> On Fri, 14 Nov 2003 16:58:45 -0500, "Inger, Matthew" <[EMAIL PROTECTED]>
> said:
> > What about an interface:
> >
> > public class DelimitedTokenizer {
> >
> >     public static interface DelimiterSet {
> >         public boolean isDelimiter(char c);
> >     }
> > }
> >
> > and having the ability to pass in this
> > interface. Of course, we'd still have a
> > single char version as well, so someone
> > might pass either a single char or an implementation
> > of this interface as the delimiter. I suppose I could
> > do the same thing for quotes, but I find that less useful.
> >
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
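The interface-based approach from the quoted proposal can be sketched in Java as follows. The tokenizer only ever calls isDelimiter, so any implementation (a single char, a whitespace set, a sorted-array set) can be plugged in. splitPreserveEmpty and the of(char) adapter are hypothetical names for illustration, not Commons Lang API; the sketch keeps empty items between adjacent delimiters, which is the behaviour at issue in Bug 22692 ("StringUtils.split ignores empty items").

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a tokenizer that accepts a pluggable DelimiterSet.
 * Hypothetical illustration only; names are assumptions.
 */
public class DelimitedTokenizer {

    /** Nested interfaces are implicitly static, so "static" is optional here. */
    public interface DelimiterSet {
        boolean isDelimiter(char c);
    }

    /** Hypothetical adapter for the single-char delimiter case. */
    public static DelimiterSet of(char delimiter) {
        return c -> c == delimiter;
    }

    /**
     * Splits str on every delimiter character, keeping empty items:
     * adjacent delimiters, or a delimiter at either end, produce "".
     */
    public static List<String> splitPreserveEmpty(String str, DelimiterSet delims) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < str.length(); i++) {
            if (delims.isDelimiter(str.charAt(i))) {
                tokens.add(str.substring(start, i)); // may be "" between adjacent delimiters
                start = i + 1;
            }
        }
        tokens.add(str.substring(start)); // final item, possibly ""
        return tokens;
    }
}
```

With a comma delimiter, "a,,b" splits into ["a", "", "b"] and "a,b," into ["a", "b", ""]. In this sketch an empty input yields a single empty item; a real implementation might choose to return an empty list instead.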