std.algorithm.startsWith with maximal matching

2012-01-13 Thread H. S. Teoh
Hi all,

I'm reading the docs for startsWith(A,B...) with multiple ranges in B,
and it seems that it will always match the *shortest* range whenever
more than one range in B matches. Is there a way to make it always match
the *longest* range instead? Or do I have to write my own function for
that?

Thanks!


T

-- 
Why can't you just be a nonconformist like everyone else? -- YHL


Re: std.algorithm.startsWith with maximal matching

2012-01-13 Thread Jonathan M Davis
On Friday, January 13, 2012 16:48:00 H. S. Teoh wrote:
> Hi all,
> 
> I'm reading the docs for startsWith(A,B...) with multiple ranges in B,
> and it seems that it will always match the *shortest* range whenever
> more than one range in B matches. Is there a way to make it always match
> the *longest* range instead? Or do I have to write my own function for
> that?

It doesn't have a way to tell it which one to match if multiple match. It just 
takes the range that you're looking at and the list of elements and/or ranges 
that the first range might start with. It has to have a way to decide which one 
to match when multiple match, and the most efficient (and easiest) way is to 
match the shortest. So, that's what it does.
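
As a rough sketch of both behaviours (the helper name longestStartsWith is hypothetical and not part of Phobos; it only assumes that startsWith with several needles returns the 1-based index of the matching needle, or 0 for no match), rolling your own longest-match version could look something like this:

```d
import std.algorithm : startsWith;

/// Hypothetical helper, not a Phobos function: returns the 1-based index of
/// the longest needle that `haystack` starts with, or 0 if none of them match.
size_t longestStartsWith(string haystack, string[] needles...)
{
    size_t best = 0;     // 1-based index of the best match so far, 0 = none
    size_t bestLen = 0;  // length of that match
    foreach (i, needle; needles)
    {
        if (haystack.startsWith(needle) && (best == 0 || needle.length > bestLen))
        {
            best = i + 1;
            bestLen = needle.length;
        }
    }
    return best;
}

void main()
{
    // The built-in overload reports the shortest matching needle ("foo").
    assert(startsWith("foobar", "foobar", "foo") == 2);

    // The helper prefers the longest matching needle ("foobar").
    assert(longestStartsWith("foobar", "foo", "foobar") == 2);
    assert(longestStartsWith("foobar", "baz") == 0);
}
```

This tests every needle separately, so it is simpler but likely less efficient than the single pass that startsWith makes over its arguments.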

- Jonathan M Davis


Re: std.algorithm.startsWith with maximal matching

2012-01-13 Thread H. S. Teoh
On Fri, Jan 13, 2012 at 09:36:07PM -0500, Jonathan M Davis wrote:
> On Friday, January 13, 2012 16:48:00 H. S. Teoh wrote:
> > Hi all,
> > 
> > I'm reading the docs for startsWith(A,B...) with multiple ranges in B,
> > and it seems that it will always match the *shortest* range whenever
> > more than one range in B matches. Is there a way to make it always match
> > the *longest* range instead? Or do I have to write my own function for
> > that?
> 
> It doesn't have a way to tell it which one to match if multiple match. It
> just takes the range that you're looking at and the list of elements and/or
> ranges that the first range might start with. It has to have a way to decide
> which one to match when multiple match, and the most efficient (and easiest)
> way is to match the shortest. So, that's what it does.
[...]

I suppose that's reasonable.

But what I really want to accomplish is to parse a string containing
multiple words; at each point I have a list of permitted words that need
to be matched against the string; substring matches don't count. I
already have a way of skipping over spaces; so for medial words, I can
simulate this by appending a space to the end of the word list passed to
startsWith(). However, this doesn't work when the word being matched is
at the very end of the string, or if it is followed by punctuation.

Is there another library function that can do this, or do I just have to
roll my own?


T

-- 
Philosophy: how to make a career out of daydreaming.


Re: std.algorithm.startsWith with maximal matching

2012-01-13 Thread Jonathan M Davis
On Friday, January 13, 2012 18:47:19 H. S. Teoh wrote:
> On Fri, Jan 13, 2012 at 09:36:07PM -0500, Jonathan M Davis wrote:
> > On Friday, January 13, 2012 16:48:00 H. S. Teoh wrote:
> > > Hi all,
> > > 
> > > > I'm reading the docs for startsWith(A,B...) with multiple ranges in B,
> > > and it seems that it will always match the *shortest* range whenever
> > > more than one range in B matches. Is there a way to make it always
> > > match the *longest* range instead? Or do I have to write my own
> > > function for that?
> > 
> > It doesn't have a way to tell it which one to match if multiple match.
> > It just takes the range that you're looking at and the list of elements
> > and/or ranges that the first range might start with. It has to have a
> > way to decide which one to match when multiple match, and the most
> > efficient (and easiest) way is to match the shortest. So, that's what
> > it does.
> 
> [...]
> 
> I suppose that's reasonable.
> 
> But what I really want to accomplish is to parse a string containing
> multiple words; at each point I have a list of permitted words that need
> to be matched against the string; substring matches don't count. I
> already have a way of skipping over spaces; so for medial words, I can
> simulate this by appending a space to the end of the word list passed to
> startsWith(). However, this doesn't work when the word being matched is
> at the very end of the string, or if it is followed by punctuation.
> 
> Is there another library function that can do this, or do I just have to
> roll my own?

Use std.array.split. It will split a string into an array of strings using 
whitespace as the delimiter. And if you want a lazy solution (one which avoids 
allocating another array), then use std.algorithm.splitter and give it a 
whitespace predicate such as std.uni.isWhite (passing std.ascii.whitespace as 
the delimiter would split on that whole string, not on each whitespace 
character). You don't need a custom solution to split strings along 
whitespace. And if you need to compare entire words rather than just their 
beginning, once you have the word split out, you can just use == instead of 
startsWith.
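
A minimal sketch of the two approaches, assuming a made-up input string and a made-up list of permitted words (the lazy variant here passes std.uni.isWhite as a predicate, which is one way of wiring it up):

```d
import std.algorithm : canFind, splitter;
import std.array : split;
import std.uni : isWhite;

void main()
{
    string str = "the quick  brown fox";      // note the double space
    string[] permitted = ["quick", "fox"];    // illustrative word list

    // Eager: split() without a separator splits on runs of whitespace
    // and allocates a new array of slices.
    string[] words = str.split();
    assert(words == ["the", "quick", "brown", "fox"]);

    // Lazy: splitter with a predicate allocates nothing, but consecutive
    // whitespace yields empty words, so those are skipped explicitly.
    size_t hits;
    foreach (word; str.splitter!isWhite)
    {
        if (word.length && permitted.canFind(word))
            ++hits;
    }
    assert(hits == 2);
}
```

Since each word is already isolated, canFind compares whole words, so a substring such as "qui" would not count as a match.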

- Jonathan M Davis


Re: std.algorithm.startsWith with maximal matching

2012-01-14 Thread H. S. Teoh
On Fri, Jan 13, 2012 at 09:30:35PM -0800, Jonathan M Davis wrote:
> On Friday, January 13, 2012 18:47:19 H. S. Teoh wrote:
[...]
> > But what I really want to accomplish is to parse a string containing
> > multiple words; at each point I have a list of permitted words that
> > need to be matched against the string; substring matches don't
> > count. I already have a way of skipping over spaces; so for medial
> > words, I can simulate this by appending a space to the end of the
> > word list passed to startsWith(). However, this doesn't work when
> > the word being matched is at the very end of the string, or if it is
> > followed by punctuation.
> > 
> > Is there another library function that can do this, or do I just
> > have to roll my own?
> 
> Use std.array.split. It will split a string into an array of strings
> using whitespace as the delimiter.
[...]

What about punctuation?


T

-- 
Don't modify spaghetti code unless you can eat the consequences.


Re: std.algorithm.startsWith with maximal matching

2012-01-14 Thread Jonathan M Davis
On Saturday, January 14, 2012 19:13:02 H. S. Teoh wrote:
> On Fri, Jan 13, 2012 at 09:30:35PM -0800, Jonathan M Davis wrote:
> > On Friday, January 13, 2012 18:47:19 H. S. Teoh wrote:
> [...]
> 
> > > But what I really want to accomplish is to parse a string containing
> > > multiple words; at each point I have a list of permitted words that
> > > need to be matched against the string; substring matches don't
> > > count. I already have a way of skipping over spaces; so for medial
> > > words, I can simulate this by appending a space to the end of the
> > > word list passed to startsWith(). However, this doesn't work when
> > > the word being matched is at the very end of the string, or if it is
> > > followed by punctuation.
> > > 
> > > Is there another library function that can do this, or do I just
> > > have to roll my own?
> > 
> > Use std.array.split. It will split a string into an array of strings
> > using whitespace as the delimiter.
> 
> [...]
> 
> What about punctuation?

If you have to worry about punctuation, then == isn't going to work. You'll 
need to use some other combination of functions to strip the punctuation from 
one or both ends of the word. One possible solution would be something like

foreach(word; splitter!(std.uni.isWhite)(str))
{
    auto found = find!(not!(std.uni.isPunctuation))(word);
    if(found.startsWith(listOfWords))
    {
        //...
    }
}

- Jonathan M Davis


Re: std.algorithm.startsWith with maximal matching

2012-01-14 Thread Jonathan M Davis
On Saturday, January 14, 2012 19:45:55 Jonathan M Davis wrote:
> If you have to worry about punctuation, then == isn't going to work. You'll
> need to use some other combination of functions to strip the punctuation
> from one or both ends of the word. One possible solution would be something
> like
> 
> foreach(word; splitter!(std.uni.isWhite)(str))
> {
>     auto found = find!(not!(std.uni.isPunctuation))(word);
>     if(found.startsWith(listOfWords))
>     {
>         //...
>     }
> }

Actually, if the word has to match exactly, then startsWith isn't going to cut 
it. What you need to do is outright strip the punctuation from both ends. 
You'd need something more like

word = find!(not!(std.uni.isPunctuation))(word);
word = array(until!(std.uni.isPunctuation)(word));

if(canFind(wordList, word))
{
    //...
}

- Jonathan M Davis


Re: std.algorithm.startsWith with maximal matching

2012-01-15 Thread H. S. Teoh
On Sat, Jan 14, 2012 at 07:53:10PM -0800, Jonathan M Davis wrote:
[...]
> Actually, if the word has to match exactly, then startsWith isn't
> going to cut it. What you need to do is outright strip the punctuation
> from both ends.  You'd need something more like
> 
> word = find!(not!(std.uni.isPunctuation))(word);
> word = array(until!(std.uni.isPunctuation)(word));
> 
> if(canFind(wordList, word))
> {
>     //...
> }
[...]

Thanks for the info, but this method has the flaw that the original
punctuation is lost, unless I work with a copy of the word. I was hoping
for a nice way to do matching in-place.

But perhaps what I need is a full-fledged lexer after all. Unless
there's a nice way of saying "match up to some predicate that determines
the end of the word" in the current infrastructure.
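
One way such a "match up to a predicate" step might be sketched without copying or altering the input is with plain slicing. The helper name matchWord is hypothetical, and treating the first non-alphabetic character as the end of the word is an assumption:

```d
import std.algorithm : canFind;
import std.uni : isAlpha;

/// Hypothetical helper: returns the leading word of `input` as a slice of the
/// original string (no copy) if that whole word is in `permitted`, else null.
string matchWord(string input, string[] permitted)
{
    // Find the code-unit index of the first character that ends the word.
    size_t end = input.length;
    foreach (i, dchar c; input)
    {
        if (!isAlpha(c))
        {
            end = i;
            break;
        }
    }

    string word = input[0 .. end];
    return permitted.canFind(word) ? word : null;
}

void main()
{
    string[] permitted = ["foo", "foobar"];
    string input = "foobar, baz";

    auto word = matchWord(input, permitted);
    assert(word == "foobar");                   // whole word, not just "foo"
    assert(input[word.length .. $] == ", baz"); // punctuation is still there

    assert(matchWord("food for thought", permitted) is null); // prefix only
    assert(matchWord("foo", permitted) == "foo");             // end of string
}
```

Because the word boundary is found first and the comparison is done on the whole word, this also covers the end-of-string and trailing-punctuation cases.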


T

-- 
Claiming that your operating system is the best in the world because more 
people use it is like saying McDonalds makes the best food in the world. -- 
Carl B. Constantine


Re: std.algorithm.startsWith with maximal matching

2012-01-15 Thread Jonathan M Davis
On Sunday, January 15, 2012 11:23:04 H. S. Teoh wrote:
> On Sat, Jan 14, 2012 at 07:53:10PM -0800, Jonathan M Davis wrote:
> [...]
> 
> > Actually, if the word has to match exactly, then startsWith isn't
> > going to cut it. What you need to do is outright strip the punctuation
> > from both ends.  You'd need something more like
> > 
> > word = find!(not!(std.uni.isPunctuation))(word);
> > word = array(until!(std.uni.isPunctuation)(word));
> > 
> > if(canFind(wordList, word))
> > {
> >     //...
> > }
> 
> [...]
> 
> Thanks for the info, but this method has the flaw that the original
> punctuation is lost, unless I work with a copy of the word. I was hoping
> for a nice way to do matching in-place.
> 
> But perhaps what I need is a full-fledged lexer after all. Unless
> there's a nice way of saying "match up to some predicate that determines
> the end of the word" in the current infrastructure.

Depending on what you're doing, a full-blown lexer would indeed make more 
sense. You could make splitter's predicate split on both whitespace and 
punctuation if that helps. But as for searching within words, look at the various 
functions in std.range and std.algorithm. In particular, the ones listed as 
being in the "searching" category at the top of std.algorithm are likely to be 
of help. But what the exact combination of them is that will do the best job 
for you, I don't know, since I don't fully understand what your exact 
requirements are. And it's definitely possible that what you need is a function 
which doesn't exist in Phobos. What's there is quite good, but it doesn't 
cover every scenario.
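
For what it's worth, a minimal sketch of a splitter predicate that splits on both whitespace and punctuation (the sample string is made up, and filtering out the empty pieces is an extra step assumed here, since adjacent separators produce empty words):

```d
import std.algorithm : filter, splitter;
import std.array : array;
import std.uni : isPunctuation, isWhite;

void main()
{
    string str = "Hello, world! Goodbye.";

    // Treat any whitespace or punctuation character as a separator, then
    // drop the empty slices produced by adjacent separators.
    auto words = str
        .splitter!(c => isWhite(c) || isPunctuation(c))
        .filter!(w => w.length > 0)
        .array;

    assert(words == ["Hello", "world", "Goodbye"]);
}
```

Each element is a slice of the original string, so the original text (punctuation included) is left untouched.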

- Jonathan M Davis