Re: std.algorithm.startsWith with maximal matching

2012-01-15 Thread H. S. Teoh
On Sat, Jan 14, 2012 at 07:53:10PM -0800, Jonathan M Davis wrote:
[...]
 Actually, if the word has to match exactly, then startsWith isn't
 going to cut it. What you need to do is outright strip the punctuation
 from both ends.  You'd need something more like
 
 word = find!(not!(std.uni.isPunctuation))(word);
 word = array(until!(std.uni.isPunctuation)(word));
 
 if(canFind(wordList, word))
 {
     //...
 }
[...]

Thanks for the info, but this method has the flaw that the original
punctuation is lost, unless I work with a copy of the word. I was hoping
for a nice way to do matching in-place.

But perhaps what I need is a full-fledged lexer after all. Unless
there's a nice way of saying match up to some predicate that determines
the end of the word in the current infrastructure.
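
One way to read "match up to some predicate" is sketched below. It is only an illustration: `matchWord` is a hypothetical helper, not a Phobos function, and the boundary rule (end of input, or a following code point that is not a letter) is an assumption about what counts as the end of a word. Because it only returns a length, the input is sliced rather than copied and the punctuation stays where it was.

```d
import std.algorithm : startsWith;
import std.range : empty, front;
import std.uni : isAlpha;

/// Hypothetical helper: returns the length of the longest entry of `words`
/// that starts `input` and ends on a word boundary, or 0 if none does.
size_t matchWord(string input, string[] words)
{
    size_t best = 0;
    foreach (w; words)
    {
        // Skip candidates that cannot improve on the current best match.
        if (w.length <= best || !input.startsWith(w))
            continue;
        string rest = input[w.length .. $];
        // Boundary predicate (an assumption): end of input or a non-letter.
        if (rest.empty || !isAlpha(rest.front))
            best = w.length;
    }
    return best;
}

void main()
{
    string input = "foobar, baz";
    size_t n = matchWord(input, ["foo", "foobar"]);
    assert(input[0 .. n] == "foobar"); // "foo" alone is only a substring match
    assert(input[n .. $] == ", baz");  // the punctuation is still in the input
    assert(matchWord("food", ["foo"]) == 0);
}
```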


T

-- 
Claiming that your operating system is the best in the world because more 
people use it is like saying McDonalds makes the best food in the world. -- 
Carl B. Constantine


Re: std.algorithm.startsWith with maximal matching

2012-01-15 Thread Jonathan M Davis
On Sunday, January 15, 2012 11:23:04 H. S. Teoh wrote:
 On Sat, Jan 14, 2012 at 07:53:10PM -0800, Jonathan M Davis wrote:
 [...]
 
  Actually, if the word has to match exactly, then startsWith isn't
  going to cut it. What you need to do is outright strip the punctuation
  from both ends.  You'd need something more like
  
  word = find!(not!(std.uni.isPunctuation))(word);
  word = array(until!(std.uni.isPunctuation)(word));
  
  if(canFind(wordList, word))
  {
      //...
  }
 
 [...]
 
 Thanks for the info, but this method has the flaw that the original
 punctuation is lost, unless I work with a copy of the word. I was hoping
 for a nice way to do matching in-place.
 
 But perhaps what I need is a full-fledged lexer after all. Unless
 there's a nice way of saying match up to some predicate that determines
 the end of the word in the current infrastructure.

Depending on what you're doing, a full-blown lexer would indeed make more
sense. You could make splitter's predicate split on both whitespace and
punctuation if that helps. As for searching within words, look at the various
functions in std.range and std.algorithm, in particular the ones listed in the
searching category at the top of std.algorithm. Which combination of them will
do the best job for you, I don't know, since I don't fully understand your
exact requirements. It's also definitely possible that what you need is a
function which doesn't exist in Phobos. What's there is quite good, but it
doesn't cover every scenario.
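
A rough sketch of the splitter-with-predicate idea follows. `wordsOf` is a hypothetical name, and treating every punctuation code point as a separator is an assumption: it means a word with an internal apostrophe, such as "don't", comes out as two pieces.

```d
import std.algorithm : filter, splitter;
import std.array : array;
import std.uni : isPunctuation, isWhite;

/// Hypothetical helper: splits on any whitespace or punctuation code point
/// and drops the empty pieces left between adjacent separators.
string[] wordsOf(string text)
{
    return text
        .splitter!(c => isWhite(c) || isPunctuation(c))
        .filter!(w => w.length != 0)
        .array;
}

void main()
{
    assert(wordsOf("Hello, world! Bye.") == ["Hello", "world", "Bye"]);
}
```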

- Jonathan M Davis


Re: std.algorithm.startsWith with maximal matching

2012-01-14 Thread H. S. Teoh
On Fri, Jan 13, 2012 at 09:30:35PM -0800, Jonathan M Davis wrote:
 On Friday, January 13, 2012 18:47:19 H. S. Teoh wrote:
[...]
  But what I really want to accomplish is to parse a string containing
  multiple words; at each point I have a list of permitted words that
  need to be matched against the string; substring matches don't
  count. I already have a way of skipping over spaces; so for medial
  words, I can simulate this by appending a space to the end of the
  word list passed to startsWith(). However, this doesn't work when
  the word being matched is at the very end of the string, or if it is
  followed by punctuation.
  
  Is there another library function that can do this, or do I just
  have to roll my own?
 
 Use std.array.split. It will split a string into an array of strings
 using whitespace as the delimiter.
[...]

What about punctuation?


T

-- 
Don't modify spaghetti code unless you can eat the consequences.


Re: std.algorithm.startsWith with maximal matching

2012-01-14 Thread Jonathan M Davis
On Saturday, January 14, 2012 19:13:02 H. S. Teoh wrote:
 On Fri, Jan 13, 2012 at 09:30:35PM -0800, Jonathan M Davis wrote:
  On Friday, January 13, 2012 18:47:19 H. S. Teoh wrote:
 [...]
 
   But what I really want to accomplish is to parse a string containing
   multiple words; at each point I have a list of permitted words that
   need to be matched against the string; substring matches don't
   count. I already have a way of skipping over spaces; so for medial
   words, I can simulate this by appending a space to the end of the
   word list passed to startsWith(). However, this doesn't work when
   the word being matched is at the very end of the string, or if it is
   followed by punctuation.
   
   Is there another library function that can do this, or do I just
   have to roll my own?
  
  Use std.array.split. It will split a string into an array of strings
  using whitespace as the delimiter.
 
 [...]
 
 What about punctuation?

If you have to worry about punctuation, then == isn't going to work. You'll 
need to use some other combination of functions to strip the punctuation from 
one or both ends of the word. One possible solution would be something like

foreach(word; splitter!(std.uni.isWhite)(str))
{
    auto found = find!(not!(std.uni.isPunctuation))(word);
    if(found.startsWith(listOfWords))
    {
        //...
    }
}

- Jonathan M Davis


Re: std.algorithm.startsWith with maximal matching

2012-01-14 Thread Jonathan M Davis
On Saturday, January 14, 2012 19:45:55 Jonathan M Davis wrote:
 If you have to worry about punctuation, then == isn't going to work. You'll
 need to use some other combination of functions to strip the punctuation
 from one or both ends of the word. One possible solution would be something
 like
 
 foreach(word; splitter!(std.uni.isWhite)(str))
 {
     auto found = find!(not!(std.uni.isPunctuation))(word);
     if(found.startsWith(listOfWords))
     {
         //...
     }
 }

Actually, if the word has to match exactly, then startsWith isn't going to cut 
it. What you need to do is outright strip the punctuation from both ends. 
You'd need something more like

word = find!(not!(std.uni.isPunctuation))(word);
word = array(until!(std.uni.isPunctuation)(word));

if(canFind(wordList, word))
{
    //...
}
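
For reference, a self-contained variant of the same idea might look like the sketch below. It is not the exact code above: it swaps in std.algorithm's predicate form of strip, because until stops at the first punctuation character it meets and would cut a word like "don't" short. `isListedWord` is a hypothetical helper name.

```d
import std.algorithm : canFind, strip;
import std.uni : isPunctuation;

/// Hypothetical helper: true if `token`, with punctuation trimmed from both
/// ends, is exactly one of the entries in `wordList`. Only a slice is taken,
/// so the caller's `token` keeps its punctuation.
bool isListedWord(string token, string[] wordList)
{
    string core = token.strip!(c => isPunctuation(c));
    return wordList.canFind(core);
}

void main()
{
    auto wordList = ["hello", "don't"];
    assert(isListedWord("(hello!)", wordList));
    assert(isListedWord("don't,", wordList)); // inner apostrophe is kept
    assert(!isListedWord("hell", wordList));  // substring matches don't count
}
```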

- Jonathan M Davis


Re: std.algorithm.startsWith with maximal matching

2012-01-13 Thread Jonathan M Davis
On Friday, January 13, 2012 16:48:00 H. S. Teoh wrote:
 Hi all,
 
 I'm reading the docs for startsWith(A,B...) with multiple ranges in B,
 and it seems that it will always match the *shortest* range whenever
 more than one range in B matches. Is there a way to make it always match
 the *longest* range instead? Or do I have to write my own function for
 that?

It doesn't have a way to tell it which one to match if multiple match. It just 
takes the range that you're looking at and the list of elements and/or ranges 
that the first range might start with. It has to have a way to decide which one 
to match when multiple match, and the most efficient (and easiest) way is to 
match the shortest. So, that's what it does.
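
If the longest match is what's wanted, a small wrapper along these lines is one option. `longestPrefix` is a hypothetical helper, not something Phobos provides, and it simply tries each candidate in turn instead of walking them in parallel as the variadic startsWith does.

```d
import std.algorithm : startsWith;
import std.stdio : writeln;

/// Hypothetical helper (not part of Phobos): returns the longest candidate
/// that `input` starts with, or null if none matches.
string longestPrefix(string input, string[] candidates)
{
    string best = null;
    foreach (c; candidates)
    {
        if (c.length > best.length && input.startsWith(c))
            best = c;
    }
    return best;
}

void main()
{
    // The variadic overload reports the 1-based position of the shortest match.
    assert("foobar".startsWith("foobar", "foo") == 2);

    writeln(longestPrefix("foobar baz", ["foo", "foobar", "fo"]));
}
```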

- Jonathan M Davis


Re: std.algorithm.startsWith with maximal matching

2012-01-13 Thread H. S. Teoh
On Fri, Jan 13, 2012 at 09:36:07PM -0500, Jonathan M Davis wrote:
 On Friday, January 13, 2012 16:48:00 H. S. Teoh wrote:
  Hi all,
  
  I'm reading the docs for startsWith(A,B...) with multiple ranges in B,
  and it seems that it will always match the *shortest* range whenever
  more than one range in B matches. Is there a way to make it always match
  the *longest* range instead? Or do I have to write my own function for
  that?
 
 It doesn't have a way to tell it which one to match if multiple match. It
 just takes the range that you're looking at and the list of elements and/or
 ranges that the first range might start with. It has to have a way to decide
 which one to match when multiple match, and the most efficient (and easiest)
 way is to match the shortest. So, that's what it does.
[...]

I suppose that's reasonable.

But what I really want to accomplish is to parse a string containing
multiple words; at each point I have a list of permitted words that need
to be matched against the string; substring matches don't count. I
already have a way of skipping over spaces; so for medial words, I can
simulate this by appending a space to the end of the word list passed to
startsWith(). However, this doesn't work when the word being matched is
at the very end of the string, or if it is followed by punctuation.

Is there another library function that can do this, or do I just have to
roll my own?


T

-- 
Philosophy: how to make a career out of daydreaming.


Re: std.algorithm.startsWith with maximal matching

2012-01-13 Thread Jonathan M Davis
On Friday, January 13, 2012 18:47:19 H. S. Teoh wrote:
 On Fri, Jan 13, 2012 at 09:36:07PM -0500, Jonathan M Davis wrote:
  On Friday, January 13, 2012 16:48:00 H. S. Teoh wrote:
   Hi all,
   
   I'm reading the docs for startsWith(A,B...) with multiple ranges in B,
   and it seems that it will always match the *shortest* range whenever
   more than one range in B matches. Is there a way to make it always
   match the *longest* range instead? Or do I have to write my own
   function for that?
  
  It doesn't have a way to tell it which one to match if multiple match.
  It just takes the range that you're looking at and the list of elements
  and/or ranges that the first range might start with. It has to have a
  way to decide which one to match when multiple match, and the most
  efficient (and easiest) way is to match the shortest. So, that's what
  it does.
 
 [...]
 
 I suppose that's reasonable.
 
 But what I really want to accomplish is to parse a string containing
 multiple words; at each point I have a list of permitted words that need
 to be matched against the string; substring matches don't count. I
 already have a way of skipping over spaces; so for medial words, I can
 simulate this by appending a space to the end of the word list passed to
 startsWith(). However, this doesn't work when the word being matched is
 at the very end of the string, or if it is followed by punctuation.
 
 Is there another library function that can do this, or do I just have to
 roll my own?

Use std.array.split. It will split a string into an array of strings using
whitespace as the delimiter. If you want a lazy solution (one which avoids
allocating another array), then use std.algorithm.splitter and give it
std.ascii.whitespace as its delimiter. You don't need a custom solution to
split strings along whitespace. And if you need to compare entire words rather
than just their beginning, once you have the word split out, you can just use
== instead of startsWith.
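
A minimal sketch of the two approaches, for illustration only. The sample text is made up, and the lazy call passes a single space rather than std.ascii.whitespace, on the assumption that a range delimiter is matched as one whole sequence, not as a set of characters.

```d
import std.algorithm : splitter;
import std.array : split;
import std.stdio : writeln;

void main()
{
    string text = "the quick  brown fox";

    // Eager: allocates an array of slices, splitting on runs of whitespace.
    string[] words = text.split();
    assert(words == ["the", "quick", "brown", "fox"]);

    // Lazy: yields one slice at a time; the two adjacent spaces produce an
    // empty word, which simply fails the == comparison.
    foreach (word; text.splitter(' '))
    {
        if (word == "brown")
            writeln("matched: ", word);
    }
}
```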

- Jonathan M Davis