Re: A case for opImplicitCast: making string search work better

2009-05-15 Thread grauzone

downs wrote:

> Consider this type:
>
> struct StringPosition {
>   size_t pos;
>   void opImplicitCast(out size_t sz) {
>     sz = pos;
>   }
>   void opImplicitCast(out bool b) {
>     b = pos != -1;
>   }
> }
>
> Wouldn't that effectively sidestep most problems people have with find
> returning -1?
>
> Or am I missing something?


Could work, but it looks overcomplicated. It could be intuitive, but 
even then someone new would not be able to figure out what is actually 
going on, without digging deep into the internals of the library (or the 
D language).
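opImplicitCast was only a proposal and never became part of D. The following is a rough sketch of what downs describes, assuming present-day D2 features:

- alias this supplies one implicit conversion (to size_t).
- An if() condition on a struct is lowered to cast(bool), which calls opCast!bool.

findPos is a hypothetical search function. size_t.max stands in for the -1 sentinel.

```d
import std.string : indexOf;

// Not downs' opImplicitCast (which does not exist); an approximation
// using features current D2 does have.
struct StringPosition
{
    enum size_t notFound = size_t.max; // stands in for the -1 sentinel
    size_t pos = notFound;

    alias pos this; // usable wherever a size_t is expected

    // if (p) and cast(bool) p both end up here.
    bool opCast(T : bool)() const
    {
        return pos != notFound;
    }
}

// Hypothetical search function returning the wrapper instead of a bare index.
StringPosition findPos(string haystack, string needle)
{
    immutable i = haystack.indexOf(needle); // Phobos returns -1 when absent
    return i < 0 ? StringPosition.init : StringPosition(cast(size_t) i);
}
```

D accepts only one alias this per type, so the bool side has to go through opCast. That also sidesteps the ambiguity question for if(), which always asks for bool.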


I like my way better (returning two slices for search). Also, it 
wouldn't require this:



> Of course, this would require a way to resolve ambiguities, i.e.
> functions/statements with preferences - for instance, if() would prefer
> bool over int. I don't know if this is possible.


...and with my way, it's very simple to check if the search was successful.

e.g.

void myfind(char[] text, char[] search_for, out char[] before, out char[] after);


char[] before, after;
myfind(text, something, before, after);

//was it found?
bool was_found = !!after.length;
//where was it found?
size_t at = before.length;

Both operations are frequently needed and don't require you to reference
text or something again. This means text and something can themselves be
the results of other function calls, and you don't need to break the flow
by putting them into temporary variables.
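Here is a sketch of myfind() in present-day D, using string instead of char[], with both results as out parameters. The body is an assumption, not grauzone's code:

- It uses Phobos's std.string.indexOf.
- after starts at the match.
- after stays empty when nothing is found.

That last choice is what makes !!after.length a valid success test, even for a match at the very end of the text.

```d
import std.string : indexOf;

// Sketch of myfind() with both results as out slices.
// Assumption: `after` begins at the match (so it contains search_for);
// if nothing is found, `before` is the whole text and `after` is empty.
void myfind(string text, string search_for, out string before, out string after)
{
    immutable i = text.indexOf(search_for); // -1 when not found
    if (i < 0)
    {
        before = text; // `after` keeps the default null that `out` gives it
        return;
    }
    immutable pos = cast(size_t) i;
    before = text[0 .. pos];
    after = text[pos .. $];
}
```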


With multiple return values, the signature of myfind() could become 
nicer, too:


auto before, after = myfind(text, something);

(Or at least allow static arrays as return values for functions.)
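D does not have multiple return values, but current Phobos offers something close to the two-slice idea. std.algorithm.searching.findSplitBefore returns a tuple:

- [0] is the text before the first match.
- [1] starts at the match, and is empty if there is none.

The helpers position and found are hypothetical. They only show that both answers come from the slices alone, without touching the original text again.

```d
import std.algorithm.searching : findSplitBefore;

// Where was it found? The length of the "before" slice
// (equals text.length when the needle is absent).
size_t position(string text, string needle)
{
    return text.findSplitBefore(needle)[0].length;
}

// Was it found? The "after" slice starts at the match, so it is non-empty.
bool found(string text, string needle)
{
    return text.findSplitBefore(needle)[1].length != 0;
}
```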

Am _I_ missing something?



2009-05-15 Thread Steven Schveighoffer

On Fri, 15 May 2009 09:36:51 -0400, grauzone n...@example.net wrote:


Your solution actually goes in the opposite direction from what I'd like.
That is, it looks more complicated than simply returning an index or a
slice.  I don't want to have to declare return values ahead of time, and
I'm not holding my breath for multiple return values.  You may be able to
return a pair struct, but still, what could be simpler than returning an
index?  It's easy to construct the value you want (before or after), and
if you want both, that is also possible (and probably results in simpler
code).
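For comparison, here is the index-returning style, sketched with Phobos's std.string.indexOf (which returns -1 when the needle is absent). beforePart and afterPart are hypothetical names. Each one builds a single piece from the index on demand.

```d
import std.string : indexOf;

// Everything before the first match (the whole text if there is none).
string beforePart(string text, string needle)
{
    immutable i = text.indexOf(needle); // -1 when absent
    return i < 0 ? text : text[0 .. cast(size_t) i];
}

// The text from the first match on (empty if there is none).
string afterPart(string text, string needle)
{
    immutable i = text.indexOf(needle);
    return i < 0 ? null : text[cast(size_t) i .. $];
}
```

Note that each helper needs text again after the search. That is the point grauzone picks up on in the reply.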


-Steve



2009-05-15 Thread grauzone
> You may be able to return a pair struct, but still, what could be simpler
> than returning an index?  It's easy to construct the value you want
> (before or after), and if you want both, that is also possible (and
> probably results in simpler code).


All you can do with the index is:
1. compare it against the length of the searched string to test if the
   search was successful
2. slice the searched string
3. do something rather special

What else would you do? You'd just have to store the searched string as
a temporary, and then you'd slice it (for 2.) or compare the index
against its length (for 1.). You always have to keep the searched string
in a temporary. That's rather impractical. Oh sure, if you _really_ need
the index (for 3.), then directly returning an index is of course the
best way.


With my approach, you don't need to grab the searched string you passed
in again. All of these can be done in a single, trivial expression (for
3., that means getting just the index). Actually, compared to your
approach, this would just eliminate the trivial but annoying slicing code
after the search call that you'd type in... what, 90% of all cases?
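The following sketches the temporary problem described here. It assumes a hypothetical readSetting() whose result is the string being searched.

- The index version has to keep that result in a local so it can slice it.
- The slice version, here using Phobos's std.algorithm.searching.findSplitBefore, is a single expression.

```d
import std.algorithm.searching : findSplitBefore;
import std.string : indexOf;

// Hypothetical function whose result is the string to search.
string readSetting() { return "timeout=30"; }

// Index style: the haystack has to live in a local so it can be sliced.
string keyViaIndex()
{
    auto line = readSetting();
    immutable i = line.indexOf('='); // -1 when absent
    return i < 0 ? line : line[0 .. cast(size_t) i];
}

// Slice style: one expression, no local needed.
string keyViaSlices()
{
    return readSetting().findSplitBefore("=")[0];
}
```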


The thing about multiple return values is true (sadly), but in this 
case, you could simply return a static array (char[][2]). At least that 
should be possible in D2 at some point.
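Present-day D2 does accept static arrays as return types, returned by value. So the char[][2] idea can be sketched directly, here with string[2]. splitAround is a hypothetical name. [0] and [1] follow the before/after convention used in this thread.

```d
import std.string : indexOf;

// [0]: text before the first match; [1]: text from the match on.
// If the needle is absent, [0] is the whole text and [1] is empty.
string[2] splitAround(string text, string needle)
{
    immutable i = text.indexOf(needle); // -1 when absent
    if (i < 0)
        return [text, null];
    immutable pos = cast(size_t) i;
    return [text[0 .. pos], text[pos .. $]];
}
```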


Maybe a struct would work fine too. But I don't like it, because the
programmer would have to look up the struct members first. He would have
to memorize the struct members, and couldn't tell what the function
returns just by looking at the function signature.


(Yay bikeshed issues.)



2009-05-15 Thread Steven Schveighoffer

On Fri, 15 May 2009 10:30:17 -0400, grauzone n...@example.net wrote:

> All you can do with the index is:
> 1. compare it against the length of the searched string to test if the
>    search was successful
> 2. slice the searched string
> 3. do something rather special
>
> What else would you do? You'd just have to store the searched string as
> a temporary, and then you'd slice it (for 2.) or compare the index
> against its length (for 1.). You always have to keep the searched string
> in a temporary. That's rather impractical. Oh sure, if you _really_ need
> the index (for 3.), then directly returning an index is of course the
> best way.
>
> With my approach, you don't need to grab the searched string you passed
> in again. All of these can be done in a single, trivial expression (for
> 3., that means getting just the index). Actually, compared to your
> approach, this would just eliminate the trivial but annoying slicing code
> after the search call that you'd type in... what, 90% of all cases?


I hadn't thought of the case where you are calling *on* a temporary; I
always had in mind that the source string was already declared.  This is
a good point.  The only drawback in this case is that you are constructing
information you sometimes do not need or care about.  If all you want is
whether it succeeded or not, then you don't need two ranges constructed
and returned.  But therein lies a fundamental tradeoff that cannot be
avoided.  The most basic information you get is the index, and with that,
you can construct any larger pieces from the pieces you have, but not
always easily, and not without repeating identifiers.


I like your approach, but with the single return type, not out  
parameters.  Having out parameters would be a deal breaker.


I'd prefer not to have two strings but a string that has an identified  
pivot point.  You could generate the desired left and right hand sides  
dynamically, and it would work without any changes to the current syntax.


for example:

struct partition(R)
{
   R range;
   size_t pivot;

   R lhs() {return range[0..pivot];}
   R rhs() {return range[pivot..$];}
   bool found() {return pivot < range.length;}
}

partition!string find(string haystack, string needle);

usage:

string s = str.find("hi").rhs; // or .lhs or .found or .pivot
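Here is a runnable sketch of the partition idea, filling in a find() body that the post leaves as a declaration. The body is an assumption: it uses std.string.indexOf and encodes "not found" as pivot == range.length, which is the condition found() checks.

```d
import std.string : indexOf;

// One slice plus a pivot; lhs/rhs are computed only when asked for.
struct partition(R)
{
    R range;
    size_t pivot;

    R lhs() { return range[0 .. pivot]; }
    R rhs() { return range[pivot .. $]; }
    bool found() { return pivot < range.length; }
}

// Hypothetical search function; a miss puts the pivot at the end.
partition!string find(string haystack, string needle)
{
    immutable i = haystack.indexOf(needle); // -1 when absent
    return partition!string(haystack, i < 0 ? haystack.length : cast(size_t) i);
}
```

With D2's uniform function call syntax, str.find("hi").rhs reads exactly like the usage line in the post.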

> Maybe a struct would work fine too. But I don't like it, because the
> programmer would have to look up the struct members first. He would have
> to memorize the struct members, and couldn't tell what the function
> returns just by looking at the function signature.


If this were implemented, the return type would be very common.  At some  
point you have to look up everything (what's a range?).


-Steve



2009-05-15 Thread Christopher Wright

downs wrote:

> Of course, this would require a way to resolve ambiguities, i.e.
> functions/statements with preferences - for instance, if() would prefer
> bool over int. I don't know if this is possible.


Just use two functions: find and contains.
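Here is a sketch of the two-function split with Phobos.

- find simply forwards to std.string.indexOf, which returns -1 when the needle is absent.
- contains is a hypothetical wrapper around std.algorithm.searching.canFind, which is Phobos's own name for the "whether" question.

```d
import std.algorithm.searching : canFind;
import std.string : indexOf;

// "Where?": the index of the first match, or -1 if there is none.
ptrdiff_t find(string haystack, string needle)
{
    return haystack.indexOf(needle);
}

// "Whether?": a plain bool, so if() needs no special preference rules.
bool contains(string haystack, string needle)
{
    return haystack.canFind(needle);
}
```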



2009-05-15 Thread bearophile
Christopher Wright:
> Just use two functions: find and contains.

Or better, define a built-in operator; you may call it "in" :-)

'e' in "hello" => true
(The compiler may even cache the resulting position somewhere, so a
subsequent find can be very fast.)
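D has no built-in in operator for strings; in is defined for associative arrays. A user-defined type can still overload it with opBinaryRight!"in". Text is a hypothetical wrapper sketching the suggestion. It does not attempt the position caching mentioned here.

```d
import std.string : indexOf;

// Hypothetical wrapper so that  'e' in t  works on a string.
struct Text
{
    string s;

    // The compiler rewrites  c in t  as  t.opBinaryRight!"in"(c).
    bool opBinaryRight(string op)(dchar c) if (op == "in")
    {
        return s.indexOf(c) != -1;
    }
}
```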

Bye,
bearophile



2009-05-15 Thread grauzone
> I hadn't thought of the case where you are calling *on* a temporary; I
> always had in mind that the source string was already declared.  This is
> a good point.  The only drawback in this case is that you are
> constructing information you sometimes do not need or care about.  If
> all you want is whether it succeeded or not, then you don't need two
> ranges constructed and returned.  But therein lies a fundamental tradeoff
> that cannot be avoided.  The most basic information you get is the index,
> and with that, you can construct any larger pieces from the pieces you
> have, but not always easily, and not without repeating identifiers.


The whole point of the search function is to make programming easier, 
isn't it? Its implementation is rather trivial. You call it because it 
makes your life easier. I don't see why constructing this additional 
information is a problem.


Anyway, you could always move this to a second function. I just think
that returning a tuple of slices is the most useful way.


> I like your approach, but with the single return type, not out
> parameters.  Having out parameters would be a deal breaker.


I just wanted to show something that works on D1 without memory
allocation, and without returning a struct.
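The out-parameter version needs no allocation because both results are slices (pointer plus length) into the caller's text. In present-day D, marking the function @nogc has the compiler check that. This is a sketch. The naive O(n*m) search loop is for illustration only.

```d
// Both out slices point into `text`; nothing is copied or allocated.
void myfind(string text, string search_for, out string before, out string after)
    @safe @nogc nothrow
{
    for (size_t i = 0; i + search_for.length <= text.length; ++i)
    {
        if (text[i .. i + search_for.length] == search_for)
        {
            before = text[0 .. i];
            after = text[i .. $];
            return;
        }
    }
    before = text; // not found: `after` stays empty
}
```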


> If this were implemented, the return type would be very common.  At some
> point you have to look up everything (what's a range?).


I think multiple return values are simpler, and more versatile, elegant
and intuitive. In contrast, having to define structs for the return values
of (almost) trivial functions is not a good sign. You might as well pass
all of a function's in-parameters as a struct, claiming this is more
practical because then you can have named arguments and arbitrary default
arguments. Huh.
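One possible middle ground between defining a struct and having multiple return values is std.typecons.Tuple with named fields. That way the function signature itself says what comes back. search is a hypothetical name, and the before/after convention follows myfind().

```d
import std.string : indexOf;
import std.typecons : Tuple;

// The field names appear right in the signature; no separate struct to look up.
Tuple!(string, "before", string, "after") search(string text, string needle)
{
    immutable i = text.indexOf(needle); // -1 when absent
    if (i < 0)
        return typeof(return)(text, null); // not found: everything is "before"
    immutable pos = cast(size_t) i;
    return typeof(return)(text[0 .. pos], text[pos .. $]);
}
```

The result also supports positional access (r[0], r[1]), so it works both like a struct and like a pair.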