Re: std.regexp vs std.regex [Re: RegExp.find() now crippled]

2010-11-17 Thread Steve Teale
Andrei Alexandrescu Wrote:

 
 It's probably common courtesy that should be preserved. I just committed 
 the fix prompted by Lutger (thanks).
 
 Andrei

Thanks Andrei. When the next version is released I'll remove the temporary 
findRex() function from my current code.

Steve ;=)


Re: std.regexp vs std.regex [Re: RegExp.find() now crippled]

2010-11-16 Thread Steve Teale
Andrei Alexandrescu Wrote:

 
 I am sorry for the inadvertent change, it wasn't meant to change 
 semantics of existing code. I'm not sure whether one of my unrelated 
 64-bit changes messed things up. You may want to file a bug report.
 
 There are a number of good reasons for which I was compelled to split 
 std.regex from std.regexp. I'm sure you or others would have found them 
 just as compelling if you saw things the same way.
 
 Phobos 1 has experimented in std.string and std.regexp with juxtaposing 
 APIs of various languages (PHP, Ruby, Python). The reasoning was that 
 people familiar with any of those languages could feel right at home
 by using APIs with similar nomenclatures and semantics. The result was 
 some strange bedfellows in std.string such as column or capwords and 
 an outright mess in std.regexp. The interface of std.regexp is without a 
 doubt the worst I've ever seen, by a long shot. I have never been able 
 to use it without poring through the documentation _several times_ and 
 without confirming to myself via a small test case that I'm doing the 
 right thing.
 
 The simplest problem is this: std.regexp uses the words exec, find, 
 match, search, and test - all to mean regular expression matching. 
 There is absolutely no logic to how meanings are ascribed to words, and 
 there is absolutely no recourse other than rote memorization of various
 arbitrary decisions.
 
 The resulting FrankenAPI is likely familiar to anyone except those 
 who've actually spent time learning it, in spite of it trying to be 
 familiar to anyone.
 
 So I spawned std.regex in an attempt to sanitize the API (I made minor, 
 if any, changes to the engine; I am in fact having significant trouble 
 maintaining it). The advantages of std.regex are:
 
 * No more class definition. Nobody is supposed to inherit RegExp anyway 
 so it's useless to brand the object as a class.
 
 * Engine is separated from matches, which means that engines can be 
 memoized for efficiency. Currently regex() only memoizes the last engine.
 
 * The new engine works with any character size.
 
 * Simpler API: create a regex, call match() against that regex and a 
 string, look at the resulting RegexMatch object.
 
 If this all annoys you more than the old API, I will need to disagree. 
 If you have suggestions on how std.regex can be improved, I'm all ears.
 
 
 Andrei

Andrei,

Maybe it is time that the structure of the standard library became more 
generalized. At the moment we have std... and core...

Perhaps we need another branch in the hierarchy, like ranges... Then there 
could be a std.range module that was the gateway into ranges... The library 
could then expand in an orderly fashion, with a wider range of users becoming 
responsible for the maintenance of different branches against changes in the
language, not against changes in fashion.

Then you could have ranges.regex, that suits you, and the people who were happy 
with the status quo, could continue to use std.regexp, which should continue to 
behave like it did in DMD2.029 or whatever it was when I wrote my 'legacy' code.

The current system, where modules of the library can get arbitrarily deprecated 
and at some point removed because they are unfashionable, is very unfriendly.

I recognize that you are young, hyper-intelligent, and motivated toward fame. 
But there are other users, like me, who are older, but not senile, and have 
more conservative attitudes, including the desire to use code they wrote in the 
past at some point in the future.

Steve



Re: std.regexp vs std.regex [Re: RegExp.find() now crippled]

2010-11-16 Thread Steven Schveighoffer
On Tue, 16 Nov 2010 13:16:13 -0500, Steve Teale  
steve.te...@britseyeview.com wrote:


Andrei,

Maybe it is time that the structure of the standard library became more  
generalized. At the moment we have std... and core...


Perhaps we need another branch in the hierarchy, like ranges... Then  
there could be a std.range module that was the gateway into ranges...  
The library could then expand in an orderly fashion, with a wider range  
of users becoming responsible for the maintenance of different
branches against changes in the language, not against changes in fashion.


Then you could have ranges.regex, that suits you, and the people who  
were happy with the status quo, could continue to use std.regexp, which  
should continue to behave like it did in DMD2.029 or whatever it was  
when I wrote my 'legacy' code.


The current system, where modules of the library can get arbitrarily  
deprecated and at some point removed because they are unfashionable, is  
very unfriendly.


I recognize that you are young, hyper-intelligent, and motivated toward  
fame. But there are other users, like me, who are older, but not senile,  
and have more conservative attitudes, including the desire to use code  
they wrote in the past at some point in the future.


The standard library should not have something to please everyone.  If  
there are 5 different styles to do the same thing, it will be a failure.


Can you just copy std.regex from 2.029 and compile it in your project?   
I.e. instead of phobos adding range branch for the new range style, you  
add branch Teale for your style and copy what you like in there.  Then you  
have what you want (may take a little effort on your part, but then you  
control the results).


Also, 2.029 is still available via download; you can still use it.

-Steve


Re: std.regexp vs std.regex [Re: RegExp.find() now crippled]

2010-11-16 Thread Steve Teale
Steven Schveighoffer Wrote:

 On Tue, 16 Nov 2010 13:16:13 -0500, Steve Teale  
 steve.te...@britseyeview.com wrote:
 
  Andrei,
 
  Maybe it is time that the structure of the standard library became more  
  generalized. At the moment we have std... and core...
 
  Perhaps we need another branch in the hierarchy, like ranges... Then  
  there could be a std.range module that was the gateway into ranges...  
  The library could then expand in an orderly fashion, with a wider range  
  of users becoming responsible for the maintenance of different
  branches against changes in the language, not against changes in fashion.
 
  Then you could have ranges.regex, that suits you, and the people who  
  were happy with the status quo, could continue to use std.regexp, which  
  should continue to behave like it did in DMD2.029 or whatever it was  
  when I wrote my 'legacy' code.
 
  The current system, where modules of the library can get arbitrarily  
  deprecated and at some point removed because they are unfashionable, is  
  very unfriendly.
 
  I recognize that you are young, hyper-intelligent, and motivated toward  
  fame. But there are other users, like me, who are older, but not senile,  
  and have more conservative attitudes, including the desire to use code  
  they wrote in the past at some point in the future.
 
 The standard library should not have something to please everyone.  If  
 there are 5 different styles to do the same thing, it will be a failure.
 
 Can you just copy std.regex from 2.029 and compile it in your project?   
 I.e. instead of phobos adding range branch for the new range style, you  
 add branch Teale for your style and copy what you like in there.  Then you  
 have what you want (may take a little effort on your part, but then you  
 control the results).
 
 Also, 2.029 is still available via download; you can still use it.
 
 -Steve

Yes Steve, of course I can, but other, much more popular languages, PHP for
instance, seem to do OK with the suit-everyone style.

I am just upset that code I put a lot of effort into gets broken because 
somebody else does not like the style of the library.

Which should be preserved - style, or substance?

Steve



Re: std.regexp vs std.regex [Re: RegExp.find() now crippled]

2010-11-16 Thread sybrandy

On 11/16/2010 01:30 PM, Steven Schveighoffer wrote:

On Tue, 16 Nov 2010 13:16:13 -0500, Steve Teale
steve.te...@britseyeview.com wrote:


Andrei,

Maybe it is time that the structure of the standard library became
more generalized. At the moment we have std... and core...

Perhaps we need another branch in the hierarchy, like ranges... Then
there could be a std.range module that was the gateway into ranges...
The library could then expand in an orderly fashion, with a wider
range of users becoming responsible for the maintenance of
different branches against changes in the language, not against
changes in fashion.

Then you could have ranges.regex, that suits you, and the people who
were happy with the status quo, could continue to use std.regexp,
which should continue to behave like it did in DMD2.029 or whatever it
was when I wrote my 'legacy' code.

The current system, where modules of the library can get arbitrarily
deprecated and at some point removed because they are unfashionable,
is very unfriendly.

I recognize that you are young, hyper-intelligent, and motivated
toward fame. But there are other users, like me, who are older, but
not senile, and have more conservative attitudes, including the desire
to use code they wrote in the past at some point in the future.


The standard library should not have something to please everyone. If
there are 5 different styles to do the same thing, it will be a failure.

Can you just copy std.regex from 2.029 and compile it in your project?
I.e. instead of phobos adding range branch for the new range style, you
add branch Teale for your style and copy what you like in there. Then
you have what you want (may take a little effort on your part, but then
you control the results).

Also, 2.029 is still available via download; you can still use it.

-Steve


This actually sounds interesting.  If I'm understanding things right, 
std.range.* would provide a range interface to specific libraries, such 
as regex.  So, in theory, there could be different interfaces to the 
same functionality.  E.g. std.range.regex, std.oo.regex, and 
std.proc.regex for a range interface, an OO interface, or a procedural
interface.  Underneath, you could have the same core functionality, but 
people can access it in the way they feel most comfortable or that 
better fits the design of the program being written.  As new paradigms 
are invented, they can be added as well and be based on the existing 
interfaces.


Is this something we want to do?  Don't know.  I don't even know how 
feasible it is.  However, I do like the concept and if the goal is to 
make the language as friendly as possible, perhaps it should be looked 
into.  There's the chance that it will cause some confusion, but how 
much will actually occur?


The biggest issue I see is having certain libraries that don't fit well 
into all of the different paradigms.  E.g. a date library can have a 
nice OO interface and a nice procedural interface, but it doesn't make 
much sense to have a range interface.


Anyway, food for thought.

Casey


Re: std.regexp vs std.regex [Re: RegExp.find() now crippled]

2010-11-16 Thread Steven Schveighoffer
On Tue, 16 Nov 2010 13:46:48 -0500, Steve Teale  
steve.te...@britseyeview.com wrote:



Steven Schveighoffer Wrote:


On Tue, 16 Nov 2010 13:16:13 -0500, Steve Teale
steve.te...@britseyeview.com wrote:

 Andrei,

 Maybe it is time that the structure of the standard library became more
 generalized. At the moment we have std... and core...

 Perhaps we need another branch in the hierarchy, like ranges... Then
 there could be a std.range module that was the gateway into ranges...
 The library could then expand in an orderly fashion, with a wider range
 of users becoming responsible for the maintenance of different
 branches against changes in the language, not against changes in fashion.


 Then you could have ranges.regex, that suits you, and the people who
 were happy with the status quo, could continue to use std.regexp, which
 should continue to behave like it did in DMD2.029 or whatever it was
 when I wrote my 'legacy' code.

 The current system, where modules of the library can get arbitrarily
 deprecated and at some point removed because they are unfashionable, is
 very unfriendly.

 I recognize that you are young, hyper-intelligent, and motivated toward
 fame. But there are other users, like me, who are older, but not senile,
 and have more conservative attitudes, including the desire to use code
 they wrote in the past at some point in the future.

The standard library should not have something to please everyone.  If
there are 5 different styles to do the same thing, it will be a failure.

Can you just copy std.regex from 2.029 and compile it in your project?
I.e. instead of phobos adding range branch for the new range style, you
add branch Teale for your style and copy what you like in there.  Then you
have what you want (may take a little effort on your part, but then you
control the results).

Also, 2.029 is still available via download; you can still use it.

-Steve


Yes Steve, of course I can, but other, much more popular languages, PHP
for instance, seem to do OK with the suit-everyone style.


I don't object to having multiple styles to do things, I even maintain a  
library (dcollections) that is not even close to the style of  
std.container.  I just object to everything being included in the standard  
library.  The standard library should do things one way, and if you want  
something different, use an add-on library.


I'm guessing you are referring to php's pcre vs posix regex?  I think  
posix is marked as deprecated...


I am just upset that code I put a lot of effort into gets broken because  
somebody else does not like the style of the library.


Well, the library isn't finished.  As much as I understand your pain, I  
also don't think phobos should be write-only.  We should not be stuck with  
mistakes or designs of the past until we have stated the library is  
released, and then we can deal with backwards compatibility in a  
reasonable way.  Until then, you shouldn't expect everything to be set in  
stone.  Sorry if this is confusing or annoying.


That being said, if std.regexp is broken, I don't think it was  
intentional.  In fact, in the bug report, someone mentions a one-line fix,  
does that solve your problem?  AFAIK, regexp is not even deprecated yet,  
which means it should be supported.  I think Andrei said as much.



Which should be preserved - style, or substance?


substance.  AFAIK, substance is preserved, or am I misunderstanding you?

-Steve


Re: std.regexp vs std.regex [Re: RegExp.find() now crippled]

2010-11-16 Thread Andrei Alexandrescu

On 11/16/10 10:16 AM, Steve Teale wrote:

Andrei Alexandrescu Wrote:



I am sorry for the inadvertent change, it wasn't meant to change
semantics of existing code. I'm not sure whether one of my unrelated
64-bit changes messed things up. You may want to file a bug report.

There are a number of good reasons for which I was compelled to split
std.regex from std.regexp. I'm sure you or others would have found them
just as compelling if you saw things the same way.

Phobos 1 has experimented in std.string and std.regexp with juxtaposing
APIs of various languages (PHP, Ruby, Python). The reasoning was that
people familiar with any of those languages could feel right at home
by using APIs with similar nomenclatures and semantics. The result was
some strange bedfellows in std.string such as column or capwords and
an outright mess in std.regexp. The interface of std.regexp is without a
doubt the worst I've ever seen, by a long shot. I have never been able
to use it without poring through the documentation _several times_ and
without confirming to myself via a small test case that I'm doing the
right thing.

The simplest problem is this: std.regexp uses the words exec, find,
match, search, and test - all to mean regular expression matching.
There is absolutely no logic to how meanings are ascribed to words, and
there is absolutely no recourse other than rote memorization of various
arbitrary decisions.

The resulting FrankenAPI is likely familiar to anyone except those
who've actually spent time learning it, in spite of it trying to be
familiar to anyone.

So I spawned std.regex in an attempt to sanitize the API (I made minor,
if any, changes to the engine; I am in fact having significant trouble
maintaining it). The advantages of std.regex are:

* No more class definition. Nobody is supposed to inherit RegExp anyway
so it's useless to brand the object as a class.

* Engine is separated from matches, which means that engines can be
memoized for efficiency. Currently regex() only memoizes the last engine.

* The new engine works with any character size.

* Simpler API: create a regex, call match() against that regex and a
string, look at the resulting RegexMatch object.

If this all annoys you more than the old API, I will need to disagree.
If you have suggestions on how std.regex can be improved, I'm all ears.


Andrei


Andrei,

Maybe it is time that the structure of the standard library became
more generalized. At the moment we have std... and core...

Perhaps we need another branch in the hierarchy, like ranges... Then
there could be a std.range module that was the gateway into ranges...
The library could then expand in an orderly fashion, with a wider
range of users becoming responsible for the maintenance of
different branches against changes in the language, not against
changes in fashion.

Then you could have ranges.regex, that suits you, and the people who
were happy with the status quo, could continue to use std.regexp,
which should continue to behave like it did in DMD2.029 or whatever
it was when I wrote my 'legacy' code.


I think that's not a good design. Ranges are a cross-cutting 
abstraction. One wouldn't put all code using exceptions under
std.exceptions or code using floating point under std.floating_point. 
Better, ranges, exceptions, or floating point should be used wherever it 
makes sense to use them.



The current system, where modules of the library can get arbitrarily
deprecated and at some point removed because they are unfashionable,
is very unfriendly.


I agree we need to have a rather long deprecation schedule. Fashionable 
has, however, little to do with the rationale for deprecation. You may 
want to tune to the Phobos developers' mailing list for more details.



I recognize that you are young, hyper-intelligent, and motivated
toward fame.


I have enumerated a list of technical reasons for which std.regexp is 
inadequate, followed by a list of improvements brought about by 
std.regex. Ranges are nowhere on that list, nor is being fashionable. 
It's all good old design stuff that I'm sure you have down better than 
me: make an API small and simple, separate concerns (engine/matches), 
use the right tool for the job (struct not class), generalize within 
reason (character width).
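A rough sketch of the engine/match separation and the character-width point, written against present-day Phobos rather than the 2010 module: std.regex has since been rewritten, so `matchFirst` and `Captures` stand in here for the `match()`/`RegexMatch` API described above, and `firstNumbers` is a hypothetical helper, not part of Phobos. The pattern is compiled once into a `Regex` value and reused for every input; each match yields its own independent result value.

```d
import std.regex;
import std.stdio;

// Hypothetical helper (not in Phobos): returns the first run of digits in
// each line, or "" for lines that contain no digits.
string[] firstNumbers(string[] lines)
{
    // The pattern is compiled once into a Regex value (the "engine")...
    auto digits = regex(`\d+`);
    string[] result;
    foreach (line; lines)
    {
        // ...and reused here; each call produces a separate Captures value.
        auto m = matchFirst(line, digits);
        result ~= m.empty ? "" : m.hit;
    }
    return result;
}

void main()
{
    writeln(firstNumbers(["abc 42 def", "no digits", "7 up"]));

    // Other character widths go through the same API: a dstring pattern
    // compiles to Regex!dchar and matches dstring input.
    auto wide = regex(`\d+`d);
    writeln(matchFirst("Größe 12"d, wide).hit);
}
```

Because the compiled `Regex` is a plain value, caching or reusing it is the caller's choice; nothing about a particular match is stored inside it.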


Would have been great to have a discussion along those lines. Instead, I 
see you chose to ignore all technical arguments and go with a 
presupposition, no matter how assuming and stereotypical.



But there are other users, like me, who are older, but not senile,
and have more conservative attitudes, including the desire to use
code they wrote in the past at some point in the future.


Backward compatibility is indeed important, and again we need to have a 
long deprecation schedule. At the same time, I think there are many more
users in D's future than in its past, and I cannot inflict
std.regexp on them.



Andrei


Re: std.regexp vs std.regex [Re: RegExp.find() now crippled]

2010-11-16 Thread Steve Teale
Steven Schveighoffer Wrote:

 
 I'm guessing you are referring to php's pcre vs posix regex?  I think  
 posix is marked as deprecated...
 
Steve,

No. I just meant that the library that comes with PHP seems happy to provide 
different ways of doing the same thing, as in for example, CURL, DOMDocument, 
and standard file operation wrappers.

I've been following D on and off for about seven or eight years now, so I don't 
subscribe too much to the 'when it's finished' argument. By now, it needs to 
work for real projects.

But I've had my gripe - I'll shut up for now.

Steve



Re: std.regexp vs std.regex [Re: RegExp.find() now crippled]

2010-11-16 Thread Jonathan M Davis
On Tuesday, November 16, 2010 10:30:03 Steven Schveighoffer wrote:
 The standard library should not have something to please everyone.  If
 there is 5 different styles to do the same thing, it will be a failure.

Agreed. Ideally, the standard library would be very uniform in approach. That 
makes it easier to learn and use. If it's schizophrenic about its approach -
especially if it has multiple ways of doing everything - then it's going to be 
much harder to learn and use. Everyone would be asking why you'd choose one way 
over another and what the differences between them are. It would just cause 
confusion.

Ranges are a key element of how Phobos does things in D2. The truth of the 
matter is that if you want to effectively use Phobos, you're going to have to
use ranges. If ranges aren't appropriate for a particular module or problem, then
they shouldn't be used, but Phobos is generally being built around using them, 
and the more of Phobos which functions in essentially the same way, the easier 
it will be to understand, learn, and use.

The old code is indeed available for modules which are going to be 
deprecated/removed, and the license is usually Boost, so you're pretty free to 
do what you want with it if you prefer it. And there's nothing wrong with 
creating your own libraries if you'd prefer. Plenty of folks have done that in 
the past.

The standard library needs to be fairly uniform in approach, however, and some 
of the current modules are older and don't follow that approach or have 
licensing or design issues which were not addressed in the past. Once all of 
those modules have been updated, replaced, or removed, Phobos will be more 
uniform and its parts will interact better. And over time, it's unlikely that 
modules will continue to be deprecated like that. It's happening now because D2 
Phobos is still fairly early in its evolution.

- Jonathan M Davis


Re: std.regexp vs std.regex [Re: RegExp.find() now crippled]

2010-11-16 Thread Andrei Alexandrescu

On 11/16/10 10:46 AM, Steve Teale wrote:

Steven Schveighoffer Wrote:


On Tue, 16 Nov 2010 13:16:13 -0500, Steve Teale
steve.te...@britseyeview.com  wrote:


Andrei,

Maybe it is time that the structure of the standard library became more
generalized. At the moment we have std... and core...

Perhaps we need another branch in the hierarchy, like ranges... Then
there could be a std.range module that was the gateway into ranges...
The library could then expand in an orderly fashion, with a wider range
of users becoming responsible for the maintenance of different
branches against changes in the language, not against changes in fashion.

Then you could have ranges.regex, that suits you, and the people who
were happy with the status quo, could continue to use std.regexp, which
should continue to behave like it did in DMD2.029 or whatever it was
when I wrote my 'legacy' code.

The current system, where modules of the library can get arbitrarily
deprecated and at some point removed because they are unfashionable, is
very unfriendly.

I recognize that you are young, hyper-intelligent, and motivated toward
fame. But there are other users, like me, who are older, but not senile,
and have more conservative attitudes, including the desire to use code
they wrote in the past at some point in the future.


The standard library should not have something to please everyone.  If
there are 5 different styles to do the same thing, it will be a failure.

Can you just copy std.regex from 2.029 and compile it in your project?
I.e. instead of phobos adding range branch for the new range style, you
add branch Teale for your style and copy what you like in there.  Then you
have what you want (may take a little effort on your part, but then you
control the results).

Also, 2.029 is still available via download; you can still use it.

-Steve


Yes Steve, of course I can, but other, much more popular languages, PHP for
instance, seem to do OK with the suit-everyone style.

I am just upset that code I put a lot of effort into gets broken because 
somebody else does not like the style of the library.

Which should be preserved - style, or substance?

Steve


It's probably common courtesy that should be preserved. I just committed 
the fix prompted by Lutger (thanks).


Andrei


Re: std.regexp vs std.regex [Re: RegExp.find() now crippled]

2010-11-16 Thread spir
On Tue, 16 Nov 2010 11:24:02 -0800
Jonathan M Davis jmdavisp...@gmx.com wrote:

 On Tuesday, November 16, 2010 10:30:03 Steven Schveighoffer wrote:
  The standard library should not have something to please everyone.  If
  there is 5 different styles to do the same thing, it will be a failure.
 
 Agreed. Ideally, the standard library would be very uniform in approach. That 
 makes it easier to learn and use. If it's schizophrenic about its approach -
 especially if it has multiple ways of doing everything - then it's going to be
 much harder to learn and use. Everyone would be asking why you'd choose one way
 over another and what the differences between them are. It would just cause
 confusion.
 
 Ranges are a key element of how Phobos does things in D2. The truth of the
 matter is that if you want to effectively use Phobos, you're going to have to
 use ranges. If ranges aren't appropriate for a particular module or problem,
 then they shouldn't be used, but Phobos is generally being built around using
 them, and the more of Phobos which functions in essentially the same way, the
 easier it will be to understand, learn, and use.
 
 The old code is indeed available for modules which are going to be
 deprecated/removed, and the license is usually Boost, so you're pretty free to
 do what you want with it if you prefer it. And there's nothing wrong with
 creating your own libraries if you'd prefer. Plenty of folks have done that in
 the past.
 
 The standard library needs to be fairly uniform in approach, however, and some
 of the current modules are older and don't follow that approach or have
 licensing or design issues which were not addressed in the past. Once all of
 those modules have been updated, replaced, or removed, Phobos will be more
 uniform and its parts will interact better. And over time, it's unlikely that
 modules will continue to be deprecated like that. It's happening now because
 D2 Phobos is still fairly early in its evolution.
 
 - Jonathan M Davis

+++


Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com



Re: std.regexp vs std.regex [Re: RegExp.find() now crippled]

2010-11-16 Thread bearophile
Steve Teale:

 The current system, where modules of the library can get arbitrarily 
 deprecated and at some point removed because they are unfashionable, is very 
 unfriendly.

We are in the initial phase of Phobos development, so frequent large changes are
expected. Surely one year from now Phobos will be more careful in its changes.

The other things you have said are too silly to comment on.

And thanks to Andrei for the improvements to the D regex API; I'd love to see
other good people make similar improvements to Phobos :-) We must be
grateful to Andrei for improving that API.

Bye,
bearophile


Re: std.regexp vs std.regex [Re: RegExp.find() now crippled]

2010-11-16 Thread Steven Schveighoffer
On Tue, 16 Nov 2010 14:24:32 -0500, Steve Teale  
steve.te...@britseyeview.com wrote:



Steven Schveighoffer Wrote:




I'm guessing you are referring to php's pcre vs posix regex?  I think
posix is marked as deprecated...


Steve,

No. I just meant that the library that comes with PHP seems happy to  
provide different ways of doing the same thing, as in for example, CURL,  
DOMDocument, and standard file operation wrappers.


I don't know much about php's lib because I haven't used it enough to know  
the tendencies of library acceptance.  But two different APIs for doing  
regex strike me as way more overlapping than CURL and file operation  
wrappers.


I've been following D on and off for about seven or eight years now, so  
I don't subscribe too much to the 'when it's finished' argument. By now,  
it needs to work for real projects.


D2 is much younger than that.  D1 is complete (to use the term loosely);
if you want to use that, its API will not change.  There are quite a few  
projects using D1 for real work (I wrote one a few years ago).


D2 is changing monthly, to the point where newer versions of phobos  
require newer versions of the compiler due to compiler bugs fixed or  
features added.  I can't see how it can be considered finished.


Don has recently brought up on the mailing list that we should identify  
the status of each module in the ddoc so people can understand the plans  
for that module before basing their work on it.  I can see how spending  
lots of time working with something only to have it disappear can be  
hugely frustrating.


-Steve


Re: std.regexp vs std.regex [Re: RegExp.find() now crippled]

2010-11-15 Thread Steve Teale
KennyTM~ Wrote:

 On Nov 15, 10 14:58, Steve Teale wrote:
  Some time ago in phobos2, the following:
 
  RegExp wsr = RegExp("(\\s+)");
  int p = wsr.find("thingie att1=\"whatever\"");
  writefln("%s|%s|%s %d", wsr.pre(), wsr.match(1), wsr.post(), p);
 
  would print:
 
  thingie| |att1="whatever" 7
 
  Now it prints
 
  thingie| |att1="whatever" 1
 
  The new return value is pretty useless, equivalent to returning a bool. It 
  seems to me that the 'find' verb's subject should be the string, not the 
  RegExp object.
 
  This looks like a case of the implementation being changed to match the 
  documentation, when in fact it would have been better to change the 
  documentation to match the implementation.
 
  Either that, or RegExp should have an indexOf method that behaves like 
  string.indexOf.
 
  Steve
 
 
 Isn't std.regexp replaced by std.regex? Why are both of them still in 
 Phobos 2?
 
 (oh, and std.regex is missing a documented .index (= .src_start) property.)

I guess std.regexp is still there because not all of us necessarily want to 
iterate a range to simply find out the position of the first whitespace in a 
string. Part of the expressiveness of languages is that one should be free to 
use the style that suits, and not have to read the documentation every time one 
uses it. Give me options in Phobos by all means.
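For comparison, a rough sketch of how the integer the old `RegExp.find()` returned can be recovered with std.regex without walking a range by hand. It assumes present-day Phobos (the `matchFirst`/`Captures` API, which postdates this thread), and `firstWhitespace` is a hypothetical helper, not a Phobos function. Since `Captures.pre` is the slice of the input preceding the match, its length is the match offset, which also covers the undocumented `.index` property KennyTM mentions.

```d
import std.regex;
import std.stdio;

// Hypothetical helper (not in Phobos): returns the offset of the first
// whitespace run in `s`, or -1 if there is none, i.e. the value that
// std.regexp's RegExp.find() used to return.
ptrdiff_t firstWhitespace(string s)
{
    auto m = matchFirst(s, regex(`\s+`));
    if (m.empty)
        return -1;
    // m.pre is the part of `s` before the match, so its length is the offset.
    return cast(ptrdiff_t) m.pre.length;
}

void main()
{
    auto s = `thingie att1="whatever"`;
    auto m = matchFirst(s, regex(`(\s+)`));
    // Same pre|group 1|post layout as the std.regexp snippet, plus the offset.
    writefln("%s|%s|%s %d", m.pre, m[1], m.post, firstWhitespace(s));
    // prints: thingie| |att1="whatever" 7
}
```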

D2 is not going to succeed by forcing its users to use unfamiliar, and maybe 
not yet very fashionable constructions.

I'm pissed off because this change broke a lot of my code, which I had not used 
for some time, but now have a paying customer for. The code did not break 
because of D language evolution. It broke because somebody decided they did not 
like the style of std.regexp.  All I wanted was plain old regular expressions, 
similar to JavaScript, or PHP, or other popular languages, and std.regexp did 
that pretty well at one time.

Steve



Re: std.regexp vs std.regex [Re: RegExp.find() now crippled]

2010-11-15 Thread Jesse Phillips
Steve Teale Wrote:

 I guess std.regexp is still there because not all of us necessarily want to 
 iterate a range to simply find out the position of the first whitespace in a 
 string.

I'm pretty sure it is still there for the same reason many other modules are:
people are still trying to figure out when it should be removed.

 Part of the expressiveness of languages is that one should be free to use the 
 style that suits, and not have to read the documentation every time one uses 
 it. Give me options in Phobos by all means.

That has nothing to do with expressiveness; familiarity and ease of use, sure.

 D2 is not going to succeed by forcing its users to use unfamiliar, and maybe 
 not yet very fashionable constructions.

Not providing something does not mean forcing you to use something else.

 I'm pissed off because this change broke a lot of my code, which I had not 
 used for some time, but now have a paying customer for. The code did not 
 break because of D language evolution. It broke because somebody decided they 
 did not like the style of std.regexp.  All I wanted was plain old regular 
 expressions, similar to JavaScript, or PHP, or other popular languages, and 
 std.regexp did that pretty well at one time.

I agree: there is no reason a module that is scheduled for deletion should get
changes that break existing code. But looking at the history, there don't seem
to have been any such changes for at least the last year. The only questionable
change (one that wasn't just switching types to auto or adjusting spacing)
happened 3 months ago, but I don't think the behavior was intended to change:

http://www.dsource.org/projects/phobos/changeset/1923/trunk/phobos/std/regexp.d


Re: std.regexp vs std.regex [Re: RegExp.find() now crippled]

2010-11-15 Thread Andrei Alexandrescu

On 11/15/10 7:55 AM, Steve Teale wrote:
 KennyTM~ Wrote:
  On Nov 15, 10 14:58, Steve Teale wrote:
   Some time ago in phobos2, the following:

    RegExp wsr = RegExp("(\\s+)");
    int p = wsr.find("thingie att1=\"whatever\"");
    writefln("%s|%s|%s %d", wsr.pre(), wsr.match(1), wsr.post(), p);

   would print:

   thingie| |att1="whatever" 7

   Now it prints

   thingie| |att1="whatever" 1

   The new return value is pretty useless, equivalent to returning a bool. It
   seems to me that the 'find' verb's subject should be the string, not the
   RegExp object.

   This looks like a case of the implementation being changed to match the
   documentation, when in fact it would have been better to change the
   documentation to match the implementation.

   Either that, or RegExp should have an indexOf method that behaves like
   string.indexOf.

   Steve

  Isn't std.regexp replaced by std.regex? Why are both of them still in
  Phobos 2?

  (oh, and std.regex is missing a documented .index (= .src_start) property.)

 I guess std.regexp is still there because not all of us necessarily
 want to iterate a range to simply find out the position of the first
 whitespace in a string. Part of the expressiveness of languages is
 that one should be free to use the style that suits, and not have to
 read the documentation every time one uses it. Give me options in
 Phobos by all means.

 D2 is not going to succeed by forcing its users to use unfamiliar,
 and maybe not yet very fashionable constructions.

 I'm pissed off because this change broke a lot of my code, which I
 had not used for some time, but now have a paying customer for. The
 code did not break because of D language evolution. It broke because
 somebody decided they did not like the style of std.regexp.  All I
 wanted was plain old regular expressions, similar to JavaScript, or
 PHP, or other popular languages, and std.regexp did that pretty well
 at one time.

 Steve


I am sorry for the inadvertent change; it wasn't meant to change the
semantics of existing code. I'm not sure whether one of my unrelated
64-bit changes messed things up. You may want to file a bug report.

There are a number of good reasons for which I was compelled to split 
std.regex from std.regexp. I'm sure you or others would have found them 
just as compelling if you saw things the same way.

Phobos 1 has experimented in std.string and std.regexp with juxtaposing 
APIs of various languages (PHP, Ruby, Python). The reasoning was that 
people familiar with any of those languages could feel right at home 
by using APIs with similar nomenclatures and semantics. The result was 
some strange bedfellows in std.string such as column or capwords and 
an outright mess in std.regexp. The interface of std.regexp is without a 
doubt the worst I've ever seen, by a long shot. I have never been able 
to use it without poring through the documentation _several times_ and 
without confirming to myself via a small test case that I'm doing the 
right thing.

The simplest problem is this: std.regexp uses the words exec, find, 
match, search, and test - all to mean regular expression matching. 
There is absolutely no logic to how meanings are ascribed to words, and 
there is absolutely no recourse other than rote memorization of various 
arbitrary decisions.

The resulting FrankenAPI is likely unfamiliar to anyone except those 
who've actually spent time learning it, in spite of its trying to be 
familiar to anyone.

So I spawned std.regex in an attempt to sanitize the API (I made minor, 
if any, changes to the engine; I am in fact having significant trouble 
maintaining it). The advantages of std.regex are:

* No more class definition. Nobody is supposed to inherit from RegExp
anyway, so it's useless to brand the object as a class.

* Engine is separated from matches, which means that engines can be 
memoized for efficiency. Currently regex() only memoizes the last engine.

* The new engine works with any character size.

* Simpler API: create a regex, call match() against that regex and a 
string, look at the resulting RegexMatch object.

If this all annoys you more than the old API, I will need to disagree. 
If you have suggestions on how std.regex can be improved, I'm all ears.

Andrei
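
The create/match/inspect style from Andrei's last bullet can be sketched as follows. This is a minimal sketch written against the std.regex interface of later Phobos releases (`regex`, `matchFirst`, and the `Captures` result with `pre`, `post` and indexed groups), not the 2010 `match()`/`RegexMatch` interface discussed in this thread, so treat it as an illustration under that assumption. It shows that the offset Steve wanted from `find()` falls out as the length of the pre-match slice, with no range iteration. `firstMatchIndex` is a hypothetical helper, not a Phobos function.

```d
import std.regex;
import std.stdio;

// Hypothetical helper (not part of Phobos): offset of the first match of
// `re` in `s`, or -1 when nothing matches.
ptrdiff_t firstMatchIndex(string s, Regex!char re)
{
    auto m = matchFirst(s, re);   // first match only, no range to iterate
    if (m.empty)
        return -1;                // no match
    // m.pre is the slice of `s` before the match, so its length is the offset.
    return cast(ptrdiff_t) m.pre.length;
}

void main()
{
    auto ws = regex(`(\s+)`);
    string s = `thingie att1="whatever"`;

    auto m = matchFirst(s, ws);
    // prints: thingie| |att1="whatever" 7
    writefln("%s|%s|%s %d", m.pre, m[1], m.post, firstMatchIndex(s, ws));
}
```

Returning -1 for "no match" mirrors what std.regexp's `find()` does in the code patched later in this thread; a signed `ptrdiff_t` keeps that sentinel representable alongside real offsets.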


Re: std.regexp vs std.regex [Re: RegExp.find() now crippled]

2010-11-15 Thread Lutger Blijdestijn
Steve Teale wrote:

 KennyTM~ Wrote:
 
 On Nov 15, 10 14:58, Steve Teale wrote:
  Some time ago in phobos2, the following:
 
  RegExp wsr = RegExp("(\\s+)");
  int p = wsr.find("thingie att1=\"whatever\"");
  writefln("%s|%s|%s %d", wsr.pre(), wsr.match(1), wsr.post(), p);
 
  would print:
 
  thingie| |att1="whatever" 7
 
  Now it prints
 
  thingie| |att1="whatever" 1
 
  The new return value is pretty useless, equivalent to returning a bool.
  It seems to me that the 'find' verb's subject should be the string, not
  the RegExp object.
 
  This looks like a case of the implementation being changed to match the
  documentation, when in fact it would have been better to change the
  documentation to match the implementation.
 
  Either that, or RegExp should have an indexOf method that behaves like
  string.indexOf.
 
  Steve
 
 
 Isn't std.regexp replaced by std.regex? Why are both of them still in
 Phobos 2?
 
 (oh, and std.regex is missing a documented .index (= .src_start)
 property.)
 
 I guess std.regexp is still there because not all of us necessarily want
 to iterate a range to simply find out the position of the first whitespace
 in a string. Part of the expressiveness of languages is that one should be
 free to use the style that suits, and not have to read the documentation
 every time one uses it. Give me options in Phobos by all means.
 
 D2 is not going to succeed by forcing its users to use unfamiliar, and
 maybe not yet very fashionable constructions.
 
 I'm pissed off because this change broke a lot of my code, which I had not
 used for some time, but now have a paying customer for. The code did not
 break because of D language evolution. It broke because somebody decided
 they did not like the style of std.regexp.  All I wanted was plain old
 regular expressions, similar to JavaScript, or PHP, or other popular
 languages, and std.regexp did that pretty well at one time.
 
 Steve

I'm pretty sure that can be filed as a bug. The behavior is still documented 
as returning the index of the match, and the standalone std.regexp.find works 
that way. Patch:

@@ -1045,7 +1045,7 @@
 {
 int i = test(string);
 if (i)
-i = pmatch[0].rm_so != 0;
+i = pmatch[0].rm_so;
 else
 i = -1; // no match
 return i;