Re: [GSOC] regular expressions beta is here

2011-08-17 Thread bearophile
> > thanks for uncovering it again, you may as well file it.
> 
> OK, I'll add it to Bugzilla.

http://d.puremagic.com/issues/show_bug.cgi?id=6518


Re: [GSOC] regular expressions beta is here

2011-08-17 Thread bearophile
Dmitry Olshansky:

> Yes, that's a bug. But it's not a regression,

I think it's a DMD regression, probably introduced with the recent changes in 
switch semantics. DMD 2.042 doesn't have this bug.


> I assume you started to compile with -w,

I suggest that the Phobos devs use -w too.


> thanks for uncovering it again, you may as well file it.

OK, I'll add it to Bugzilla.

Bye,
bearophile


Re: [GSOC] regular expressions beta is here

2011-08-17 Thread Dmitry Olshansky

On 17.08.2011 3:47, bearophile wrote:
> Dmitry Olshansky:
>
> > To get a small no-crap-included beta package see download section of
> > https://github.com/blackwhale/FReD for .7zs.
>
> I have not patched DMD, but it gives me some problem here:
>
> void parseFlags(S)(S flags)
> {
>     foreach(ch; flags)//flags are ASCII anyway
>     {
>         switch(ch)
>         {
>
>             foreach(i, op; __traits(allMembers, RegexOption))
>             {
>                 case RegexOptionNames[i]:
>                     if(re_flags & mixin("RegexOption."~op))
>                         throw new RegexException(text("redundant flag specified: ",ch));
>                     re_flags |= mixin("RegexOption."~op);
>                     break;
>             }
>             default:
>                 if(__ctfe)
>                     assert(text("unknown regex flag '",ch,"'"));
>                 else
>                     new RegexException(text("unknown regex flag '",ch,"'"));
>         }
>
>
> To better see the situation I have written a small test case:
>
> import std.typetuple: TypeTuple;
>
> enum RegexOption : uint { A, B, C } // no need to put a semicolon here
>
> alias TypeTuple!(RegexOption.A, RegexOption.B, RegexOption.C) RegexOptionNames;
>
> void main() {
>     RegexOption ch;
>
>     switch (ch) {
>         foreach (i, op; __traits(allMembers, RegexOption))
>             case RegexOptionNames[i]: break;
>
>         default: assert(0);
>     }
> }
>
>
> test.d(12): Error: switch case fallthrough - use 'goto case;' if intended
> test.d(12): Error: switch case fallthrough - use 'goto case;' if intended
> test.d(12): Error: switch case fallthrough - use 'goto case;' if intended
> test.d(14): Error: switch case fallthrough - use 'goto default;' if intended
>
> This used to work, I think. The new DMD switch analysis seems to have a bug.
>
> -

Yes, that's a bug. But it's not a regression, I assume you started to
compile with -w, that's when it happens IIRC. I almost forgot about it,
thanks for uncovering it again, you may as well file it.

> If you want a benchmark, to compare it with other implementations, there is
> this one:
> http://shootout.alioth.debian.org/debian/program.php?test=regexdna&lang=gdc&id=4

All in due time, though this one involves semi-fixed patterns, hm ...
very promising.


--
Dmitry Olshansky



Re: [GSOC] regular expressions beta is here

2011-08-16 Thread bearophile
Dmitry Olshansky:

> > To get a small no-crap-included beta package see download section of 
> > https://github.com/blackwhale/FReD for .7zs.

I have not patched DMD, but it gives me some problem here:

void parseFlags(S)(S flags)
{
    foreach(ch; flags)//flags are ASCII anyway
    {
        switch(ch)
        {

            foreach(i, op; __traits(allMembers, RegexOption))
            {
                case RegexOptionNames[i]:
                    if(re_flags & mixin("RegexOption."~op))
                        throw new RegexException(text("redundant flag specified: ",ch));
                    re_flags |= mixin("RegexOption."~op);
                    break;
            }
            default:
                if(__ctfe)
                    assert(text("unknown regex flag '",ch,"'"));
                else
                    new RegexException(text("unknown regex flag '",ch,"'"));
        }


To better see the situation I have written a small test case:

import std.typetuple: TypeTuple;

enum RegexOption : uint { A, B, C } // no need to put a semicolon here

alias TypeTuple!(RegexOption.A, RegexOption.B, RegexOption.C) RegexOptionNames;

void main() {
    RegexOption ch;

    switch (ch) {
        foreach (i, op; __traits(allMembers, RegexOption))
            case RegexOptionNames[i]: break;

        default: assert(0);
    }
}


test.d(12): Error: switch case fallthrough - use 'goto case;' if intended
test.d(12): Error: switch case fallthrough - use 'goto case;' if intended
test.d(12): Error: switch case fallthrough - use 'goto case;' if intended
test.d(14): Error: switch case fallthrough - use 'goto default;' if intended

This used to work, I think. The new DMD switch analysis seems to have a bug.
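
For readers on a later compiler: `static foreach` was added to D well after this thread, so it is not available in the DMD version discussed here. It is one way to generate `case` labels from enum members. The following is only a minimal sketch under that assumption. It reuses the RegexOption enum from the test case; `indexOfOption` is a hypothetical helper, and each generated case returns instead of relying on `break`.

```d
import std.traits : EnumMembers;

enum RegexOption : uint { A, B, C }

// Hypothetical helper: returns the position of `ch` among the enum's members.
size_t indexOfOption(RegexOption ch)
{
    switch (ch)
    {
        // Expanded at compile time into one `case` per enum member.
        static foreach (i, member; EnumMembers!RegexOption)
        {
            case member:
                return i; // leaves the function, so nothing falls through
        }
        default:
            assert(0, "unknown RegexOption");
    }
}

void main()
{
    assert(indexOfOption(RegexOption.A) == 0);
    assert(indexOfOption(RegexOption.C) == 2);
}
```

Returning from each case sidesteps the question of what a `break` inside the loop body refers to.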

-

If you want a benchmark, to compare it with other implementations, there is 
this one:
http://shootout.alioth.debian.org/debian/program.php?test=regexdna&lang=gdc&id=4

Bye,
bearophile


Re: [GSOC] regular expressions beta is here

2011-08-16 Thread Dmitry Olshansky

On 10.08.2011 14:42, Dmitry Olshansky wrote:
In case I failed to mention it before, I'm working on the project 
codenamed FReD, which is aimed at a ~100%* source-level compatible 
overhaul of std.regex that uses better implementation techniques and 
provides modern Unicode support and common syntax riches.


I think it's time for a public beta release, since it _should_ be 
ready for mainstream usage. There are some rough edges and a couple of 
issues that I'm aware of, but they should not come up in realistic use cases.


In order to avoid unexpected regressions I'd be glad if current 
std.regex users do try it for their projects/tests.
To get a small no-crap-included beta package see download section of 
https://github.com/blackwhale/FReD for .7zs.
I'll upload newer packages as bugs get exposed and fixed. 
Alternatively, if you are comfortable with git you may just git clone 
the entire repo. Some helpful notes (same as README) can be found here: 
https://github.com/blackwhale/FReD/wiki/Beta-release


Caveats:
In order for it to compile, a tiny change to the 2.054 source is needed 
(no need to recompile Phobos! it's only in templates):
patch std.algorithm.cmp according to this diff 
https://github.com/D-Programming-Language/phobos/pull/176/files#L0L4631  

and to get CTFE features working add if(!__ctfe) listed in the next 
diff on the same webpage.
(this is already upstream, so if you're using a fork of phobos just 
pull this in)


* some API problems might lead to a breaking change, though it didn't 
happen in this release


Meanwhile the new beta is up:
https://github.com/downloads/blackwhale/FReD/FReD_beta1.7z
or checkout "stable" branch https://github.com/blackwhale/FReD/tree/stable
(as dawgfoto noticed, the master branch tends to break on 64-bit, as I 
develop primarily on 32-bit)


With prominent changes being:
- fixed a horrible memory corruption with regex having certain 
groups/backrefs  in lookaround


- no GC heap activity during matching in all engines, except as 
workaround for bug http://d.puremagic.com/issues/show_bug.cgi?id=6199


- new prefix searcher, featuring up to 40x search speed up on patterns 
with semi-fixed prefixes e.g. \b(https?|ftp|file)://\S+  and  
([0-9][0-9]?)/([0-9][0-9]?)/([0-9][0-9]([0-9][0-9])?)


- bool opCast for RegexMatch for nice "test if not empty syntax" as 
suggested by Steven


- lots of small fixes and optimizations

--
Dmitry Olshansky



Re: [GSOC] regular expressions beta is here

2011-08-12 Thread bearophile
Don:

> bearophile wrote:
>> 2) I have found many situations where I am able to solve a problem with both
>> a simple and slow brute force solver, and a complex and fast algorithm. The
>> little program may be too slow for normal usage, but it's just a few lines
>> long (especially if I use a lot of std.algorithm stuff) and it's much less
>> likely to contain bugs.
>
> Sorry, but personally I don't believe that this is useful outside of toy
> examples.

This code of mine is a real-world example. This is a struct method with 
comments removed, the postcondition contains both fast loose tests and a tight 
slow O(n^2) version that thanks to std.algorithm is just 2 lines long 
(unfortunately because of DMD bug 6417 it's a bit longer than 2 lines). It's 
asymptotically slower than the fast algorithm, so I've put it into a debug{}.


void foo(in int[] p, int[] q) nothrow
in {
  assert(p.length == vectorLen);
  assert(q.length == vectorLen);
  assert(equal(p.dup.sort(), iota(1, vectorLen+1)));
} out {
  foreach (i, qi; q)
    assert(qi >= 0 && qi < (vectorLen - i));
  debug foreach (j; 1 .. (q.length + 1))
    assert(q[j-1] == count!((int k){ return p[k] > j;
        })(iota(countUntil(cast()p, j) + 1)));
} body {
  op[0] = &items[0];
  foreach (i, pi; p) {
    items[i] = Item(pi, 0);
    op[i + 1] = &items[i + 1];
  }

  foreach_reverse (k; 0 .. (lim + 1)) {
    xs[0 .. ((vectorLen >> (k + 1)) + 1)] = 0;
    foreach (j; 0 .. vectorLen) {
      int r = (op[j].space >> k) % 2;
      int s = op[j].space >> (k + 1);
      if (r)
        xs[s]++;
      else
        op[j].digit += xs[s];
    }
  }

  foreach (i; 0 .. vectorLen)
    q[op[i].space - 1] = op[i].digit;
}


This postcondition has caught a simple mistake I've put in the fast algorithm. 
Probably there are ways to catch the same bug with unittests too.

The ugly empty cast() inside the postcondition is another workaround, because 
countUntil doesn't work with a const p.

If you write those two postcondition lines in Python3 it becomes less noisy:

assert q == [sum(p[k] > j for k in range(p.index(j) + 1)) for j in range(1, 
len(q)+1)]

Instead of:

foreach (j; 1 .. q.length+1)
assert(q[j-1] == count!((int k){ return p[k] > j; 
})(iota(countUntil(cast()p, j) + 1)));


Here using assert(equal(q, map!...)) turns into too much of a puzzle. The code 
is already too deeply nested.

If you program in a functional style it's hard to keep lines under 70 chars. In 
Haskell, too, lines of code are often long.

Bye,
bearophile


Re: [GSOC] regular expressions beta is here

2011-08-12 Thread Timon Gehr

On 08/12/2011 01:31 PM, Don wrote:

bearophile wrote:

Don:


2) I have found many situations where I am able to solve a problem
with both a simple and slow brute force solver, and a complex and
fast algorithm. The little program may be too slow for normal usage,
but it's just a few lines long (especially if I use a lot of
std.algorithm stuff) and it's much less likely to contain bugs.



Sorry, but personally I don't believe that this is useful outside of
toy examples.
The question is, what bugs does it find that aren't found by a
trivial unit test?



There are two cases:
(1) it's a very tight test. In which case, it's essentially a unit test.
or (2) it's a very loose test. In which case, it doesn't find bugs.


Putting a simpler algorithm in the post-condition implements a third
possibility you are missing.

Usually unit tests verify some specific cases (you are also able to
add generic testing code in the unit test, but this is just like
moving the postcondition elsewhere).

If you put an alternative algorithm in the postcondition (under
debug{} if you want), you have some advantages:
- It's tight, because the second algorithm is supposed to always give
the same results as the function.



- It works with the real examples the program is run on, not just the
cases you have put in the unit test.


Conditions required for this to be true:
(1) the function must not be time critical;


If the difference is not an asymptotic one, it can well be time critical 
(then the debug version will just not be as responsive as would be 
desirable for a finished product, which is often the case anyways.)



(2) an alternative algorithm must exist;


If an optimized version exists, a slower one exists too.


(3) the alternative algorithm must be bug-free;


That is often trivial. Also, if it is buggy, the discrepancy will be 
caught by the contract and the bug can be fixed.



(4) the function must not have been tested properly;


Usually, large software that has been 'tested properly' still contains 
bugs. For mission critical tasks, a form of testing related to this one 
is used heavily (multiple teams implement the same specification and the 
result of each query to the software is determined by majority vote).



(5) the faulty test cases must occur during debugging (they won't be
caught during production);


Sure. This can catch e.g. regressions during development. If there is a 
large team of programmers involved, contracts are more useful than if 
there is only a single developer.



(6) the programmer must remember to put the asserts in the 'out'
contract, but not put them into the body of the function.


Well, if they are a seasoned contract programmer, this is not a 
problem at all. ;)




This doesn't leave much.



I disagree.



Sometimes you forget to add certain cases in the unittests. Putting the
test in the postcondition makes sure it always runs, for all the inputs
your function is run on (unless you disable it), so you will catch the
cases you didn't think of in the unittests.





The crucial feature is, they do NOTHING except find bugs in the function
they are attached to.


They specify what the function is supposed to do, in a way that always 
is up to date because it gets checked.




In Eiffel you have the prestate too (the old), so the postcondition is
the only place where such information is usable. I hope prestate will
be added to D DbC, because it's a major sub-feature of DbC. But I
don't agree that postconditions are useless in D.


??? Does that relate to my sentence in any way?


Yes. He says that once prestate is available, out contracts will be more 
useful. But he thinks they are already quite valuable without them.





For starters, it really needs to be a function with multiple return
values. Otherwise, you can just stick asserts just before your return
statement, and you don't need __old or any such thing.


If a function has multiple return values the out(result) helps make
sure all the return paths are verified.


That's what I said.


If the function has only one return value it helps anyway, because it
helps you not forget to verify the result.


 Why would you remember to put an assert in the postcondition, when
you didn't put it into the function?


Don wrote
> bearophile wrote:
>> If a function has multiple return values the out(result) helps make
>> sure all the return paths are verified.
>
> That's what I said.

Exactly that reason:

int foo(){
    // some code
    if(condition) return 37; // added after 2h of debugging
    // more code
    result=...;
    assert(condition(result));
    return result;
}

int foo()
out(result){assert(condition(result));}
body{
    //some code
    if(condition) return 37;
    // more code
    return ...;
}

it is both more convenient (you don't have to change your program logic) 
and less error-prone.


Furthermore, all other programmers on the project can immediately check 
the postcondition and rely on that it 

Re: [GSOC] regular expressions beta is here

2011-08-12 Thread Don

bearophile wrote:

Don:


2) I have found many situations where I am able to solve a problem with both a 
simple and slow brute force solver, and a complex and fast algorithm. The 
little program may be too slow for normal usage, but it's just a few lines 
long (especially if I use a lot of std.algorithm stuff) and it's much less 
likely to contain bugs.



Sorry, but personally I don't believe that this is useful outside of toy 
examples.
The question is, what bugs does it find that aren't found by a trivial unit 
test?



There are two cases:
(1) it's a very tight test. In which case, it's essentially a unit test.
or (2) it's a very loose test. In which case, it doesn't find bugs.


Putting a simpler algorithm in the post-condition implements a third 
possibility you are missing.

Usually unit tests verify some specific cases (you are also able to add generic 
testing code in the unit test, but this is just like moving the postcondition 
elsewhere).

If you put an alternative algorithm in the postcondition (under debug{} if you 
want), you have some advantages:
- It's tight, because the second algorithm is supposed to always give the same 
results as the function.



- It works with the real examples the program is run on, not just the cases 
you have put in the unit test.


Conditions required for this to be true:
(1) the function must not be time critical;
(2) an alternative algorithm must exist;
(3) the alternative algorithm must be bug-free;
(4) the function must not have been tested properly;
(5) the faulty test cases must occur during debugging (they won't be 
caught during production);
(6) the programmer must remember to put the asserts in the 'out' 
contract, but not put them into the body of the function.


This doesn't leave much.


Sometimes you forget to add certain cases in the unittests. Putting the 
test in the postcondition makes sure it always runs, for all the inputs 
your function is run on (unless you disable it), so you will catch the 
cases you didn't think of in the unittests.






The crucial feature is, they do NOTHING except find bugs in the function
they are attached to.


In Eiffel you have the prestate too (the old), so the postcondition is the only 
place where such information is usable. I hope prestate will be added to D DbC, 
because it's a major sub-feature of DbC. But I don't agree that postconditions 
are useless in D.


??? Does that relate to my sentence in any way?


For starters, it really needs to be a function with multiple return
values. Otherwise, you can just stick asserts just before your return
statement, and you don't need __old or any such thing.


If a function has multiple return values the out(result) helps make sure all 
the return paths are verified.


That's what I said.


If the function has only one return value it helps anyway, because it helps you 
not forget to verify the result.


 Why would you remember to put an assert in the postcondition, when 
you didn't put it into the function?






Under what circumstances are they more valuable than any other assert 
inside a function?


I have already given some answers.


No you haven't.


Another answer is this:


int foo(int x)
in {
// ...
}
out(result) {
auto y = computeSomething(result);
assert(y ...);
assert(y ...);
}
body {
// ...
}

The out{} helps you organize your code, separating the tests of the body from 
the postcondition tests. Also in the postcondition you are allowed to define 
new variables and call things. All this out(){} code vanishes in release mode. 
How do you do that with just asserts inside the body?


If you do this, the asserts will vanish in release mode, but y will still be 
computed, wasting computation (a smart compiler may see that y is unused and 
remove it, but it's not certain this optimization happens if the computation 
of y is complex and done in-place):

int foo(int x)
in {
// ...
}
body {
result = ...;
auto y = computeSomething(result);
assert(y ...);
assert(y ...);
return result;
}


I presume there are ways to disable the computation of y in release mode, but I 
don't want to think about them. I just stick the y computation in the 
postcondition and the compiler will take care of it.


Trivial!
Make the postcondition a nested function. (You can even make it a 
delegate literal, if it's only used in one place).
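
A minimal sketch of that suggestion, assuming a hypothetical `computeSomething` standing in for the expensive check from the example above. The whole postcondition lives in a nested function, and the only call to it sits inside an assert, so a -release build drops the call together with the assert.

```d
// Hypothetical stand-in for the expensive computation in the discussion.
int computeSomething(int r) { return r % 7; }

int foo(int x)
{
    // Nested function bundling the whole postcondition.
    bool postcondition(int result)
    {
        auto y = computeSomething(result);
        return y >= 0 && y < 7;
    }

    int result = x * x;
    // With -release this assert, including the call inside it, is removed.
    assert(postcondition(result));
    return result;
}

void main()
{
    assert(foo(5) == 25);
}
```

Unlike an `out` contract, this check has to be repeated before every return statement.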


I'll explain my original statement further: If you have a theorem 
prover, then the theorem prover can use the 'out' contract in any 
function which calls that function.


Eg,
int square(int x) out(result) { assert(result>=0); } body { return x*x; }

void foo()
{
    int q = square(-5);
    if (q < 0) {  }

}
The theorem prover knows that q>=0, even if it doesn't have access to the 
body of 'square'. So it detects unreachable code in foo().


So in this case, the 'out' contract can be used to find bugs in code 
that the author of the contract didn't write.
Otherwise, out contracts o

Re: [GSOC] regular expressions beta is here

2011-08-11 Thread Marco Leise
On 11.08.2011 at 19:56, Adam D. Ruppe wrote:



If it's worth anything, I use the out contracts in dom.d more as
checked documentation than for serious bug-finding.

For example:

Element appendChild(Element newChild)
out (ret) { assert(ret is newChild); }
body { ... }

I also use it from time to time to assert that a return value is not
null. The check itself isn't particularly useful, but I think it's
a nice bit of documentation.

Actually, IMO, in and out contracts should be in the generated
ddoc too.


I've been wondering for a while if selected unit tests could be included 
in DDOC somehow. Most of the 'examples' in the Phobos documentation look 
like they were taken right out of a unittest block below the function. Like 
BinaryHeap in std.container:


--

DDOC:

// Example from "Introduction to Algorithms" Cormen et al, p 146
int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
auto h = heapify(a);
// largest element
assert(h.front == 16);
// a has the heap property
assert(equal(a, [ 16, 14, 10, 9, 8, 7, 4, 3, 2, 1 ]));

--

std/container.d:

unittest
{
    {
        // example from "Introduction to Algorithms" Cormen et al., p 146
        int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
        auto h = heapify(a);
        assert(h.front == 16);
        assert(a == [ 16, 14, 10, 8, 7, 9, 3, 2, 4, 1 ]);
        auto witness = [ 16, 14, 10, 9, 8, 7, 4, 3, 2, 1 ];
        for (; !h.empty; h.removeFront(), witness.popFront())
        {
            assert(!witness.empty);
            assert(witness.front == h.front);
        }
        assert(witness.empty);
    }
    {
        int[] a = [ 4, 1, 3, 2, 16, 9, 10, 14, 8, 7 ];
        int[] b = new int[a.length];
        BinaryHeap!(int[]) h = BinaryHeap!(int[])(b, 0);
        foreach (e; a)
        {
            h.insert(e);
        }
        assert(b == [ 16, 14, 10, 8, 7, 3, 9, 1, 4, 2 ], text(b));
    }
}

--

bearophile, you are the expert with the DRY buzz word ;)


Re: [GSOC] regular expressions beta is here

2011-08-11 Thread bearophile
Don:

>> 2) I have found many situations where I am able to solve a problem with both 
>> a simple and slow brute force solver, and a complex and fast algorithm. The 
>> little program may be too slow for normal usage, but it's just a few lines 
>> long (especially if I use a lot of std.algorithm stuff) and it's much less 
>> likely to contain bugs.

> Sorry, but personally I don't believe that this is useful outside of toy 
> examples.
> The question is, what bugs does it find that aren't found by a trivial unit 
> test?

> There are two cases:
> (1) it's a very tight test. In which case, it's essentially a unit test.
> or (2) it's a very loose test. In which case, it doesn't find bugs.

Putting a simpler algorithm in the post-condition implements a third 
possibility you are missing.

Usually unit tests verify some specific cases (you are also able to add generic 
testing code in the unit test, but this is just like moving the postcondition 
elsewhere).

If you put an alternative algorithm in the postcondition (under debug{} if you 
want), you have some advantages:
- It's tight, because the second algorithm is supposed to always give the same 
results as the function.
- It works with the real examples the program is run on, not just the cases 
you have put in the unit test. Sometimes you forget to add certain cases in the 
unittests. Putting the test in the postcondition makes sure it always runs, for 
all the inputs your function is run on (unless you disable it), so you will 
catch the cases you didn't think of in the unittests.


> The crucial feature is, they do NOTHING except find bugs in the function
> they are attached to.

In Eiffel you have the prestate too (the old), so the postcondition is the only 
place where such information is usable. I hope prestate will be added to D DbC, 
because it's a major sub-feature of DbC. But I don't agree that postconditions 
are useless in D.


> For starters, it really needs to be a function with multiple return
> values. Otherwise, you can just stick asserts just before your return
> statement, and you don't need __old or any such thing.

If a function has multiple return values the out(result) helps make sure all 
the return paths are verified.

If the function has only one return value it helps anyway, because it helps you 
not forget to verify the result.


> Under what circumstances are they more valuable than any other assert 
> inside a function?

I have already given some answers. Another answer is this:


int foo(int x)
in {
// ...
}
out(result) {
auto y = computeSomething(result);
assert(y ...);
assert(y ...);
}
body {
// ...
}

The out{} helps you organize your code, separating the tests of the body from 
the postcondition tests. Also in the postcondition you are allowed to define 
new variables and call things. All this out(){} code vanishes in release mode. 
How do you do that with just asserts inside the body?


If you do this, the asserts will vanish in release mode, but y will still be 
computed, wasting computation (a smart compiler may see that y is unused and 
remove it, but it's not certain this optimization happens if the computation 
of y is complex and done in-place):

int foo(int x)
in {
// ...
}
body {
result = ...;
auto y = computeSomething(result);
assert(y ...);
assert(y ...);
return result;
}


I presume there are ways to disable the computation of y in release mode, but I 
don't want to think about them. I just stick the y computation in the 
postcondition and the compiler will take care of it.

Bye,
bearophile


Re: [GSOC] regular expressions beta is here

2011-08-11 Thread Adam D. Ruppe
If it's worth anything, I use the out contracts in dom.d more as
checked documentation than for serious bug-finding.

For example:

Element appendChild(Element newChild)
out (ret) { assert(ret is newChild); }
body { ... }

I also use it from time to time to assert that a return value is not
null. The check itself isn't particularly useful, but I think it's
a nice bit of documentation.

Actually, IMO, in and out contracts should be in the generated
ddoc too.
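
A minimal sketch of the "checked documentation" idea, covering the not-null case as well. The `Element` class below is a hypothetical stand-in and not dom.d's actual implementation. It uses the `do` keyword, which later compilers accept in place of the `body` keyword seen elsewhere in this thread.

```d
// Hypothetical stand-in for a DOM element; not the real dom.d class.
class Element
{
    Element[] children;

    // The contract tells callers they get back exactly what they passed in.
    Element appendChild(Element newChild)
    in { assert(newChild !is null); }
    out (ret) { assert(ret is newChild); }
    do
    {
        children ~= newChild;
        return newChild;
    }

    // The contract documents that a non-empty element never yields null.
    Element lastChild()
    in { assert(children.length > 0); }
    out (ret) { assert(ret !is null); }
    do
    {
        return children[$ - 1];
    }
}

void main()
{
    auto parent = new Element;
    auto child = new Element;
    assert(parent.appendChild(child) is child);
    assert(parent.lastChild() is child);
}
```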


Re: [GSOC] regular expressions beta is here

2011-08-11 Thread Don

bearophile wrote:

Don:


"out" contracts seem to be almost useless, unless you have a theorem prover. The 
reason is, that they test nothing apart from the function they are attached to, and it's 
much better to do that with unittesting.<


I see three different situations where postconditions are useful in D:

1) Sometimes the result of your function/method must satisfy some simple 
condition to be correct.

As an example, it must be a nonnegative number. Then you add 
assert(result >= 0, "..."); in the out. For a Phobos example, the 
std.algorithm.countUntil postcondition is allowed to test 
assert(result >= -1, "...");

Other possible conditions are that the output string can't be longer than a 
certain amount (like longer than the input string), and so on.

In certain cases the program that finds the solution is slow, but testing the 
correctness of its result is fast. I have hit many situations like this. 
As an example, you test that the result of a complex sorting algorithm is 
ordered and has the same length as the input (but maybe you don't test that 
the output items are the same as the input items).



2) I have found many situations where I am able to solve a problem with both a 
simple and slow brute force solver, and a complex and fast algorithm. The 
little program may be too slow for normal usage, but it's just a few lines 
long (especially if I use a lot of std.algorithm stuff) and it's much less 
likely to contain bugs.


Sorry, but personally I don't believe that this is useful outside of toy 
examples.
The question is, what bugs does it find that aren't found by a trivial 
unit test?



You can't verify the result of the fast algorithm with the slow algorithm on 
every run; that would defeat the purpose of having the fast one.
In such situations I write the postcondition like this:

in {
    // ...
}
out(result) {
    // some fast postcondition tests here

    debug {
        assert(result == slowAlgorithm(input));
    }
}
body {
    // fast algorithm here
}


This way, in release mode it tests nothing, in nonrelease build it tests the 
fast postconditions, and in debug mode it also verifies the fast algorithm 
gives the same results as the slow algorithm. Generally solving a problem in 
two quite different ways helps catch problems in the algorithms.


3) When D gets the prestate ("old" in some contract programming 
implementations), I will be able to use the prestate inside the postcondition 
to better verify that the function/method has changed the globals, or instance 
attributes, in a correct way. You can't put such tests in the class/struct 
invariant, or in the precondition.


There are two cases:
(1) it's a very tight test. In which case, it's essentially a unit test.
or (2) it's a very loose test. In which case, it doesn't find bugs.



I'm using postconditions often in my code (less often than preconditions, but 
often enough). A theorem prover is not strictly necessary for them to be useful.


I would like to see an example of a good postcondition.
The crucial feature is, they do NOTHING except find bugs in the function 
they are attached to. So it's very difficult to invent a plausible one.
For starters, it really needs to be a function with multiple return 
values. Otherwise, you can just stick asserts just before your return 
statement, and you don't need __old or any such thing.
Under what circumstances are they more valuable than any other 
assert inside a function?


Re: [GSOC] regular expressions beta is here

2011-08-11 Thread Dmitry Olshansky

On 11.08.2011 8:58, Don wrote:

bearophile wrote:

Contracts don't replace unittests, they complement each other.


They are nice but have little value over plain assert _unless_ we 
are talking about classes and _inheritance_, which isn't the case here.


It's easy to forget to test the output of a function, the "out" 
contracts help here. In structs the invariant helps you avoid 
forgetting to call manually a sanity test function every time you 
come in and out of a method.


You're conflating a couple of things here. Invariants are tremendously 
helpful for structs as well as classes.


I stand corrected about invariants, somehow I wasn't considering them a 
part of contracts.


"out" contracts seem to be almost useless, unless you have a theorem 
prover. The reason is, that they test nothing apart from the function 
they are attached to, and it's much better to do that with unittesting.

They have very little in common with 'in' contracts.

I think that EVERY struct and class in Phobos should have an invariant 
(except for something like Complex, where there are no invalid values).
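
To illustrate what such an invariant looks like, here is a minimal sketch with a hypothetical `Percentage` struct (not a Phobos type). In non-release builds the invariant is checked around calls to public member functions, so no method has to remember to call a sanity check by hand.

```d
// Hypothetical example type; the valid range 0 .. 100 is an assumption.
struct Percentage
{
    private int value;

    // Checked on entry to and exit from public member functions.
    invariant()
    {
        assert(value >= 0 && value <= 100);
    }

    // Adds `delta` and clamps the result back into the valid range.
    void add(int delta)
    {
        value += delta;
        if (value > 100) value = 100;
        if (value < 0) value = 0;
    }

    int get() const { return value; }
}

void main()
{
    Percentage p;
    p.add(150);
    assert(p.get() == 100);
    p.add(-30);
    assert(p.get() == 70);
}
```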

But I don't think 'out' contracts would add much value at all.



--
Dmitry Olshansky



Re: [GSOC] regular expressions beta is here

2011-08-11 Thread bearophile
Don:

>"out" contracts seem to be almost useless, unless you have a theorem prover. 
>The reason is, that they test nothing apart from the function they are 
>attached to, and it's much better to do that with unittesting.<

I see three different situations where postconditions are useful in D:

1) Sometimes the result of your function/method must satisfy some simple 
condition to be correct.

As an example, it must be a nonnegative number. Then you add 
assert(result >= 0, "..."); in the out. For a Phobos example, the 
std.algorithm.countUntil postcondition is allowed to test 
assert(result >= -1, "...");

Other possible conditions are that the output string can't be longer than a 
certain amount (like longer than the input string), and so on.

In certain cases the program that finds the solution is slow, but testing the 
correctness of its result is fast. I have hit many situations like this. 
As an example, you test that the result of a complex sorting algorithm is 
ordered and has the same length as the input (but maybe you don't test that 
the output items are the same as the input items).


2) I have found many situations where I am able to solve a problem with both a 
simple and slow brute force solver, and a complex and fast algorithm. The little 
program may be too slow for normal usage, but it's just a few lines long 
(especially if I use a lot of std.algorithm stuff), so it's much less likely to 
contain bugs.
You can't verify the result of the fast algorithm with the slow algorithm on 
every call; that would not be useful.
In such situations I write the postcondition like this:

in {
    // ...
}
out(result) {
    // some fast postcondition tests here

    debug {
        assert(result == slowAlgorithm(input));
    }
}
body {
    // fast algorithm here
}


This way, in release mode it tests nothing, in nonrelease build it tests the 
fast postconditions, and in debug mode it also verifies the fast algorithm 
gives the same results as the slow algorithm. Generally solving a problem in 
two quite different ways helps catch problems in the algorithms.


3) When D gets the prestate ("old" in some contract programming 
implementations), I will be able to use the prestate inside the postcondition 
to better verify that the function/method has changed the globals, or instance 
attributes, in a correct way. You can't put such tests in the class/struct 
invariant, or in the precondition.


I'm using postconditions often in my code (less often than preconditions, but 
often enough). A theorem prover is not strictly necessary for them to be useful.
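
A minimal sketch of points 1 and 2 above, assuming a recent compiler (where the function body after a contract is introduced with "do"; compilers from the time of this thread spell it "body"). The names fastSort and slowSort are made up for the illustration. The cheap checks run in any non-release build, and the comparison against the slow reference is compiled in only with -debug:

```d
import std.algorithm : isSorted, sort;

// Hypothetical slow reference: a plain insertion sort, short enough to trust.
int[] slowSort(const(int)[] input)
{
    int[] r = input.dup;
    foreach (i; 1 .. r.length)
        for (size_t j = i; j > 0 && r[j - 1] > r[j]; --j)
        {
            immutable tmp = r[j];
            r[j] = r[j - 1];
            r[j - 1] = tmp;
        }
    return r;
}

// Hypothetical fast version, checked by a postcondition.
int[] fastSort(const(int)[] input)
out (result)
{
    // Cheap checks, active in any non-release build.
    assert(result.length == input.length);
    assert(result.isSorted);

    // Expensive cross-check, compiled in only with -debug.
    debug assert(result == slowSort(input));
}
do
{
    int[] r = input.dup;
    sort(r);
    return r;
}

void main()
{
    assert(fastSort([3, 1, 2]) == [1, 2, 3]);
    assert(fastSort(null).length == 0);
}
```

Building with -release drops the whole out block, so the cross-check costs nothing in production builds.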

Bye,
bearophile


Re: [GSOC] regular expressions beta is here

2011-08-11 Thread Jonathan M Davis
On Thursday, August 11, 2011 11:34:30 simendsjo wrote:
> On 11.08.2011 11:04, Jonathan M Davis wrote:
> > On Thursday, August 11, 2011 10:50:41 simendsjo wrote:
> >> On 10.08.2011 23:16, Jonathan M Davis wrote:
>  On 10.08.2011 22:12, Jonathan M Davis wrote:
> > > There are a few things that were agreed upon (such as always putting
> > braces on their own line),
>  
>  There is? Parallelism and json uses braces on the same line.
> >>> 
> >>> It was agreed upon, and where it has been noticed, it has been
> >>> fixed.
> >>> But as I said, the style guide needs updating on a few points.
> >>> Braces
> >>> on their own line is one of them.
> >>> 
> >>> - Jonathan M Davis
> >> 
> >> Damn - I've been changing my D style to braces on the same line. It's
> >> great as I do most D coding on a small laptop. Guess I'll have to
> >> change
> >> it again :)
> > 
> > You're free to do your braces however you'd like in your own code, but
> > any code submitted to Phobos or druntime needs to have the braces on
> > their own line.
> > 
> > - Jonathan M Davis
> 
> I actually like that a language has a "default" style. Java, C# and
> Python all have a default style that makes code easy to read regardless
> of who wrote it (of course, python has some enforced stuff with
> indentation). You can, for instance, break the style as much as you'd
> like in C#, but I've yet to see a library that uses a very different style.
> 
> But then again.. Unless it's written in an obfuscated style, it doesn't
> really matter that much..

Well, you're free to follow Phobos' style too. It's entirely up to you. But 
bracing style is the sort of thing that's likely to vary quite a bit from 
programmer to programmer (especially among those with a C or C++ background).

- Jonathan M Davis


Re: [GSOC] regular expressions beta is here

2011-08-11 Thread simendsjo

On 11.08.2011 11:04, Jonathan M Davis wrote:

On Thursday, August 11, 2011 10:50:41 simendsjo wrote:

On 10.08.2011 23:16, Jonathan M Davis wrote:

On 10.08.2011 22:12, Jonathan M Davis wrote:

There are a few things that were agreed upon (such as always putting
braces on their own line),


There is? Parallelism and json uses braces on the same line.


It was agreed upon, and where it has been noticed, it has been fixed.
But as I said, the style guide needs updating on a few points. Braces
on their own line is one of them.

- Jonathan M Davis


Damn - I've been changing my D style to braces on the same line. It's
great as I do most D coding on a small laptop. Guess I'll have to change
it again :)


You're free to do your braces however you'd like in your own code, but any
code submitted to Phobos or druntime needs to have the braces on their own
line.

- Jonathan M Davis


I actually like that a language has a "default" style. Java, C# and 
Python all have a default style that makes code easy to read regardless 
of who wrote it (of course, python has some enforced stuff with 
indentation). You can, for instance, break the style as much as you'd 
like in C#, but I've yet to see a library that uses a very different style.


But then again.. Unless it's written in an obfuscated style, it doesn't 
really matter that much..


Re: [GSOC] regular expressions beta is here

2011-08-11 Thread Jonathan M Davis
On Thursday, August 11, 2011 10:50:41 simendsjo wrote:
> On 10.08.2011 23:16, Jonathan M Davis wrote:
> >> On 10.08.2011 22:12, Jonathan M Davis wrote:
> >>> There are a few things that were agreed upon (such as always putting
> >>> braces on their own line),
> >> 
> >> There is? Parallelism and json uses braces on the same line.
> > 
> > It was agreed upon, and where it has been noticed, it has been fixed.
> > But as I said, the style guide needs updating on a few points. Braces
> > on their own line is one of them.
> > 
> > - Jonathan M Davis
> 
> Damn - I've been changing my D style to braces on the same line. It's
> great as I do most D coding on a small laptop. Guess I'll have to change
> it again :)

You're free to do your braces however you'd like in your own code, but any 
code submitted to Phobos or druntime needs to have the braces on their own 
line.

- Jonathan M Davis


Re: [GSOC] regular expressions beta is here

2011-08-11 Thread simendsjo

On 10.08.2011 23:16, Jonathan M Davis wrote:

On 10.08.2011 22:12, Jonathan M Davis wrote:

There are a few things that were agreed upon (such as always putting
braces on their own line),


There is? Parallelism and json uses braces on the same line.


It was agreed upon, and where it has been noticed, it has been fixed. But as I
said, the style guide needs updating on a few points. Braces on their own line
is one of them.

- Jonathan M Davis


Damn - I've been changing my D style to braces on the same line. It's 
great as I do most D coding on a small laptop. Guess I'll have to change 
it again :)


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Don

Lutger Blijdestijn wrote:

Don wrote:


bearophile wrote:

Contracts don't replace unittests, they complement each other.



They are nice but have little value over plain assert
_unless_ we are talking about classes and _inheritance_, which isn't the
case here.

It's easy to forget to test the output of a function, the "out" contracts
help here. In structs the invariant helps you avoid forgetting to call
manually a sanity test function every time you come in and out of a
method.

You're conflating a couple of things here. Invariants are tremendously
helpful for structs as well as classes.
"out" contracts seem to be almost useless, unless you have a theorem
prover. The reason is, that they test nothing apart from the function
they are attached to, and it's much better to do that with unittesting.
They have very little in common with 'in' contracts.

I think that EVERY struct and class in Phobos should have an invariant
(except for something like Complex, where there are no invalid values).
But I don't think 'out' contracts would add much value at all.


What about out contracts on interfaces in a library (where you use the 
library by implementing them).


That involves inheritance. But I don't think there are any cases in 
Phobos where that is currently applicable.





Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Jacob Carlborg

On 2011-08-10 19:45, Andrei Alexandrescu wrote:

That's pretty cool actually because it naturally extends the built-in
approach. When you do e.g. if (pointer) that's really equivalent to if
(cast(bool) pointer) and so on.

Andrei


Cool, I always thought that opCast was for explicit casts, but maybe 
it's explicit in this case, in some way.


--
/Jacob Carlborg


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Lutger Blijdestijn
Don wrote:

> bearophile wrote:
>> Contracts don't replace unittests, they complement each other.
>> 
>> 
>>> They are nice but have little value over plain assert
>>> _unless_ we are talking about classes and _inheritance_, which isn't the
>>> case here.
>> 
>> It's easy to forget to test the output of a function, the "out" contracts
>> help here. In structs the invariant helps you avoid forgetting to call
>> manually a sanity test function every time you come in and out of a
>> method.
> 
> You're conflating a couple of things here. Invariants are tremendously
> helpful for structs as well as classes.
> "out" contracts seem to be almost useless, unless you have a theorem
> prover. The reason is, that they test nothing apart from the function
> they are attached to, and it's much better to do that with unittesting.
> They have very little in common with 'in' contracts.
> 
> I think that EVERY struct and class in Phobos should have an invariant
> (except for something like Complex, where there are no invalid values).
> But I don't think 'out' contracts would add much value at all.

What about out contracts on interfaces in a library (where you use the 
library by implementing them).
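
A small sketch of what that could look like, assuming a recent compiler. The names Counter and UpCounter are invented for the illustration and are not from Phobos. The interface method has no body, only a postcondition that every implementation is then expected to satisfy:

```d
// Hypothetical library interface; users of the library implement it.
interface Counter
{
    // No body here, only the postcondition implementations must meet.
    int next()
    out (result) { assert(result >= 0, "next() must not return a negative value"); }
}

// Hypothetical user-side implementation.
class UpCounter : Counter
{
    private int n;

    int next()
    {
        return n++;
    }
}

void main()
{
    Counter c = new UpCounter;
    assert(c.next() == 0);
    assert(c.next() == 1);
}
```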



Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Jonathan M Davis
On Thursday, August 11, 2011 06:58:51 Don wrote:
> I think that EVERY struct and class in Phobos should have an invariant
> (except for something like Complex, where there are no invalid values).
> But I don't think 'out' contracts would add much value at all.

That would be great, but several bugs need to be fixed before that's possible, 
including

http://d.puremagic.com/issues/show_bug.cgi?id=1251
http://d.puremagic.com/issues/show_bug.cgi?id=5039
http://d.puremagic.com/issues/show_bug.cgi?id=5058
http://d.puremagic.com/issues/show_bug.cgi?id=5500

- Jonathan M Davis


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Don

bearophile wrote:

Contracts don't replace unittests, they complement each other.


They are nice but have little value over plain assert 
_unless_ we are talking about classes and _inheritance_, which isn't the 
case here.


It's easy to forget to test the output of a function, the "out" contracts help 
here. In structs the invariant helps you avoid forgetting to call manually a sanity test 
function every time you come in and out of a method.


You're conflating a couple of things here. Invariants are tremendously 
helpful for structs as well as classes.
"out" contracts seem to be almost useless, unless you have a theorem 
prover. The reason is, that they test nothing apart from the function 
they are attached to, and it's much better to do that with unittesting.

They have very little in common with 'in' contracts.

I think that EVERY struct and class in Phobos should have an invariant 
(except for something like Complex, where there are no invalid values).

But I don't think 'out' contracts would add much value at all.
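
For readers unfamiliar with the feature, a minimal sketch of a struct invariant follows. Percentage is a made-up type, not something from Phobos. In non-release builds the invariant is checked at the end of the constructor and around calls to public member functions:

```d
// Hypothetical example type; not taken from Phobos.
struct Percentage
{
    private int value;

    // Checked around public member function calls in non-release builds.
    invariant()
    {
        assert(value >= 0 && value <= 100, "Percentage out of range");
    }

    this(int v)
    {
        value = v;
    }

    void add(int delta)
    {
        value += delta;
    }

    int get() const
    {
        return value;
    }
}

void main()
{
    auto p = Percentage(40);
    p.add(50);
    assert(p.get == 90);
    // Another p.add(50) would trip the invariant in a non-release build.
}
```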


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Jonathan M Davis
> On 10.08.2011 22:12, Jonathan M Davis wrote:
> > There are a few things that were agreed upon (such as always putting
> > braces on their own line),
> 
> There is? Parallelism and json uses braces on the same line.

It was agreed upon, and where it has been noticed, it has been fixed. But as I 
said, the style guide needs updating on a few points. Braces on their own line 
is one of them.

- Jonathan M Davis


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Marco Leise

Am 10.08.2011, 22:12 Uhr, schrieb Jonathan M Davis :


[...] I don't see any reason to keep discussing it over and over.


You see, and that is why we should make that explicit rather than implicit  
in the style guide. An additional point "personal preference" could list  
"blank lines to group logical blocks of code".


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread simendsjo

On 10.08.2011 22:12, Jonathan M Davis wrote:

There are a few things that were agreed upon (such as always putting
braces on their own line),


There is? Parallelism and json uses braces on the same line.



Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Jonathan M Davis
On Wednesday, August 10, 2011 21:42:01 Marco Leise wrote:
> Am 10.08.2011, 19:24 Uhr, schrieb Adam D. Ruppe
> 
> :
> > bearophile:
> > 
> > The thing is just because you call it a problem a lot doesn't mean
> > everyone else sees it that way.
> > 
> > A lot of us have many years of experience and just don't see it the
> > same way you do.
> 
> I think a blank line makes code easier on the eyes. When you scroll over
> it you recognize easily where you are from the size and shape of the
> paragraphs. So I totally understand that. On the other hand my laptop
> screen is 1280x800 and I also feel that sometimes I think I scroll over
> the end of a function body when there is just a blank line in a block of
> code. So usually I go with the approach of inserting a comment line
> instead of a blank line, which is usually italic and in a brighter color.
> If I was working on a Phobos module I would try to mimic existing code
> style (and probably find out that there is no common style :p ). Anyway
> such things can be up to a vote just like the idea to not use single
> capital letters only for template type placeholders (i.e. T, S).
> Google's code style wiki is nice. It lists all the rules and also offers
> an explanation. We can have that for Phobos, too. So topics like these
> don't come up over and over again. The D style guide is a good start:
> http://www.digitalmars.com/d/2.0/dstyle.html

This sort of thing has been discussed by the Phobos dev team previously, and 
the general consensus was not to enforce much in the way of formatting in a 
style guide. There are a few things that were agreed upon (such as always putting 
braces on their own line), but on the whole, the style guide is supposed to 
focus on the API (so, things like function and variable names) rather than how 
code is formatted. I have an update to the style guide as a pull request which 
is currently being reviewed to make sure that the style guide on the site is 
in line with what we do:

https://github.com/D-Programming-Language/d-programming-language.org/pull/16

But I'm certain that you're not going to get the Phobos devs to agree on a 
style guide like Bearophile wants. And honestly, I'm a bit tired of the topic 
coming up. The style guide does need some updates, but it's mostly correct. It's 
essentially what we've decided on, and I don't see any reason to keep 
discussing it over and over.

Personally, I'd prefer that Dmitry had more blank lines in his code, but it's 
up to him how he does that as long as his code falls within the rules set down 
by the D style guide. And for any of his code which isn't going into Phobos, 
it's completely up to him how to format it.

- Jonathan M Davis


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Marco Leise
Am 10.08.2011, 19:24 Uhr, schrieb Adam D. Ruppe  
:



bearophile:

The thing is just because you call it a problem a lot doesn't mean
everyone else sees it that way.

A lot of us have many years of experience and just don't see it the
same way you do.


I think a blank line makes code easier on the eyes. When you scroll over  
it you recognize easily where you are from the size and shape of the  
paragraphs. So I totally understand that. On the other hand my laptop  
screen is 1280x800 and I also feel that sometimes I think I scroll over  
the end of a function body when there is just a blank line in a block of  
code. So usually I go with the approach of inserting a comment line  
instead of a blank line, which is usually italic and in a brighter color.
If I was working on a Phobos module I would try to mimic existing code  
style (and probably find out that there is no common style :p ). Anyway  
such things can be up to a vote just like the idea to not use single  
capital letters only for template type placeholders (i.e. T, S).
Google's code style wiki is nice. It lists all the rules and also offers  
an explanation. We can have that for Phobos, too. So topics like these  
don't come up over and over again. The D style guide is a good start:  
http://www.digitalmars.com/d/2.0/dstyle.html


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread bearophile
Dmitry Olshansky:

> Braces *are* paragraphs of code,

They sometimes are, but inside functions there are other kinds of "paragraphs".

As an example, this is first-quality C code (partially written by R. Hettinger):
http://hg.python.org/cpython/file/d5b274a0b0a5/Modules/_collectionsmodule.c

If you take a random function from that page, like:

static int
deque_del_item(dequeobject *deque, Py_ssize_t i)
{
    PyObject *item;

    assert (i >= 0 && i < deque->len);
    if (_deque_rotate(deque, -i) == -1)
        return -1;

    item = deque_popleft(deque, NULL);
    assert (item != NULL);
    Py_DECREF(item);

    return _deque_rotate(deque, i);
}

You see a blank line after "Py_DECREF(item);" even though there is no closing 
brace. The purpose of those blank lines is to help the person who reads the 
code to tell apart the various things done by that function. This C code is 
well written.


> Not gonna work, file I/O is certainly in Phobos, as are network sockets, 
> etc. You can't assert that something external won't fail.

OK.


> I hate being dragged into these discussions, but just can't resist.

I am sorry, but thank you for answering :-)

Bye,
bearophile


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Dmitry Olshansky

On 10.08.2011 22:11, Vladimir Panteleev wrote:
On Wed, 10 Aug 2011 20:59:27 +0300, Dmitry Olshansky 
 wrote:



About spaces personally I dislike eating extra vertical space for
"clarity", curly braces on it's own line is already way too much.
Think about reading a book without the half lines between 
paragraphs. In code it's the same. Some empty lines are good to 
improve readability of the code. Curly braces are not always 
present, sometimes a paragraphs ends before or after or right on a 
curly brace.


Braces *are* paragraphs of code; with proper indentation that's more than 
enough to feel the structure. If I really need to stop in the middle of a 
function, it's to explain something, and then a single line of comment 
instead of a meaningless empty line (which leaves the reader clueless as to 
why) is good enough. That said, I'm not opposed to spacing at global 
scope.


I agree with bearophile; I find code that leaves a blank line between 
closely-related lines make the code much more readable. I don't 
understand what's with the craving for maximum vertical terseness 
either, but that may be because the resolution of my primary monitor 
is currently 1200x1920 :)


Lucky you, hm... probably turning my monitor 90 degrees can get me into 
this league of abundant vertical space :)


--
Dmitry Olshansky



Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Vladimir Panteleev
On Wed, 10 Aug 2011 20:59:27 +0300, Dmitry Olshansky  
 wrote:



About spaces personally I dislike eating extra vertical space for
"clarity", curly braces on it's own line is already way too much.
Think about reading a book without the half lines between paragraphs.  
In code it's the same. Some empty lines are good to improve readability  
of the code. Curly braces are not always present, sometimes a  
paragraphs ends before or after or right on a curly brace.


Braces *are* paragraphs of code; with proper indentation that's more than  
enough to feel the structure. If I really need to stop in the middle of a  
function, it's to explain something, and then a single line of comment  
instead of a meaningless empty line (which leaves the reader clueless as to  
why) is good enough. That said, I'm not opposed to spacing at global  
scope.


I agree with bearophile; I find that leaving a blank line between groups of  
closely-related lines makes the code much more readable. I don't understand  
what's with the craving for maximum vertical terseness either, but that  
may be because the resolution of my primary monitor is currently 1200x1920  
:)


--
Best regards,
 Vladimir  mailto:vladi...@thecybershadow.net


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Dmitry Olshansky

On 10.08.2011 21:11, bearophile wrote:

Dmitry Olshansky:


Honestly I can't get why you are so nervous about code style anyway, you
seem to bring this up way to often.

I bring it often because many D programmers seem half blind to this problem. I 
am not willing to go to the extremes Go language goes to solve this problem, 
but I'd like more recognition of this problem in D programmers. A bit more 
common style is quite helpful to create an ecology of D programmers that share 
single modules. I guess D programmers are used to C/C++ languages, where there 
are not modules and where programs are usually made of many files. So they 
don't see why sharing single modules in the pool is so useful.



About spaces personally I dislike eating extra vertical space for
"clarity", curly braces on it's own line is already way too much.

Think about reading a book without the half lines between paragraphs. In code 
it's the same. Some empty lines are good to improve readability of the code. 
Curly braces are not always present, sometimes a paragraphs ends before or 
after or right on a curly brace.


Braces *are* paragraphs of code; with proper indentation that's more than 
enough to feel the structure. If I really need to stop in the middle of a 
function, it's to explain something, and then a single line of comment 
instead of a meaningless empty line (which leaves the reader clueless as to 
why) is good enough. That said, I'm not opposed to spacing at global scope.





Have to respectfully disagree on this, don't try to nail everything on
contracts.

Contracts don't replace unittests, they complement each other.


unittest != assert, though the former do contain asserts.

They are nice but have little value over plain assert
_unless_ we are talking about classes and _inheritance_, which isn't the
case here.

It's easy to forget to test the output of a function, the "out" contracts help 
here. In structs the invariant helps you avoid forgetting to call manually a sanity test 
function every time you come in and out of a method.



And there are lots of asserts here, but much more of input is
enforced since it's totally expected to supply wrong pattern (or have an
outside  user to type in the pattern).

The idea is to replace those enforces with asserts, and allow user programs to 
import Phobos stuff that still contain asserts (from a secondary Phobos lib). 
Enforces are for certain kinds of user code, I don't think they are fit in 
Phobos.


Not gonna work: file I/O is certainly in Phobos, as are network sockets, 
etc. You can't assert that something external won't fail, while you'd 
normally assert on your local logical invariants. As for other things, I 
thought e.g. ranges are already hooked on asserts, as are many other 
templates. If you have a list of modules where you find the lack of 
compiled-in contracts/asserts unbearable, do tell.
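
To make the distinction concrete, here is a small sketch. countGroups is a hypothetical helper written for this illustration and is not part of FReD or std.regex. Input that comes from outside is checked with enforce, which throws even in release builds, while the check on our own logic is an assert:

```d
import std.exception : assertThrown, enforce;

// Hypothetical validator, only to contrast the two kinds of check.
size_t countGroups(string pattern)
{
    size_t open, close;
    foreach (ch; pattern)
    {
        if (ch == '(')
            ++open;
        else if (ch == ')')
        {
            ++close;
            // User-supplied input: a bad pattern is expected, so throw.
            enforce(close <= open, "unmatched ')' in pattern");
        }
    }
    enforce(open == close, "unmatched '(' in pattern");

    // Internal logic: if this fails, the bug is ours, so assert.
    assert(open <= pattern.length);
    return open;
}

void main()
{
    assert(countGroups("(a)(b)") == 2);
    assertThrown!Exception(countGroups("(a"));
}
```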


I hate being dragged into these discussions, but just can't resist.

--
Dmitry Olshansky



Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Andrei Alexandrescu

On 8/10/11 10:46 AM, Jacob Carlborg wrote:

On 2011-08-10 17:55, Dmitry Olshansky wrote:

On 10.08.2011 18:54, Jacob Carlborg wrote:

Interesting idea, one problem with it is that I want this:

auto m = match("bleh", "bleh");
writeln(m);

to actually print "bleh", not true
Right now due to a carry over bug from std.regex (interface thing)
writeln(m) will just cause a stack overflow; m.hit however works.


No, that won't be any problem:

struct Foo
{
    bool b;
    alias b this;
}

auto f = Foo();
static assert(is(typeof(f) == Foo));

The above assert passes as expected.

That may be all well, but try writeln on it, what will it print?


Hmm, it doesn't print anything, I think it looks like a bug in writeln.


After some experience with alias this I had to conclude that it's a rather
blunt tool, and I'd rather stay away from it.
Actually I like Steven's opCast suggestion, so that it works in
conditionals.


Oh, I didn't know that it would work implicitly in conditionals. Then
I'm happy with opCast :)


That's pretty cool actually because it naturally extends the built-in 
approach. When you do e.g. if (pointer) that's really equivalent to if 
(cast(bool) pointer) and so on.
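
A minimal sketch of the idea, assuming a struct result type. Match here is a made-up stand-in and not the actual FReD type. The struct defines opCast for bool, and the compiler consults it when the value is used as a condition:

```d
// Hypothetical stand-in for a match result; not the FReD type.
struct Match
{
    string hit;

    // Consulted when the struct is used as a condition.
    bool opCast(T : bool)() const
    {
        return hit.length != 0;
    }
}

void main()
{
    auto found = Match("bleh");
    auto missed = Match("");

    bool seen = false;
    if (found)              // rewritten to: if (cast(bool) found)
        seen = true;

    assert(seen);
    assert(!cast(bool) missed);
    assert(found.hit == "bleh");   // the matched text is still available
}
```

This keeps writeln(m) free to print the match rather than a bool, since no implicit conversion to bool is introduced outside conditions.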


Andrei


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Andrei Alexandrescu

On 8/10/11 9:55 AM, Dmitry Olshansky wrote:

On 10.08.2011 18:54, Jacob Carlborg wrote:

Interesting idea, one problem with it is that I want this:

auto m = match("bleh", "bleh");
writeln(m);

to actually print "bleh", not true
Right now due to a carry over bug from std.regex (interface thing)
writeln(m) will just cause a stack overflow; m.hit however works.


No, that won't be any problem:

struct Foo
{
    bool b;
    alias b this;
}

auto f = Foo();
static assert(is(typeof(f) == Foo));

The above assert passes as expected.

That may be all well, but try writeln on it, what will it print?
After some experience with alias this I had to conclude that it's a rather
blunt tool, and I'd rather stay away from it.


If alias this is any more blunt than regular subtyping (inheritance), 
that would be a bug. Feel free to submit if you find such issues.
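
As a rough illustration of the subtyping view, reusing the Foo struct quoted above: the value keeps its own type but is also accepted wherever a bool is expected. This is only a sketch of the intended behaviour and says nothing about what writeln does with such a type:

```d
struct Foo
{
    bool b;
    alias b this;
}

void main()
{
    auto f = Foo(true);

    static assert(is(typeof(f) == Foo));   // still its own type
    static assert(is(Foo : bool));         // but implicitly converts to bool

    bool copy = f;                         // goes through the alias
    assert(copy);
}
```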


Andrei


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread bearophile
Adam D. Ruppe:

> A lot of us have many years of experience and just don't see it the
> same way you do.

This "you" is a group that includes people like Guido van Rossum, Rob Pike, Ken 
Thompson and R. Hettinger (they have feelings even stronger than mine on this 
topic).

Bye,
bearophile


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Adam D. Ruppe
bearophile:

The thing is just because you call it a problem a lot doesn't mean
everyone else sees it that way.

A lot of us have many years of experience and just don't see it the
same way you do.


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Steven Schveighoffer

On Wed, 10 Aug 2011 12:46:25 -0400, Jacob Carlborg  wrote:


On 2011-08-10 17:55, Dmitry Olshansky wrote:



After some experience with alias this I had to conclude that it's a rather
blunt tool, and I'd rather stay away from it.
Actually I like Steven's opCast suggestion, so that it works in
conditionals.


alias this has lots of problems, but that doesn't mean its *design* is  
blunt, just that the implementation of it is not too good.




Oh, I didn't know that it would work implicitly in conditionals. Then  
I'm happy with opCast :)




http://www.d-programming-language.org/operatoroverloading.html#Cast

Note that it only works for structs (not sure if that return type is a  
struct or not...)


-Steve


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread bearophile
Dmitry Olshansky:

> Honestly I can't get why you are so nervous about code style anyway, you 
> seem to bring this up way too often.

I bring it up often because many D programmers seem half blind to this problem. I 
am not willing to go to the extremes the Go language goes to in solving it, 
but I'd like more recognition of this problem among D programmers. A somewhat more 
common style is quite helpful for creating an ecology of D programmers that share 
single modules. I guess D programmers are used to the C/C++ languages, where there 
are no modules and where programs are usually made of many files. So they 
don't see why sharing single modules in a common pool is so useful.


> About spaces: personally I dislike eating extra vertical space for 
> "clarity"; curly braces on their own line are already way too much.

Think about reading a book without the half lines between paragraphs. In code 
it's the same. Some empty lines are good for improving the readability of the code. 
Curly braces are not always present; sometimes a paragraph ends before, after, 
or right on a curly brace.


> Have to respectfully disagree on this, don't try to nail everything on 
> contracts.

Contracts don't replace unittests, they complement each other.


> They are nice but have little value over plain assert 
> _unless_ we are talking about classes and _inheritance_, which isn't the 
> case here.

It's easy to forget to test the output of a function; the "out" contracts help 
here. In structs the invariant saves you from having to remember to manually 
call a sanity test function every time you enter and leave a method.


> And there are lots of asserts here, but much more of input is 
> enforced since it's totally expected to supply wrong pattern (or have an 
> outside  user to type in the pattern).

The idea is to replace those enforces with asserts, and allow user programs to 
import Phobos stuff that still contains asserts (from a secondary Phobos lib). 
Enforces are for certain kinds of user code; I don't think they fit in 
Phobos.

Bye,
bearophile


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Jacob Carlborg

On 2011-08-10 18:02, bearophile wrote:

Dmitry Olshansky:


To get a small no-crap-included beta package see download section of
https://github.com/blackwhale/FReD for .7zs.


When you write some English text you don't write a single block of text, you 
organize it into paragraphs, and paragraphs into chapters, chapters into 
sections, sections into books, etc. Time ago I have understood that paragraphs 
are very good in source code too.

So I suggest you add a blank line here and there inside your functions to separate 
them into paragraphs. I can't give you a style rule, you will need to create your own 
style, but often a function that's more than 10 lines long needs one or more blank 
lines inside (some people say that every time you see one of such paragraphs in a 
function, especially if it has a comment before it, then you need to perform an 
"extract method" to improve the code. I believe this is bad advice).


I always add a blank line before and after statements.

--
/Jacob Carlborg


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Jacob Carlborg

On 2011-08-10 17:55, Dmitry Olshansky wrote:

On 10.08.2011 18:54, Jacob Carlborg wrote:

Interesting idea, one problem with it is that I want this:

auto m = match("bleh", "bleh");
writeln(m);

to actually print "bleh", not true
Right now due to a carry over bug from std.regex (interface thing)
writeln(m) will just cause a stack overflow; m.hit however works.


No, that won't be any problem:

struct Foo
{
    bool b;
    alias b this;
}

auto f = Foo();
static assert(is(typeof(f) == Foo));

The above assert passes as expected.

That may be all well, but try writeln on it, what will it print?


Hmm, it doesn't print anything, I think it looks like a bug in writeln.


After some experience with alias this I had to conclude that it's a rather
blunt tool, and I'd rather stay away from it.
Actually I like Steven's opCast suggestion, so that it works in
conditionals.


Oh, I didn't know that it would work implicitly in conditionals. Then 
I'm happy with opCast :)


--
/Jacob Carlborg


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Dmitry Olshansky

On 10.08.2011 20:02, bearophile wrote:

Dmitry Olshansky:


To get a small no-crap-included beta package see download section of
https://github.com/blackwhale/FReD for .7zs.

When you write some English text you don't write a single block of text, you 
organize it into paragraphs, and paragraphs into chapters, chapters into 
sections, sections into books, etc. Time ago I have understood that paragraphs 
are very good in source code too.

So I suggest you add a blank line here and there inside your functions to separate 
them into paragraphs. I can't give you a style rule, you will need to create your own 
style, but often a function that's more than 10 lines long needs one or more blank 
lines inside (some people say that every time you see one of such paragraphs in a 
function, especially if it has a comment before it, then you need to perform an 
"extract method" to improve the code. I believe this is bad advice).


While I haven't asked for a review, I do appreciate comments. I have to 
say I did no cleanup or otherwise shape up the code; I'm still working 
on the semantic side of the problems :)
Honestly, I can't see why you are so nervous about code style anyway; you 
seem to bring this up way too often.
About spaces: personally I dislike eating extra vertical space for 
"clarity"; curly braces on their own line are already way too much.




I see no contracts in the code (I mean the ones with assert inside, instead of 
enforce). I suggest that Walter fix this situation. One idea is to include two 
versions of the Phobos lib in the zip of the dmd distribution, one with asserts 
compiled in and one without, and let DMD import from the correct library 
according to the compilation flags.
Some solution to this problem is getting urgent, because Phobos is growing 
without the use of one of the nicest features of D (contract programming). 
Solving this problem is more urgent than having an excellent regex library in 
Phobos. If people don't use contract programming much, it is because you can't use 
it in Phobos.

I have to respectfully disagree on this; don't try to pin everything on 
contracts. They are nice but have little value over a plain assert 
_unless_ we are talking about classes and _inheritance_, which isn't the 
case here. And there are lots of asserts here, but much more of the input is 
enforced, since it's entirely expected that a wrong pattern will be supplied 
(or that an outside user will type in the pattern).
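
To make the distinction concrete, here is a minimal sketch, not taken from
FReD: the function names are hypothetical, and it assumes a current compiler
where the contract body is introduced with `do` (2011-era DMD spelled it
`body`). Input that comes from outside is checked with enforce, which throws
and stays active in release builds; an internal precondition is checked with
an assert inside an `in` contract, which -release compiles out.

```d
import std.exception : enforce;

// Hypothetical helper: the pattern may come from a user, so a bad value is
// an expected situation and is reported with an exception via enforce.
size_t countOpenParens(string pattern)
{
    enforce(pattern.length > 0, "empty pattern");
    size_t n;
    foreach (char c; pattern)
        if (c == '(')
            ++n;
    return n;
}

// Hypothetical internal helper: an out-of-range index is a programming
// error, so it is checked with an assert in an `in` contract.
char charAt(string pattern, size_t i)
in
{
    assert(i < pattern.length, "index out of range");
}
do
{
    return pattern[i];
}

void main()
{
    assert(countOpenParens("(a)(b)") == 2);
    assert(charAt("abc", 1) == 'b');
}
```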



Bye,
bearophile



--
Dmitry Olshansky



Re: [GSOC] regular expressions beta is here

2011-08-10 Thread bearophile
Dmitry Olshansky:

> To get a small no-crap-included beta package see download section of 
> https://github.com/blackwhale/FReD for .7zs.

When you write some English text you don't write a single block of text, you 
organize it into paragraphs, and paragraphs into chapters, chapters into 
sections, sections into books, etc. Time ago I have understood that paragraphs 
are very good in source code too.

So I suggest you add a blank line here and there inside your functions to 
separate them into paragraphs. I can't give you a style rule, you will need to 
create your own style, but often a function that's more than 10 lines long 
needs one or more blank lines inside (some people say that every time you see 
one such paragraph in a function, especially if it has a comment before it, 
you need to perform an "extract method" to improve the code. I believe 
this is bad advice).

I see no contracts in the code (I mean the ones with assert inside, instead of 
enforce). I suggest that Walter fix this situation. One idea is to include two 
versions of the Phobos lib in the zip of the dmd distribution, one with asserts 
compiled in and one without, and let DMD import from the correct library 
according to the compilation flags.

Some solution to this problem is getting urgent, because Phobos is growing 
without the use of one of the nicest features of D (contract programming). 
Solving this problem is more urgent than having an excellent regex library in 
Phobos. If people don't use contract programming much, it is because you can't use 
it in Phobos.

Bye,
bearophile


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Dmitry Olshansky

On 10.08.2011 18:54, Jacob Carlborg wrote:

Interesting idea, one problem with it is that I want this:

auto m = match("bleh", "bleh");
writeln(m);

to actually print "bleh", not true
Right now, due to a carry-over bug from std.regex (interface thing),
writeln(m) will just cause a stack overflow; m.hit, however, works.


No, that won't be any problem:

struct Foo
{
bool b;
alias b this;
}

auto f = Foo();
static assert(is(typeof(f) == Foo));

The above assert passes as expected.

That may be all well, but try writeln on it, what will it print?
After some experience with alias this I had to conclude that it's a rather 
blunt tool, and I'd rather stay away from it.
Actually I like Steven's opCast suggestion, so that it works in 
conditionals.





Aren't there a lot of things that should be declared as private in the
fred.d module?



Yes, it's a side effect of me having a lot of debugging tools that
need these internals. If only the package protection attribute or something
was working.
Not to mention that the whole module should work in SafeD with a couple
of @trusted here and there.


Ok, I see.




--
Dmitry Olshansky



Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Dmitry Olshansky

On 10.08.2011 16:54, Steven Schveighoffer wrote:
On Wed, 10 Aug 2011 07:51:32 -0400, Dmitry Olshansky 
 wrote:



On 10.08.2011 15:34, Jacob Carlborg wrote:

On 2011-08-10 12:42, Dmitry Olshansky wrote:

In case I failed to mention it before, I'm working on the project
codenamed FReD that is aimed at a ~100%* source-level compatible overhaul
of std.regex, that uses better implementation techniques, provides
modern Unicode support and common syntax riches.

I think it's time for a public beta release, since it _should_ be ready
for mainstream usage. There are some rough edges, and a couple of issues
that I'm aware of, but they don't show up in realistic use cases.

In order to avoid unexpected regressions I'd be glad if current
std.regex users would try it for their projects/tests.
To get a small no-crap-included beta package see the download section of
https://github.com/blackwhale/FReD for .7zs.
I'll upload newer packages as bugs get exposed and fixed. Alternatively,
if you are comfortable with git you may just git clone the entire repo. Some
helpful notes (same as README) can be found here:
https://github.com/blackwhale/FReD/wiki/Beta-release

Caveats:
In order for it to compile, a tiny change to the 2.054 source is needed (no need
to recompile Phobos! it's only in templates):
patch std.algorithm.cmp according to this diff
https://github.com/D-Programming-Language/phobos/pull/176/files#L0L4631

and to get CTFE features working add the if(!__ctfe) listed in the next diff
on the same webpage.
(this is already upstream, so if you're using a fork of phobos just pull
this in)

* some API problems might lead to a breaking change, though it didn't
happen in this release


I have a suggestion, make RegexMatch implicitly convertible to bool, 
indicating if there was a match or not.



Interesting idea, one problem with it is that I want this:

auto m = match("bleh", "bleh");
writeln(m);

to actually print "bleh", not true


Without actually looking at the code, why wouldn't something like this 
work?


struct RegexMatch
{
   ...
   string toString() {...}
   bool opCast(T : bool)() {...}
}

This isn't an implicit cast, but it will work for conditional statements.


Thanks, I'll give it a try.


--
Dmitry Olshansky



Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Vladimir Panteleev
On Wed, 10 Aug 2011 14:44:44 +0300, Dmitry Olshansky  
 wrote:


Yes, I've dubbed it static regex. In fact it will be something similar
to this, though it will do a heap allocation for backtracking points on
the first call to match. Heap allocations are definitely going away in the
final release.


Awesome stuff. D's codegen abilities have the potential to put regex  
matching way ahead of any C/C++ libraries that don't JIT or stuff like  
that.


--
Best regards,
 Vladimir    mailto:vladi...@thecybershadow.net


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Jacob Carlborg

Interesting idea, one problem with it is that I want this:

auto m = match("bleh", "bleh");
writeln(m);

to actually print "bleh", not true
Right now, due to a carry-over bug from std.regex (interface thing),
writeln(m) will just cause a stack overflow; m.hit, however, works.


No, that won't be any problem:

struct Foo
{
bool b;
alias b this;
}

auto f = Foo();
static assert(is(typeof(f) == Foo));

The above assert passes as expected.
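
Spelled out a bit further, a minimal sketch of what alias this gives here,
using the same Foo as above; the extra checks are illustrative additions.
The variable keeps its own type, yet it converts to bool implicitly
everywhere, not only in conditions.

```d
struct Foo
{
    bool b;
    alias b this;   // Foo converts implicitly to its bool member
}

// Illustrative helper (an assumption, not from the thread): takes a plain bool.
bool passThrough(bool value)
{
    return value;
}

void main()
{
    auto f = Foo(true);
    static assert(is(typeof(f) == Foo)); // the type is still Foo

    bool copy = f;               // implicit conversion outside a condition
    assert(copy);
    assert(passThrough(f));      // also applies to function arguments

    if (f)
        assert(f.b);             // and in conditions
}
```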


Aren't there a lot of things that should be declared as private in the
fred.d module?



Yes, it's a side effect of me having a lot of debugging tools that
need these internals. If only the package protection attribute or something
was working.
Not to mention that the whole module should work in SafeD with a couple
of @trusted here and there.


Ok, I see.

--
/Jacob Carlborg


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Steven Schveighoffer
On Wed, 10 Aug 2011 07:51:32 -0400, Dmitry Olshansky  
 wrote:



On 10.08.2011 15:34, Jacob Carlborg wrote:

On 2011-08-10 12:42, Dmitry Olshansky wrote:

In case I failed to mention it before, I'm working on the project
codenamed FReD that is aimed at a ~100%* source-level compatible overhaul
of std.regex, that uses better implementation techniques, provides
modern Unicode support and common syntax riches.

I think it's time for a public beta release, since it _should_ be ready
for mainstream usage. There are some rough edges, and a couple of issues
that I'm aware of, but they don't show up in realistic use cases.

In order to avoid unexpected regressions I'd be glad if current
std.regex users would try it for their projects/tests.
To get a small no-crap-included beta package see the download section of
https://github.com/blackwhale/FReD for .7zs.
I'll upload newer packages as bugs get exposed and fixed. Alternatively,
if you are comfortable with git you may just git clone the entire repo. Some
helpful notes (same as README) can be found here:
https://github.com/blackwhale/FReD/wiki/Beta-release

Caveats:
In order for it to compile, a tiny change to the 2.054 source is needed (no need
to recompile Phobos! it's only in templates):
patch std.algorithm.cmp according to this diff
https://github.com/D-Programming-Language/phobos/pull/176/files#L0L4631

and to get CTFE features working add the if(!__ctfe) listed in the next diff
on the same webpage.
(this is already upstream, so if you're using a fork of phobos just pull
this in)

* some API problems might lead to a breaking change, though it didn't
happen in this release


I have a suggestion, make RegexMatch implicitly convertible to bool,  
indicating if there was a match or not.



Interesting idea, one problem with it is that I want this:

auto m = match("bleh", "bleh");
writeln(m);

to actually print "bleh", not true


Without actually looking at the code, why wouldn't something like this  
work?


struct RegexMatch
{
   ...
   string toString() {...}
   bool opCast(T : bool)() {...}
}

This isn't an implicit cast, but it will work for conditional statements.
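
A minimal runnable sketch of that shape follows. MatchSketch and its fields
are hypothetical stand-ins, not FReD's actual RegexMatch. The result is
usable directly in an if, writeln goes through toString, and there is still
no implicit conversion to bool outside of conditions.

```d
import std.stdio : writeln;

// Hypothetical stand-in for a match result; the field names are assumptions.
struct MatchSketch
{
    string hit;     // matched slice, empty when nothing matched
    bool matched;   // whether the pattern matched at all

    string toString() const
    {
        return hit;
    }

    // Consulted by conditions (if, while, !, &&, ||) and by explicit casts.
    bool opCast(T : bool)() const
    {
        return matched;
    }
}

void main()
{
    auto m = MatchSketch("bleh", true);
    if (m)              // rewritten by the compiler to cast(bool) m
        writeln(m);     // formatted via toString

    auto none = MatchSketch("", false);
    assert(!none);

    // Unlike alias this, opCast does not allow a plain implicit conversion.
    static assert(!__traits(compiles, { bool b = MatchSketch.init; }));
}
```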

-Steve


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Dmitry Olshansky

On 10.08.2011 15:34, Jacob Carlborg wrote:

On 2011-08-10 12:42, Dmitry Olshansky wrote:

In case I failed to mention it before, I'm working on the project
codenamed FReD that is aimed at a ~100%* source-level compatible overhaul
of std.regex, that uses better implementation techniques, provides
modern Unicode support and common syntax riches.

I think it's time for a public beta release, since it _should_ be ready
for mainstream usage. There are some rough edges, and a couple of issues
that I'm aware of, but they don't show up in realistic use cases.

In order to avoid unexpected regressions I'd be glad if current
std.regex users would try it for their projects/tests.
To get a small no-crap-included beta package see the download section of
https://github.com/blackwhale/FReD for .7zs.
I'll upload newer packages as bugs get exposed and fixed. Alternatively,
if you are comfortable with git you may just git clone the entire repo. Some
helpful notes (same as README) can be found here:
https://github.com/blackwhale/FReD/wiki/Beta-release

Caveats:
In order for it to compile, a tiny change to the 2.054 source is needed (no need
to recompile Phobos! it's only in templates):
patch std.algorithm.cmp according to this diff
https://github.com/D-Programming-Language/phobos/pull/176/files#L0L4631

and to get CTFE features working add the if(!__ctfe) listed in the next diff
on the same webpage.
(this is already upstream, so if you're using a fork of phobos just pull
this in)

* some API problems might lead to a breaking change, though it didn't
happen in this release


I have a suggestion, make RegexMatch implicitly convertible to bool, 
indicating if there was a match or not.



Interesting idea, one problem with it is that I want this:

auto m = match("bleh", "bleh");
writeln(m);

to actually print "bleh", not true
Right now, due to a carry-over bug from std.regex (interface thing), 
writeln(m) will just cause a stack overflow; m.hit, however, works.


Aren't there a lot of things that should be declared as private in the 
fred.d module?




Yes, it's a side effect of me having a lot of debugging tools that 
need these internals. If only the package protection attribute or something 
was working.
Not to mention that the whole module should work in SafeD with a couple 
of @trusted here and there.


--
Dmitry Olshansky



Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Dmitry Olshansky

On 10.08.2011 15:16, Vladimir Panteleev wrote:
On Wed, 10 Aug 2011 13:42:25 +0300, Dmitry Olshansky 
 wrote:



and to get CTFE features working add if(!__ctfe) listed in the next diff


Hi, does this rewrite cover compile-time regex compilation?

E.g. regex!`^a` compiling to s.length&&s[0]=='a' or something like that.



Yes, I've dubbed it static regex. In fact it will be something similar 
to this, though it will do a heap allocation for backtracking points on 
the first call to match. Heap allocations are definitely going away in the 
final release.

You can pass -version=fred_ct -debug to dmd to see generated programs.
At the moment it's more a proof of concept than a speed demon, something I 
might see about changing once the CTFE bugs are worked out. Anyway, when it 
doesn't crash the compiler, it's pretty fast :)


--
Dmitry Olshansky



Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Jacob Carlborg

On 2011-08-10 12:42, Dmitry Olshansky wrote:

In case I failed to mention it before, I'm working on the project
codenamed FReD that is aimed at a ~100%* source-level compatible overhaul
of std.regex, that uses better implementation techniques, provides
modern Unicode support and common syntax riches.

I think it's time for a public beta release, since it _should_ be ready
for mainstream usage. There are some rough edges, and a couple of issues
that I'm aware of, but they don't show up in realistic use cases.

In order to avoid unexpected regressions I'd be glad if current
std.regex users would try it for their projects/tests.
To get a small no-crap-included beta package see the download section of
https://github.com/blackwhale/FReD for .7zs.
I'll upload newer packages as bugs get exposed and fixed. Alternatively,
if you are comfortable with git you may just git clone the entire repo. Some
helpful notes (same as README) can be found here:
https://github.com/blackwhale/FReD/wiki/Beta-release

Caveats:
In order for it to compile, a tiny change to the 2.054 source is needed (no need
to recompile Phobos! it's only in templates):
patch std.algorithm.cmp according to this diff
https://github.com/D-Programming-Language/phobos/pull/176/files#L0L4631

and to get CTFE features working add the if(!__ctfe) listed in the next diff
on the same webpage.
(this is already upstream, so if you're using a fork of phobos just pull
this in)

* some API problems might lead to a breaking change, though it didn't
happen in this release


I have a suggestion, make RegexMatch implicitly convertible to bool, 
indicating if there was a match or not.


Aren't there a lot of things that should be declared as private in the 
fred.d module?


--
/Jacob Carlborg


Re: [GSOC] regular expressions beta is here

2011-08-10 Thread Vladimir Panteleev
On Wed, 10 Aug 2011 13:42:25 +0300, Dmitry Olshansky  
 wrote:



and to get CTFE features working add if(!__ctfe) listed in the next diff


Hi, does this rewrite cover compile-time regex compilation?

E.g. regex!`^a` compiling to s.length&&s[0]=='a' or something like that.
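
To illustrate the idea in the question, here is a deliberately tiny sketch.
It is not how FReD generates code: matchAnchoredLiteral is a hypothetical
helper that only understands patterns of the form "^literal". The pattern is
a template argument, so the literal is sliced out at compile time and each
instantiation boils down to a length check plus a prefix comparison.

```d
import std.stdio : writeln;

// Hypothetical helper, not part of FReD or std.regex. The template
// constraint rejects anything that is not "^" followed by a literal.
bool matchAnchoredLiteral(string pattern)(string s)
    if (pattern.length > 1 && pattern[0] == '^')
{
    enum literal = pattern[1 .. $];   // evaluated at compile time
    return s.length >= literal.length
        && s[0 .. literal.length] == literal;
}

void main()
{
    // The function is simple enough to be evaluated at compile time as well.
    static assert(matchAnchoredLiteral!`^a`("abc"));

    writeln(matchAnchoredLiteral!`^a`("abc"));
    writeln(matchAnchoredLiteral!`^a`("xbc"));
}
```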

--
Best regards,
 Vladimir    mailto:vladi...@thecybershadow.net


[GSOC] regular expressions beta is here

2011-08-10 Thread Dmitry Olshansky
In case I failed to mention it before, I'm working on the project 
codenamed FReD that is aimed at a ~100%* source-level compatible overhaul 
of std.regex, that uses better implementation techniques, provides 
modern Unicode support and common syntax riches.


I think it's time for a public beta release, since it _should_ be 
ready for mainstream usage. There are some rough edges, and a couple of 
issues that I'm aware of, but they don't show up in realistic use cases.


In order to avoid unexpected regressions I'd be glad if current 
std.regex users would try it for their projects/tests.
To get a small no-crap-included beta package see the download section of 
https://github.com/blackwhale/FReD for .7zs.
I'll upload newer packages as bugs get exposed and fixed. Alternatively, 
if you are comfortable with git you may just git clone the entire repo. Some 
helpful notes (same as README) can be found here: 
https://github.com/blackwhale/FReD/wiki/Beta-release


Caveats:
In order for it to compile, a tiny change to the 2.054 source is needed (no 
need to recompile Phobos! it's only in templates):
patch std.algorithm.cmp according to this diff 
https://github.com/D-Programming-Language/phobos/pull/176/files#L0L4631 

and to get CTFE features working add the if(!__ctfe) listed in the next diff 
on the same webpage.
(this is already upstream, so if you're using a fork of phobos just pull 
this in)


* some API problems might lead to a breaking change, though it didn't 
happen in this release


--
Dmitry Olshansky