Re: XMLWriter

2011-06-11 Thread Tomek Sowiński
Tomek Sowiński wrote:

 Documentation:
 http://www.keepandshare.com/doc/2863798/std-xml-html-june-11-2011-2-43-am-93k?da=y#XMLWriter

I just noticed it requires everyone to sign in :-(

Please use this link:
http://pastehtml.com/view/awrj8r4zg.html#XMLWriter

-- 
Tomek



XMLWriter

2011-06-10 Thread Tomek Sowiński
I've pilfered some time to wrap up and discuss the proposal for an easy-to-use 
and efficient XML writer.

Documentation:
http://www.keepandshare.com/doc/2863798/std-xml-html-june-11-2011-2-43-am-93k?da=y#XMLWriter

Code:
https://github.com/tomeksowi/phobos/commit/9f8bb890af7e85d5c4a38409ac13a73585bba643

I've been circling the design lately, questioning the existence of each feature 
and snipping off unnecessary parts. In the process I removed about 60% of 
the code and have reached a stage where I'm not sure further deletions 
wouldn't cut into the healthy flesh of the project. These doubts are expressed in 
the questions in the documentation -- I'd like them to guide the discussion.

Oh, and please comment on the XMLWriter part only, the rest is old stuff.

-- 
Tomek


Re: string[] enumerations

2011-05-10 Thread Tomek Sowiński
Nrgyzer wrote:

 I need enumerations with string[] as base type.

What for?

-- 
Tomek



Re: GC for pure functions -- implementation ideas

2011-04-15 Thread Tomek Sowiński
Don wrote:

 LEAKY FUNCTIONS
 
 Define a 'leaky' pure function as a pure function which can return
 heap-allocated memory to the caller, ie, where the return value or a
 parameter passed by reference has at least one pointer or reference
 type. This can be determined simply by inspecting the signature. (Note
 that the function does not need to be immutably pure).
 
 The interesting thing is that heap allocation inside non-leaky pure
 functions behaves like stack allocation. When you return from that
 function, *all* those variables are unreachable, and can be discarded en 
 masse. Here's an idea of how to exploit this.
 
 THE PURE HEAP
 
 [snip]

I'm far from being a GC expert, but I think Java, having identified such cases 
with escape analysis, just puts locally allocated objects on the stack.

Couldn't we do that too? Your mark & release pure heap scheme looks all right, but this 
seems simpler.

The notion of non-leaky functions can be useful either way.
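Since Don says leakiness can be read off the signature alone, here is a rough sketch of that check using current `std.traits`. It is an interpretation, not Don's code: the helper `isLeaky` and the three sample functions are made up, and `hasIndirections` stands in for "has at least one pointer or reference type".

```d
import std.traits : hasIndirections, Parameters, ParameterStorageClass,
    ParameterStorageClassTuple, ReturnType;

// Hypothetical helper: true if `fn` can hand heap memory back to its caller,
// i.e. its return type, or any ref/out parameter type, contains indirections.
bool isLeaky(alias fn)()
{
    bool leaky = false;
    static if (!is(ReturnType!fn == void))
        leaky = hasIndirections!(ReturnType!fn);

    alias P = Parameters!fn;
    alias S = ParameterStorageClassTuple!fn;
    static foreach (i; 0 .. P.length)
    {
        if ((S[i] & (ParameterStorageClass.ref_ | ParameterStorageClass.out_)) != 0
                && hasIndirections!(P[i]))
            leaky = true;
    }
    return leaky;
}

// Non-leaky: only an int goes back, so `tmp` is garbage the moment we return.
int sumOfSquares(int n) pure
{
    auto tmp = new int[n];
    foreach (i, ref x; tmp)
        x = cast(int)(i * i);
    int total = 0;
    foreach (x; tmp)
        total += x;
    return total;
}

// Leaky through the return value.
int[] squares(int n) pure
{
    auto r = new int[n];
    foreach (i, ref x; r)
        x = cast(int)(i * i);
    return r;
}

// Leaky through a ref parameter.
void appendTo(ref int[] dst, int v) pure
{
    dst ~= v;
}

static assert(!isLeaky!sumOfSquares());
static assert(isLeaky!squares());
static assert(isLeaky!appendTo());

void main() {}
```

In Don's scheme, every allocation made inside `sumOfSquares` could be released en masse on return. The other two functions would need the regular GC heap.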

-- 
Tomek



Re: GSoC XML library proposal

2011-04-08 Thread Tomek Sowiński
Andrei Alexandrescu wrote:

 We have an XML library proposal. I know Tomek Sowinski was working on 
 such. What is the status?

The writer is close to being ready for discussion; I've been working on the 
documentation lately. As for the parser, I have a pretty good idea how to 
go about it, but the code is still in the woods. At work I've changed 
teams, which entails lots of reading to get up to speed in the new area, 
travelling to an office abroad, and working till late with code I don't know 
yet. Now, that's not an excuse, just an honest answer to what's taking so 
long. I'm still willing to pull this module through. The work frenzy is 
clearing up and most probably I'll have the time to do some solid work from 
this month on.

 Does Tomek or someone else want to apply as a 
 mentor for this project?

Perhaps let's do it this way: I'll finish the writer and get it through community 
scrutiny myself, outside GSoC. The GSoC contribution to std.xml will be limited 
in scope to parsing. I will serve as a guiding light with all I've read up on so far 
about the topic, the lessons learned from my attempts, and several years of experience 
with D. This way the density of reviews as well as the odds of bringing the 
module home will be higher.

If that sounds good, let me know how to apply as a mentor.

-- 
Tomek



Re: Has the ban on returning function nested structs been lifted?

2011-03-18 Thread Tomek Sowiński
Andrei Alexandrescu wrote:

 Auto returns + local types = just awesome.

Why is it awesome?

-- 
Tomek



Re: Library Development: What to finish/flesh out?

2011-03-17 Thread Tomek Sowiński
dsimcha wrote:

 I've accumulated a bunch of little libraries via various evening and weekend
 hacking projects over the past year or so, in various states of completion.
 Most are things I'm at least half-considering for Phobos, though some belong
 as third-party libs.  I definitely don't have time to finish/flesh out all of
 them anytime soon, so I've decided to ask the community what to prioritize.
 Below is a summary of everything I've been working on, with its current level
 of completion.  Please let me know the following:
 
 1.  A relative ordering of how useful you think these libraries would be to
 the community.
 
 2.  In absolute terms, would you find this useful?
 
 3.  For the Phobos candidates, whether they're general enough to belong in the
 **standard** library.
 
 List in order from most to least finished:
 
 1.  Rational:  A library for handling rational numbers exactly.  Templated on
 integer type, can use BigInts for guaranteed accuracy, or fixed-width integers
 for more speed where the denominator and numerator will be small.  Completion
 state:  Mostly finished.  Just need to fix a little bit rot and submit for
 review.  (Phobos candidate)

I'd find it useful. As for its presence in Phobos, I'm uncertain if it's in 
enough demand.

 2.  RandAA:  A hash table implementation with deterministic memory management,
 based on randomized probing.  Main advantage over builtin AAs is that it plays
 much nicer with the GC and multithreaded programs.  Lookup times are also
 expected O(1) no matter how many collisions exist in modulus hash space, as
 long as there are few collisions in full 32- or 64-bit hash space.  Completion
 state:  Mostly finished.  Just needs a little doc improvement, a few
 benchmarks and submission for review.  (Phobos candidate)

Useful for me and in Phobos.

 3.  TempAlloc:  A memory allocator based on a thread-local segmented stack,
 useful for allocating large temporary buffers in things like numerics code.
 Also comes with a hash table, hash set and AVL tree optimized for this
 allocation scheme.  The advantages over plain old stack allocation are that
 it's independent of function calls (meaning you can return pointers to
 TempAlloc-allocated memory from a function, etc.) and it's segmented, meaning
 you can allocate huge buffers w/o risking stack overflow.  Its main weakness
 is that this stack is not scanned by the GC, meaning that you can't store the
 only reference to a GC-allocated piece of memory here.  However, in practice
 large arrays of primitives are an extremely common case in
 performance-critical code.  I find this module immensely useful in dstats and
 Lars Kyllingstad uses it in SciD.  Getting it into Phobos would make it easy
 for other scientific/numerics code to use it.  Completion state:  Working and
 used.  Needs a little cleanup and documentation.  (Phobos candidate)

Useful for me, don't know if for everyone else.

 4.  Streaming CSV Parser:  Parses CSV files as they're read in, a few
 convenience functions for extracting columns into structs.  If Phobos ever
 gets SQLite support I'll probably add sugar for turning a CSV file into an
 SQLite database, too.  Completion state:  Prototype working, needs testing,
 cleanup and documentation.  (Phobos candidate)

You mean a lazy slurp? It'd be useful for everyone.

 5.  Matrix operations:  SciD improvements that allow you to write matrix
 operations that look like normal math/MATLAB and optimizes them via expression
 templates so that a minimal number of temporary matrices are created.
 Uses/will use BLAS for multiplication.  Completion state:  Addition
 implemented.  Multiplication not.

It is worth considering standardizing at least matrix expressions in Phobos. 
The motivation is analogous to ranges -- to run an algorithm from lib A on a 
matrix container from lib B. C++ would be green with envy.

I'd be glad to be part of the effort once I'm done with xml.

 6.  Machine learning:  Decision trees, KNN, Random Forest, Logistic
 Regression, SVM, Naive Bayes, etc.  This would be a dstats module.  Completion
 state:  Decision trees prototyped, logistic regression working.

I'd find it useful, I think anyone who's into this would too.

 7.  std.mixins:  Mixins for commonly needed boilerplate code.  I stopped
 working on this when Andrei suggested that making a collection of mixins into
 a module is a bad idea.  I've thought about it some more and I respectfully
 disagree.  std.mixins would be a one-stop shop for pretty much any boilerplate
 you need to inject, and most of this code doesn't fit in any other obvious
 place.  Completion state:  A few things (struct comparison, simple class
 constructors, Singleton pattern) prototyped.  (Phobos candidate)

I'm afraid I also think functionality should be categorized by the purpose it 
serves rather than by implementation technique.

 8.  GZip support in std.file:  I'll leave the stream stuff for someone else,
 but just simple stuff like read(), write(), 

Dream package management system (Was: a cabal for D ?)

2011-03-17 Thread Tomek Sowiński
Jason E. Aten wrote:

 Please correct me if I'm wrong, but I observe that there doesn't appear 
 to be a package management system / standard repository for D libraries.  
 Or is there?

No, there isn't.

 I'm talking about something as easy to use as R's CRAN,
  install.packages("rforest")
 
 or cpan for perl, ctan for latex, dpkg/apt for debian, cabal for Haskell/
 Hackage, etc.
 
 If there's not a commonly utilized one currently, perhaps we could 
 borrow cabal, with a trivial port.  cabal is Haskell's package manager.
 
 Not only does having a standard package install system facilitate 
 adoption, it greatly facilitates code sharing and library maturation.

Yes, we need it badly.

I think it's a good moment to start a discussion. First off, what exactly do we 
want from a package management system?

-- 
Tomek



Re: Code Sandwiches

2011-03-09 Thread Tomek Sowiński
bearophile wrote:

 One of the things the paper says about D scope guards is: Scope guards do 
 not provide encapsulation.

Yep, they don't. So?

-- 
Tomek



Re: full ident name without mangle/demangle?

2011-03-09 Thread Tomek Sowiński
Nick Sabalausky wrote:

 Is there a way to get the fully-qualified name of an identifier without 
 doing demangle(mangledName!(foo))?

Heh, looks like there isn't. It may be worth filing an enhancement request for 
__traits(fullyQualifiedName, foo).

BTW, what do you need it for?
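Later versions of Phobos added `std.traits.fullyQualifiedName`, which covers this without going through the mangled name. Here is a minimal sketch; the module name `demo` and the `Outer.inner` symbol are made up for illustration.

```d
module demo;

import std.traits : fullyQualifiedName;

struct Outer
{
    static void inner() {}
}

// Evaluated at compile time; no mangling or demangling involved.
static assert(fullyQualifiedName!(Outer.inner) == "demo.Outer.inner");

void main() {}
```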

-- 
Tomek



Re: Google Summer of Code 2011 application

2011-03-08 Thread Tomek Sowiński
Andrei Alexandrescu wrote:

 I just submitted an application for GSoC 2011 on behalf of Digital Mars. 
 Please review and contribute to the project ideas page:
 
 http://prowiki.org/wiki4d/wiki.cgi?GSOC_2011_Ideas

Please throw in database interfacing.

Does putting up XML mean I should stop working on it?

-- 
Tomek



Re: Haskell infix syntax

2011-03-07 Thread Tomek Sowiński
Jonathan M Davis wrote:

  As a feature of its own, it's just sugar. But if introducing infix
  operators were contingent on banishing classic operator overloading, then
  it is worthwhile.  
 
 LOL. And _what_ benefit would banishing classic operator overloading have?

I've worked on a financial system written in Java which used BigDecimal 
extensively. And, of course, I LOLed at that. But after I had spent some time with 
the code, a few benefits surfaced. It was clear which functions were 
user-implemented. Displaying the docs by mousing over was nice too (outside the 
IDE, grepping 'add' is easier than '+'). And above all, there was no abuse whatsoever. 
All that didn't outweigh the loss of terse syntax, but it did make up for some of 
it.

I'm bringing up this case because it's extremely in favour of operator 
overloading. Java is not big on number crunching and BigDecimal is one of the 
few spots on the vast programming landscape where overloaded operators make 
sense. And yet, the final verdict was: it doesn't suck.

 A function named add could be abused in _exactly_ the same ways that + can be.

There's far less incentive for abuse as there's no illusory mathematical 
elegance to pursue.

 The main benefit that infix syntax would provide would be if you had a 
 variety of 
 mathematical functions beyond what the built in operators give you, and you 
 want 
 to be able to treat them the same way. Whether classic operator overloading 
 exists or not is irrelevant.

That's mixing vect1 + vect2 with vect1 `dot` vect2. I'd rather see them treated 
the same way.

 Regardless, I don't think that adding infix syntax to the language is worth 
 it. D 
 is already pretty complicated and _definitely_ more complicated than most 
 languages out there. One of the major complaints of C++ is how complicated it 
 is. We don't want to be adding extra complexity to the language without the 
 benefit outweighing that complexity, and I don't think that it's at all clear 
 that it does in this case.

I agree. Hence the idea of trading operator overloading for infixing. The added 
complexity is zero, if not less.

 And as KennyTM~ pointed out, if UFCS is ever 
 implemented, it gives you most of the benefit of this anyway, and there are 
 already a lot of people around here interested in UFCS. So, I find it _far_ 
 more 
 likely that UFCS gets implemented than an infix function call syntax.

I also think it is more probable.
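UFCS did later become part of D: a free function can be called as if it were a method of its first argument. Here is a minimal sketch using bearophile's `sum` from the earlier post. The chained form groups to the left and the nested call groups to the right. For addition the two give the same result.

```d
import std.stdio : writeln;

int sum(int x, int y) { return x + y; }

void main()
{
    // Classic nested calls: sum(1, sum(5, ...)) groups to the right.
    int nested = sum(1, sum(5, sum(6, sum(10, 30))));

    // UFCS: `a.sum(b)` is rewritten to `sum(a, b)`, so calls chain
    // left to right without extra parentheses.
    int chained = 1.sum(5).sum(6).sum(10).sum(30);

    writeln(nested, " ", chained);
}
```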

-- 
Tomek



Re: Haskell infix syntax

2011-03-07 Thread Tomek Sowiński
Caligo wrote:

 With C++, for example, Eigen uses expression templates.  How does one do
 expression templates in D? Could someone rewrite this
 http://en.wikipedia.org/wiki/Expression_templates in D?

You may look at my approach for QuantLibD.

http://dsource.org/projects/quantlibd/browser/ql/math/matrix.d

Mind you, project suspended.
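For readers who want the gist without digging through the QuantLibD sources, here is a minimal sketch of the technique in D. It is not the QuantLibD code: `Vec`, `Sum` and `evaluate` are names made up for this example. `opBinary` returns a lightweight node that records the operation instead of computing a vector. The finished expression is then evaluated element by element in a single loop, so `a + b + c` creates no intermediate arrays.

```d
import std.stdio : writeln;

// A plain vector: the leaves of an expression tree.
struct Vec
{
    double[] data;

    double opIndex(size_t i) const { return data[i]; }
    size_t length() const { return data.length; }

    // `a + b` does no arithmetic; it just records both operands.
    auto opBinary(string op, R)(const R rhs) const if (op == "+")
    {
        return Sum!(Vec, R)(this, rhs);
    }
}

// An unevaluated `lhs + rhs` node.
struct Sum(L, R)
{
    const L lhs;
    const R rhs;

    double opIndex(size_t i) const { return lhs[i] + rhs[i]; }
    size_t length() const { return lhs.length; }

    auto opBinary(string op, R2)(const R2 rhs2) const if (op == "+")
    {
        return Sum!(Sum!(L, R), R2)(this, rhs2);
    }
}

// Walk the expression once, computing each element exactly once.
double[] evaluate(E)(const E expr)
{
    auto result = new double[expr.length];
    foreach (i; 0 .. result.length)
        result[i] = expr[i];
    return result;
}

void main()
{
    auto a = Vec([1.0, 2.0, 3.0]);
    auto b = Vec([10.0, 20.0, 30.0]);
    auto c = Vec([100.0, 200.0, 300.0]);

    auto expr = a + b + c;   // Sum!(Sum!(Vec, Vec), Vec); nothing computed yet
    writeln(evaluate(expr));
}
```

A real matrix library would add the other operators and would likely evaluate on assignment instead of through a free function, but the node-building idea stays the same.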

-- 
Tomek



LIFO refrigerators

2011-03-06 Thread Tomek Sowiński
Daniel Gibson wrote:

 You'd need a fridge with two doors: one in the front, one in the back. Insert
 new food in the front, get food to eat from the back (or the other way round).
 But reinsert opened food in the back (or, in the alternative case, in the 
 front).

Or a cylinder-shaped refrigerator with rotating food shelves. Put new stuff in 
the front and turn the shelf slightly clockwise to expose oldest food for 
eating.

Ain't circular buffers yummy?

-- 
Tomek (the patent holder ;-)




Re: Haskell infix syntax

2011-03-06 Thread Tomek Sowiński
bearophile wrote:

 Haskell is full of function calls, so the Haskell designers have 
 used/invented several different ways to avoid some parentheses in the code.
 
 From what I've seen, if you remove some parentheses well, in the right places, 
 the resulting code is less noisy, more readable, and it has fewer chances to 
 contain a bug (because syntax noise is a good place for bugs to hide).
 
 One of the ways used to remove some parentheses is a standard syntax that's 
 optionally usable on any dyadic function (function with two arguments):
 
 sum a b = a + b
 
 sum 1 5 == 1 `sum` 5
 
 The `name` syntax is just a different way to call a regular function with two 
 arguments.
 
 In Haskell there is also a way to assign an arbitrary precedence and 
 associativity to such infix operators, but some Haskell programmers argue 
 that too much syntax sugar gives troubles ( 
 http://www.haskell.org/haskellwiki/Use_of_infix_operators ).
 
 In D the back tick has a different meaning, and even if in D you use a 
 different syntax, like just a $ prefix, I don't know how much good this 
 syntax is for D:
 
 int sum(int x, int y) { return x + y; }
 
 int s = sum(1, sum(5, sum(6, sum(10, 30))));
 Equals to (associativity of $ is fixed like this):
 int s = 1 $sum 5 $sum 6 $sum 10 $sum 30;
 
 So I think it's not worth adding to D.

I vaguely recall someone mentioning infixability by naming convention.

int _add_(int x, int y);

int s = 1 _add_ 5 _add_ 10;

As a feature of its own, it's just sugar. But if introducing infix operators 
were contingent on banishing classic operator overloading, then it is 
worthwhile.

-- 
Tomek



Re: uniqueness propagation

2011-02-25 Thread Tomek Sowiński
Robert Jacques wrote:

 On Fri, 25 Feb 2011 02:48:01 -0500, Kevin Bealer  
 kevindangerbea...@removedanger.gmail.com wrote:
  I think immutable could benefit from a Value Range Propagation-like  
  uniqueness
 
 'unique' has been proposed and heavily discussed before in the news group.  
 There even is std.typecons.Unique. Unfortunately, Walter has stated that  
 there are issues/difficulties in adding 'unique' to the language.

What were those difficulties?

-- 
Tomek



Re: Should conversion of mutable return value to immutable allowed?

2011-02-24 Thread Tomek Sowiński
Ali Çehreli wrote:

 Implicit conversions to immutable in the following two functions feel 
 harmless. Has this been discussed before?
 
 string foo()
 {
  char[] s;
  return s; // Error: cannot implicitly convert expression
//(s) of type char[] to string
 }
 
 string bar()
 {
  char[] s;
  return s ~ s; // Error: cannot implicitly convert expression
//(s ~ s) of type char[] to string
 }
 
 Is there a reason why that's not possible? I am sure there must be other 
 cases that at least I would find harmless. :)

Indeed. The returned object can safely be set in stone when its only aliases to 
the outside world point to immutable data. Such a guarantee is expressed in 
today's language by marking the function pure and all its arguments immutable. 
The conversion is currently not allowed because the above virtue of immutably pure 
functions was discovered not too long ago.

If you want it, vote up:
http://d.puremagic.com/issues/show_bug.cgi?id=5081
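Current compilers accept this conversion in the pure case. The result of a pure function whose parameters cannot alias the returned memory is treated as unique, so it converts implicitly to immutable. Here is a minimal sketch; `makeRun` is a made-up function. Outside this case, `std.exception.assumeUnique` remains the manual escape hatch.

```d
import std.stdio : writeln;

// Pure, with plain value parameters: the returned array cannot be
// aliased by anything the caller still holds, so it is unique.
char[] makeRun(char c, size_t n) pure
{
    auto buf = new char[n];
    buf[] = c;
    return buf;
}

void main()
{
    string s = makeRun('x', 3);   // implicit char[] -> immutable(char)[]
    writeln(s);
}
```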

-- 
Tomek



Re: Do findSplit, findSplitBefore, and findSplitAfter make until unnecessary?

2011-02-20 Thread Tomek Sowiński
Jonathan M Davis wrote:

 Does anyone have a good reason why the findSplit* functions don't make until 
 obsolete and unnecessary?

Until is lazy, findSplit* are not.
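Here is a small sketch of the difference using current Phobos. `until` wraps its source and inspects elements only as it is iterated. `findSplit` searches immediately and returns all three pieces at once as slices.

```d
import std.algorithm.comparison : equal;
import std.algorithm.searching : findSplit, until;
import std.range : iota;
import std.stdio : writeln;

void main()
{
    // Lazy: only the elements up to the sentinel are ever looked at,
    // even though the source range is large.
    auto firstFew = iota(1, 1_000_000).until(4);
    assert(firstFew.equal([1, 2, 3]));

    // Eager: the search happens now; the result holds three slices
    // (before the separator, the separator itself, after it).
    auto parts = "key=value".findSplit("=");
    assert(parts[0] == "key" && parts[1] == "=" && parts[2] == "value");

    writeln(parts[0], " -> ", parts[2]);
}
```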

-- 
Tomek



Re: 'live' testing style

2011-02-14 Thread Tomek Sowiński
spir wrote:

 * Why isn't testList a unittest block?
 
 Using named funcs, I can switch on & off specific test suites by 
 (un)commenting 
 their call from the main and unique unittest block. Else, either they all 
 run, 
 or none. During development, I only keep active the test func(s) relative to 
 the feature I'm currently working on.
 Remedy: named unittests.

The interesting thing about named unit tests is that their names aren't 
interesting at all. They are usually dull and forced; testing filterFoo will be 
called testFilterFoo, etc. Their only purpose is to suppress running of 
unrelated tests.

Now, there is a seemingly unrelated proposal to include every ddoc'ed unit test 
in the preceding declaration as an example. This is great because it implies 
ownership -- a unit test is 'owned' by the symbol above. Going further, it can 
also be named after its owner.

module ooh;

void foo();

unittest { test foo... }

Compiling with --unittest=ooh.foo runs this unittest only. Nested control as a 
bonus: compiling with --unittest=ooh runs only the tests in module ooh.

So there you go, named unit tests without naming.

-- 
Tomek



Re: assert(expression, error)

2011-02-12 Thread Tomek Sowiński
spir wrote:

 Is there a way to specify what error to throw using (a variant of) assert:
  assert(n > 0, new ValueError(...));
 
 (Sure, one can write:
  if (n <= 0)
      throw new ValueError(...);
 but the same remark applies to plain assert: the whole point of assert is to 
 have it as builtin feature with clear application field & well-known 
 semantics, 
 shared by the community of D programmers.)

With built-in assert, no. But std.exception can do it.

enforce(n > 0, new ValueError(...));
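Here is a minimal sketch of the `enforce` approach. `ValueError` is spir's hypothetical exception type, defined here only for the example. Unlike `assert`, which is compiled out with `-release`, `enforce` always runs. That makes it suitable for validating input rather than checking internal invariants.

```d
import std.exception : enforce;
import std.stdio : writeln;

// Stand-in for the ValueError from the question (not a Phobos type).
class ValueError : Exception
{
    this(string msg) { super(msg); }
}

int half(int n)
{
    // Throws the given exception object when the condition is false.
    enforce(n > 0, new ValueError("n must be positive"));
    return n / 2;
}

void main()
{
    writeln(half(10));
    try
    {
        half(0);
    }
    catch (ValueError e)
    {
        writeln("caught: ", e.msg);
    }
}
```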

-- 
Tomek



Re: 0nnn octal notation considered harmful

2011-02-11 Thread Tomek Sowiński
spir wrote:

 Just had a strange bug --in a test func!-- caused by this notation. This is 
 due 
 in my case to the practice (common, I guess) of pretty printing int numbers 
 using %0nd or %0ns format, to get a nice alignment. Then, if one feeds back 
 results into D code, they are interpreted as octal...
 Now, i know it: will pad with spaces instead ;-)
 
 Copying a string'ed integer is indeed not the only way this notation is 
 bug-prone: 
 prefixing a number with '0' should not change its value (!). Several 
 programming languages switched to another notation; like 0onnn, which is 
 consistent with common hex & bin notations and cannot lead to 
 misinterpretation. Such a change would be, I guess, backward compatible; and 
 would not be misleading for C coders.

This has been discussed before. There's octal!123 in Phobos if you don't like 
these confusing literals, but they stay because Walter likes them.
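For reference, here is a minimal sketch of `std.conv.octal`, which makes the base explicit. It accepts either an integer literal or a string and is evaluated at compile time. Runtime parsing with `to!int` treats a zero-padded string as decimal, so padded output read back that way is safe.

```d
import std.conv : octal, to;
import std.stdio : writeln;

// Base 8 spelled out: 1*64 + 2*8 + 3 == 83.
static assert(octal!123 == 83);
static assert(octal!"755" == 493);   // e.g. Unix permission bits

void main()
{
    // A zero-padded string is parsed as decimal at runtime.
    writeln(to!int("0123"));
}
```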

-- 
Tomek



Re: Assert compilation failure with certain message

2011-02-11 Thread Tomek Sowiński
Andrej Mitrovic wrote:

 I've managed to screw up the colon placement though, here's a quick fix:
 
 import std.stdio;
 import std.conv;
 
 void staticAssert(alias exp, string message, string file = __FILE__,
     int line = __LINE__)()
 {
     static if (!exp)
     {
         pragma(msg, file ~ "(" ~ to!string(line) ~ "): " ~
             "staticAssert: " ~ to!string(message));
         assert(0);
     }
 }
 
 void main()
 {
     enum x = false;
     staticAssert!(x, "Oh no we failed!");
 
     int y;
 }

How does it help to find out that compilation tripped on a specific static 
assertion?

-- 
Tomek



Re: std.concurrency immutable classes...

2011-02-11 Thread Tomek Sowiński
Steven Schveighoffer wrote:

  It would be much easier if he provided the specific case(s) he broke  
  his teeth on. Then we'll all know where the problem is. If it's soluble,  
  it'll open the door to tail type modifiers in general, not just in  
  classes. It's a burning issue e.g. with ranges (mostly struct).
 
  http://d.puremagic.com/issues/show_bug.cgi?id=5377
 
  Look at the attachment to get a feel of what hoops we'll have to jump  
  through to side-step lack of tail X.  
 
 I've worked through this very same problem (a few months back), thinking  
 that we need a general solution to tail-const.  The large issue with  
 tail-const for structs in the general case is that you cannot control the  
 type of 'this'.  It's always ref.  This might seem like a very  
 inconsequential detail, but I realized that a ref to X does not implicitly  
 convert to a ref to a tail-const X.  This violates a rule of two  
 indirections, in which case you are not able to implicitly convert the  
 indirect type, even if the indirect type would implicitly convert outside  
 the reference.

 A simple example, you cannot convert an int** to a const(int)**.  Reason  
 being, then you could change the indirect pointer to point to something  
 that's immutable, and the original int ** now points to immutable data.

I tried to understand this on an example and now I'm even more confused. :)

int* p;
int** pp = &p;
const(int)** cpp = pp;  // compiles fine
immutable int i = 7;
*cpp = &i;
**pp = 5;  // mutate the immutable
writeln(cpp, ' ', pp);
writeln(*cpp, ' ', *pp, ' ', i);
writeln(**cpp, ' ', **pp, ' ', i);

The output is interesting:

12FE08 12FE08
12FE14 12FE14 12FE14
5 5 7

So even though they all point to i at the end, it remains unchanged. What gives? 
Register caching? It doesn't matter, as the int** to const(int)** conversion 
should fail in the first place, but I'm curious...
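The rule Steven describes can be checked with `is` expressions. This is a minimal sketch assuming a current compiler, which should reject the conversion that "compiles fine" above. Adding `const` at the outer level is safe. Adding it one pointer down is not, because `*cpp = &i` could then plant an immutable address that `pp` can still write through. As for the `5 5 7` output, a plausible explanation is that `i`, an immutable initialized with a literal, was constant-folded to 7 where it was read directly.

```d
// Outer-level const: nothing can be written through the result.
static assert(is(int** : const(int*)*));
static assert(is(int** : const(int**)));

// One level down: would let `*cpp = &someImmutableInt` alias mutable `**pp`.
static assert(!is(int** : const(int)**));

void main() {}
```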

 The same is for tail-const structs, because you go through one ref via  
 'this' and the other ref via the referring member.
 
 What does this all mean?  It basically means that you have to define  
 *separate* functions for tail-const and const, and separate functions for  
 tail-immutable and immutable.  This is untenable.

I, from the very first discussions, assumed tail-const functions are 
inevitable. You define empty() as const but popFront() as tail-const. Feels 
natural.

 You might ask "why doesn't this problem occur with tail-const arrays?"  
 Well, because you *don't pass them by ref*.  With structs we have no choice.
 
 I think what we need is a way to define two different structs as being the  
 tail-const version of the other, with some compiler help, and then we do  
 not need to define a new flavor of const functions.  We still need to  
 define these tail-const functions, but it comes in a more understandable  
 form.  But importantly, the implicit cast makes a *temporary* copy of the  
 struct, allowing the cast to work.

I'd like to understand it better. How would you define with this scheme, say, a 
range on a const collection, to which ranges on an (im)mutable collection are 
implicitly convertible? 

-- 
Tomek



Re: std.concurrency immutable classes...

2011-02-10 Thread Tomek Sowiński
Michel Fortin wrote:

  Thanks for doing this. Is it approved by Walter?  
 
 Depends on what you mean by approved.
 
 He commented once on the newsgroup after I posted an earlier version of 
 the patch, saying I should add tests for type deduction and some other 
 stuff. This change is something he attempted to do in the past and 
 failed, I expect him to be skeptical.

It would be much easier if he provided the specific case(s) he broke his 
teeth on. Then we'll all know where the problem is. If it's soluble, it'll open the 
door to tail type modifiers in general, not just in classes. It's a burning 
issue e.g. with ranges (mostly struct).

http://d.puremagic.com/issues/show_bug.cgi?id=5377

Look at the attachment to get a feel of what hoops we'll have to jump through 
to side-step lack of tail X.

 I guess he'll review it when he 
 has the time and I hope he'll merge these changes in the mainline. 
 He'll probably want to take his time however, since it can break 
 existing code in some cases; it's basically a change to the language.
 
 If you want to show your support, I guess you can vote up the 
 enhancement request in the bugzilla.
 http://d.puremagic.com/issues/show_bug.cgi?id=5325
 
 Also feel free to compile it, test it, and share your experience. The 
 more tested it is, the more used and appreciated it is, the more 
 exposure it gets, the sooner it gets approved, or so I guess.

I'd love to, but I'm devoting the shreds of my spare time to XML.

-- 
Tomek



Assert compilation failure with certain message

2011-02-10 Thread Tomek Sowiński
Is there a way to statically assert compilation of an expression failed *with a 
certain message*? I want to check my static asserts trip when they should.

-- 
Tomek


Re: Assert compilation failure with certain message

2011-02-10 Thread Tomek Sowiński
bearophile wrote:

  Is there a way to statically assert compilation of an expression failed 
  *with a certain message*? I want to check
  my static asserts trip when they should.  
 
 I asked something like this a long time ago, but I don't know a way to 
 do it. You are able to statically
 assert that some code doesn't compile, but I don't know how to assert that a 
 certain message gets produced. You are
 asking for a specific static catch :-)

Static catch, yeah. But I'd be content with __traits(fails, expr, msg), which 
seems tractable.

-- 
Tomek



Re: High performance XML parser

2011-02-09 Thread Tomek Sowiński
Steven Schveighoffer wrote:

 OK, so you mean a buffer other than the I/O buffer.  This means double  
 buffering data.  I was thinking of a solution that allows simply using the  
 I/O buffer for parsing.  I think this is one of the keys to Tango's xml  
 performance.

I'd be glad to hear your idea. I think they are convergent. In mine, the 
I/O could be asked to dump data to the iterator's buffer at a given position 
(right after the previous nodes), then the iterator forms a node out of raw data. Some
moving would be done but all within the cached buffer so should be quick. I 
guess it's as far as I can predict performance in a newsgroup post. ;-) Gotta 
write some code and whip out the stopwatch, then we'll see.

-- 
Tomek



Re: Efficient outputting of to-string conversions

2011-02-08 Thread Tomek Sowiński
Andrei Alexandrescu wrote:

  I know about Steven's proposal but it applies only to user types not 
  primitives. Either way std.conv.to would need a buffered output range as 
  integers are written from the right. Any chance for an abstraction 
  analogous to buffered input ranges discussed recently?  
 
 Generally I found it more difficult to define a solid output buffer 
 abstraction. This is a great motivating example though.
 
 To my surprise, an API of the same form seems to be what the doctor 
 prescribed. Here's a semi-formal definition:
 
 A buffered output range R is defined as such:
 
 R.front returns the currently uncommitted buffer of type T[]
 
 R.moreFront(n) makes n more elements available for writing
 
 R.commitFront(n) writes the first n elements in front()
 
 R.flushFront() writes the buffer currently held in front() and makes 
 another buffer available (initially empty).

I was thinking along the same lines. There's one missing:

R.skipFront(n) skips the first n elements without outputting

Why? Look at integral conversions in std.conv.to. It first calculates the maximum 
string size, then writes the digits to the char array back to front, then returns 
result[$ - ndigits .. $], where ndigits is how long the string turned out to be.
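The back-to-front conversion can be sketched without any allocation by writing into a caller-supplied buffer. This illustrates the technique and is not the actual std.conv code; `intToChars` is a made-up name. The unused prefix of the buffer is the part a buffered output range would have to skip, which is the role the proposed skipFront(n) would play.

```d
import std.stdio : writeln;

// Hypothetical helper illustrating the back-to-front technique.
// Writes the decimal digits of `value` at the END of `buf` and returns the
// slice holding them; the caller owns the memory, nothing is allocated.
char[] intToChars(char[] buf, long value) pure nothrow @nogc
{
    // 20 chars fit any long: "-9223372036854775808".
    assert(buf.length >= 20, "buffer too small for a long");

    immutable bool negative = value < 0;
    // Work on the magnitude as ulong so that long.min does not overflow.
    ulong mag = negative ? 0UL - cast(ulong) value : cast(ulong) value;

    size_t pos = buf.length;
    do
    {
        buf[--pos] = cast(char)('0' + mag % 10);
        mag /= 10;
    } while (mag != 0);

    if (negative)
        buf[--pos] = '-';
    return buf[pos .. $];   // the unused prefix buf[0 .. pos] is skipped
}

void main()
{
    char[20] buf;
    writeln(intToChars(buf[], -4096));
}
```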

Returning to Steven's DIP, I think writeTo should take the above rather than 
void delegate(char[]). With the latter you still have to allocate the pieces. 
Our buffered output range is friends with polymorphism too. If you set T=char, 
its API is devoid of generics. Such an interface can be placed in object.d with an
official blessing.

-- 
Tomek



Re: Efficient outputting of to-string conversions

2011-02-08 Thread Tomek Sowiński
Andrei Alexandrescu wrote:

 For the latter, Tomek's idea of passing an output range as 
 an optional second parameter seems appropriate. Please file as an 
 enhancement to bugzilla. If anyone has time to work on this, please do. 
 If not, I'll work on it as my schedule allows.

http://d.puremagic.com/issues/show_bug.cgi?id=5548

-- 
Tomek



Re: Please reply to this to vote to collectExceptionMsg in std.unittests

2011-02-08 Thread Tomek Sowiński
Andrei Alexandrescu wrote:

 Reply here to vote ONLY for the function collectExceptionMsg in Jonathan 
 M Davis's std.unittests. Vote closes on Tue Feb 15.

I'm in two minds. Since Jonathan has improved collectException the proposed 
function is just a short-hand for:

auto e = collectException!MyException(expression);
assert (e);
assert (e.msg == ...);

or:

assert (collectException!MyException(expression) == new MyException(msg));

I would use these because of the possibility to test properties other than 
.msg. Also, there's an ambiguity: ex.msg is null vs. didn't throw.

But perhaps msg is important enough to deserve a dedicated wrapper. I'll vote 
in favour, given that the docs are shrunk to something like:

Convenience function for extracting the exception's message. Equivalent of:
---
auto e = collectException(mayThrow);
string msg = e ? e.msg : null;
---

And put a link to collectException.
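Here is a minimal sketch of both styles using `std.exception.collectException`; `MyException` and `mayThrow` are placeholders. Checking the returned object for `null` first is what separates "did not throw" from "threw with an empty message".

```d
import std.exception : collectException;
import std.stdio : writeln;

class MyException : Exception
{
    this(string msg) { super(msg); }
}

int mayThrow(bool fail)
{
    if (fail)
        throw new MyException("boom");
    return 42;
}

void main()
{
    // Long form: get the exception object, then test any of its properties.
    auto e = collectException!MyException(mayThrow(true));
    assert(e !is null);        // it threw...
    assert(e.msg == "boom");   // ...with the expected message

    // The equivalent spelled out in the suggested docs.
    auto e2 = collectException!MyException(mayThrow(false));
    string msg = e2 ? e2.msg : null;
    assert(msg is null);       // no exception, so no message

    writeln("ok");
}
```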

-- 
Tomek



Re: High performance XML parser

2011-02-08 Thread Tomek Sowiński
Steven Schveighoffer wrote:

  The design I'm thinking is that the node iterator will own a buffer. One  
  consequence is that the fields of the current node will point to the  
  buffer akin to foreach(line; File.byLine), so in order to lift the input  
  the user will have to dup (or process the node in-place). As new nodes  
  will be overwritten on the same piece of memory, an important trait of  
  the design emerges: cache intensity. Because of XML namespaces I think  
  it is necessary for the buffer to contain the current node plus all its  
  parents.  
 
 That might not scale well.  For instance, if you are accessing the 1500th  
 child element of a parent, doesn't that mean that the buffer must contain  
 the full text for the previous 1499 elements in order to also contain the  
 parent?
 
 Maybe I'm misunderstanding what you mean.

Let's talk on an example:

<a name="value">
<b>
Some Text 1
<c2>  <!-- HERE -->
Some text 2
</c2>
Some Text 3
</b>
</a>

The buffer of the iterator positioned HERE would be:

[Node a | Node b | Node c2]

Node c2 and all its parents are available for inspection. Node a's attribute is 
stored in the buffer, but not b's "Some text 1", as it is c2's sibling; "Some 
text 1" was available in the previous iteration and is now overwritten by c2. To 
get to "Some text 2", let's advance the iterator in depth to get:

[Node a | Node b | Node c2 | Text node "Some text 2"]

Advancing it once more we get to:

[Node a | Node b | Text node "Some text 3"]

So "Some text 3" is written where c2 and text node 2 used to be.

The element type of the range would always be the child, parents available 
through pointers:

foreach (node; xmlRange) {
    doStuff(node);
    if (Node* parent = node.parent)
        doOtherStuff(parent);
}

Having no access to siblings is quite limiting, but the iterator can form an 
efficient (zero-allocation) basis upon which more convenient schemes are built. 
It's still just brainstorming, though. I fear there's something that'll 
make the whole idea crash & burn.
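A minimal sketch of the path buffer walked through above, assuming each node carries its depth; `Node` and `NodePath` are hypothetical names, and a real parser would store slices into its own buffer rather than separate strings. Entering a node at depth d keeps entries 0 .. d-1 (the parents) and overwrites everything from d onward, which is the mark-release behaviour described in the example.

```d
/// Hypothetical node record: a tag name (or text) plus its depth in the tree.
struct Node
{
    string name;
    size_t depth;
}

/// Keeps the current node plus all its parents in one reusable buffer.
struct NodePath
{
    private Node[] nodes; // grows only when the document gets deeper
    private size_t len;   // number of live entries

    /// Makes `n` the current node: entries at its depth or deeper are
    /// released (overwritten), its ancestors are kept.
    void enter(Node n)
    {
        assert(n.depth <= len, "cannot skip a level");
        len = n.depth;
        if (nodes.length <= len)
            nodes.length = len + 1;
        nodes[len++] = n;
    }

    /// The path from the root to the current node.
    const(Node)[] path() const { return nodes[0 .. len]; }

    /// Parent of the current node, or null at the root.
    const(Node)* parent() const { return len >= 2 ? &nodes[len - 2] : null; }
}
```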

 I would start out with a non-compliant parser, but one that allocates  
 nothing beyond the I/O buffer, one that simply parses lazily and can be  
 used as well as a SAX parser.  Then see how much extra allocations we need  
 to get it to be compliant.  Then, one can choose the compliancy level  
 based on what performance penalties one is willing to incur.

Yeah, 100% compliance is a long way off.

-- 
Tomek



Efficient outputting of to-string conversions

2011-02-07 Thread Tomek Sowiński
Looks like std.conv.to always allocates behind the scenes. It's a shame as the 
returned string is immediately processed and discarded in my XML writer. Are 
there plans to include a custom output variant, e.g. to!string(7, outputRange)?
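For integers and other primitives, Phobos's `std.format.formattedWrite` already formats into an arbitrary output range rather than returning a new string. A minimal sketch; `writeNumber` is a hypothetical wrapper and `Appender` merely stands in for the XML writer's output range:

```d
import std.array : appender;
import std.format : formattedWrite;

/// Hypothetical wrapper: formats `value` directly into the output range
/// `sink`; no intermediate string is returned to the caller.
void writeNumber(R)(ref R sink, long value)
{
    sink.formattedWrite("%d", value);
}
```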

-- 
Tomek



Re: Efficient outputting of to-string conversions

2011-02-07 Thread Tomek Sowiński
Jonathan M Davis napisał:

 On Monday 07 February 2011 13:10:09 Tomek Sowiński wrote:
  Looks like std.conv.to always allocates behind the scenes. It's a shame as
  the returned string is immediately processed and discarded in my XML
  writer. Are there plans to include a custom output variant, e.g.
  to!string(7, outputRange)?
 
 http://prowiki.org/wiki4d/wiki.cgi?LanguageDevel/DIPs/DIP9

I know about Steven's proposal, but it applies only to user types, not 
primitives. Either way, std.conv.to would need a buffered output range, as 
integers are written from the right. Any chance for an abstraction analogous to 
buffered input ranges discussed recently?
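One way around the right-to-left problem without a buffered output range is to format into a small fixed-size stack buffer and hand the finished slice to the range. A minimal sketch, assuming a 64-bit `long` input; `putDecimal` is a hypothetical helper, not a Phobos function:

```d
import std.array : appender;
import std.range.primitives : put;

/// Hypothetical helper: fills a stack buffer from the right (least
/// significant digit first), then pushes the finished slice into `sink`.
/// No heap allocation happens here.
void putDecimal(R)(ref R sink, long value)
{
    char[20] buf; // long.min needs 19 digits plus a sign
    size_t i = buf.length;
    // Work on the magnitude as ulong so that long.min does not overflow.
    ulong mag = value < 0 ? 0UL - cast(ulong) value : cast(ulong) value;
    do
    {
        buf[--i] = cast(char)('0' + mag % 10);
        mag /= 10;
    } while (mag != 0);
    if (value < 0)
        buf[--i] = '-';
    put(sink, buf[i .. $]);
}
```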

-- 
Tomek



Re: std.concurrency immutable classes...

2011-02-07 Thread Tomek Sowiński
Michel Fortin napisał:

 I just made this pull request today:
 https://github.com/D-Programming-Language/dmd/pull/
 
 If you want to test it, you're very welcome. Here is my development 
 branch for this feature:
 https://github.com/michelf/dmd/tree/const-object-ref

Thanks for doing this. Is it approved by Walter?

-- 
Tomek



Re: buffered input

2011-02-06 Thread Tomek Sowiński
Nick Sabalausky napisał:

 discard and fetch?

I like that.



Writing XML

2011-02-06 Thread Tomek Sowiński
While I'm circling the problem of parsing, I took a quick look at writing so as 
not to get stuck in analysis paralysis. Writing XML is pretty independent of 
parsing and an order of magnitude easier to solve. It was perfect for getting 
myself coding.

These are the guidelines I followed:

 * Memory minimalism: don't force allocating an intermediate node structure 
just to push a few tags down the wire.

 * Composability: operating on an arbitrary string output range.

 * Robustness: tags should not be left open, even if the routine producing tag 
interior throws.

 * Simplicity of syntax: resembling real XML if possible.

 * Space efficiency / readability: can write tightly (without indents and 
newlines) for faster network transfer and, having an easy means for temporary 
tight writing, for better readability.

 * Ease of use:
   - automatic to!string of non-string values,
   - automatic string escaping according to XML standard,
   - handle nulls: close the tags short (<tag/>), don't write attributes with 
null values at all.

 * anything else?


The new writer meets pretty much all of the above. Here's an example to get a 
feel of it:

auto books = [
    Book([Name("Grębosz", "Jerzy")], "Pasja C++", 1999),
    Book([Name("Navin", "Robert", "N.")], "Mathemetics of Derivatives", 2007),
    Book([Name("Tokarczuk", "Olga")], "Podróż ludzi Księgi", 1996),
    Book([Name("Graham", "Ronald", "L."),
          Name("Knuth", "Donald", "E."),
          Name("Patashnik", "Oren")], "Matematyka Konkretna", 2008)
];

auto outputRange = ... ;
auto xml = xmlWriter(outputRange);

xml.comment(books.length, " favorite books of mine.");
foreach (book; books) {
    xml.book("year", book.year, {
        foreach (author; book.authors) {
            xml.tight.authorName({
                xml.first(author.first);
                xml.middle(author.middle);
                xml.last(author.last);
            });
        }
        xml.tight.title(book.title);
    });
}

----- program output -----

<!-- 4 favorite books of mine. -->
<book year="1999">
  <authorName><first>Jerzy</first><middle/><last>Grębosz</last></authorName>
  <title>Pasja C++</title>
</book>
<book year="2007">
  <authorName><first>Robert</first><middle>N.</middle><last>Navin</last></authorName>
  <title>Mathemetics of Derivatives</title>
</book>
<book year="1996">
  <authorName><first>Olga</first><middle/><last>Tokarczuk</last></authorName>
  <title>Podróż ludzi Księgi</title>
</book>
<book year="2008">
  <authorName><first>Ronald</first><middle>L.</middle><last>Graham</last></authorName>
  <authorName><first>Donald</first><middle>E.</middle><last>Knuth</last></authorName>
  <authorName><first>Oren</first><middle/><last>Patashnik</last></authorName>
  <title>Matematyka Konkretna</title>
</book>
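To make three of the guidelines listed in this post concrete (escaping, null handling, and closing tags even when the content routine throws), here is a heavily reduced sketch. `MiniXmlWriter`, `tag` and `leaf` are hypothetical names; this is not the proposed writer's API, it has no opDispatch, indentation or tight mode, and it escapes only the four characters that matter here.

```d
import std.array : Appender, appender;
import std.range.primitives : put;

/// Hypothetical, heavily reduced writer over an output range R.
struct MiniXmlWriter(R)
{
    R* sink;

    /// Escapes XML special characters. All of them are ASCII, so scanning
    /// UTF-8 code units one by one is safe for non-ASCII text.
    void text(string s)
    {
        foreach (char c; s)
        {
            switch (c)
            {
                case '&': put(*sink, "&amp;"); break;
                case '<': put(*sink, "&lt;"); break;
                case '>': put(*sink, "&gt;"); break;
                case '"': put(*sink, "&quot;"); break;
                default: put(*sink, c); break;
            }
        }
    }

    /// <name attr="value">content</name>; a null value skips the attribute,
    /// and scope(exit) writes the closing tag even if `content` throws.
    void tag(string name, string attr, string value, void delegate() content)
    {
        put(*sink, '<');
        put(*sink, name);
        if (value !is null)
        {
            put(*sink, ' ');
            put(*sink, attr);
            put(*sink, `="`);
            text(value);
            put(*sink, '"');
        }
        put(*sink, '>');
        scope (exit)
        {
            put(*sink, "</");
            put(*sink, name);
            put(*sink, '>');
        }
        content();
    }

    /// <name>value</name>, or the short form <name/> when value is null.
    void leaf(string name, string value)
    {
        put(*sink, '<');
        put(*sink, name);
        if (value is null)
        {
            put(*sink, "/>");
            return;
        }
        put(*sink, '>');
        text(value);
        put(*sink, "</");
        put(*sink, name);
        put(*sink, '>');
    }
}
```

The `scope (exit)` guard is what keeps the closing tag from being lost when the delegate throws; the short `<name/>` form is used only for null leaf values, mirroring the `<middle/>` elements in the output above.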


Questions and comments?

-- 
Tomek



Re: buffered input

2011-02-06 Thread Tomek Sowiński
Andrei Alexandrescu napisał:

  Also: could a (truly) circular buffer help solve the above copy
  problem, concretely?  
 
 Not if you want infinite lookahead, which I think is what any modern 
 buffering system should offer.

Truly circular, probably not, but a wrap-around slice (a circular view of length 
at most underlying.length) does offer that and solves the copy problem with 
style.
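A wrap-around view can be sketched as a struct that maps index i to `(start + i) % buf.length`. `WrapSlice` and `extend` are hypothetical names; the point is that growing the view over already-filled slots copies nothing, even when it crosses the physical end of the buffer.

```d
import std.algorithm.comparison : equal;

/// Hypothetical wrap-around view: `len` elements starting at `start`,
/// continuing from index 0 once the end of `buf` is reached.
struct WrapSlice(T)
{
    T[] buf;
    size_t start; // index in buf of the first visible element
    size_t len;   // number of visible elements, at most buf.length

    @property size_t length() const { return len; }
    @property bool empty() const { return len == 0; }

    ref T opIndex(size_t i)
    {
        assert(i < len);
        return buf[(start + i) % buf.length];
    }

    @property ref T front() { return this[0]; }

    void popFront()
    {
        start = (start + 1) % buf.length;
        --len;
    }

    /// Exposes `n` more (already filled) slots after the current end;
    /// nothing is copied, the view simply wraps around.
    void extend(size_t n)
    {
        assert(len + n <= buf.length);
        len += n;
    }
}
```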

-- 
Tomek



Re: Writing XML

2011-02-06 Thread Tomek Sowiński
Rainer Schuetze napisał:

 This looks nice and compact Using opDispatch to specify the tag (I guess 
 that is what you are using to create a tag book by calling xml.book()) 
 feels like misusing opDispatch, though. Does it add readability in 
 contrast to passing the tag as a string to some function?
 
 How do you write a tag named tight? Or a tag calculated at runtime?

xml.tag("tight", attributes..., { /* make content */ });

That's the base implementation. opDispatch is just syntax sugar over it.

 Something more conventional would be
 
   xml.tag("book", attr("year", book.year), { ...
 
 but I'm not sure that pairing the attribute name and value adds 
 readability or mere noise.

Putting the name and value without a wrapper tuple is just sugar. Having some 
sort of structure representing an attribute is inevitable once we get to 
namespaces. In the end it should accept any range of (namespace-)name-value 
tuples as attributes.
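Accepting any range of name-value tuples might look like this minimal sketch. `Attr` and `putAttributes` are hypothetical; a namespace field is left out and escaping is omitted for brevity. It also applies the rule from the earlier "Writing XML" post of not writing attributes whose value is null.

```d
import std.array : appender;
import std.range.primitives : ElementType, isInputRange, put;
import std.typecons : Tuple;

/// Hypothetical attribute record; a namespace field could be added later.
alias Attr = Tuple!(string, "name", string, "value");

/// Writes ` name="value"` for each attribute from any input range of Attr,
/// skipping attributes whose value is null.
void putAttributes(R, A)(ref R sink, A attrs)
    if (isInputRange!A && is(ElementType!A : Attr))
{
    foreach (a; attrs)
    {
        if (a.value is null)
            continue;
        put(sink, ' ');
        put(sink, a.name);
        put(sink, `="`);
        put(sink, a.value);
        put(sink, '"');
    }
}
```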

-- 
Tomek



std.concurrency immutable classes...

2011-02-06 Thread Tomek Sowiński
... doesn't work.

class C {}
thisTid.send(new immutable(C)());
receive((immutable C) { writeln("got it!"); });

This throws: 
core.exception.AssertError@/usr/include/d/dmd/phobos/std/variant.d(285): 
immutable(C)

And when I go for Rebindable, I get "Aliases to mutable thread-local data not 
allowed.".

Is there anything I can do?

Overall, I think that's another reason D badly needs native tail const. 
Polymorphic classes come close to being second-class citizens as soon as const 
enters. :(

-- 
Tomek



Re: buffered input

2011-02-05 Thread Tomek Sowiński
Andrei Alexandrescu napisał:

 I hereby suggest we define buffered input range of T any range R that 
 satisfies the following conditions:
 
 1. R is an input range of T[]
 
 2. R defines a primitive shiftFront(size_t n). The semantics of the 
 primitive is that, if r.front.length >= n, then shiftFront(n) discards 
 the first n elements in r.front. Subsequently r.front will return a 
 slice of the remaining elements.
 
 3. R defines a primitive appendToFront(size_t n). Semantics: adds at 
 most n more elements from the underlying stream and makes them available 
 in addition to whatever was in front. For example if r.front.length was 
 1024, after the call r.appendToFront(512) will have r.front have length 
 1536 of which the first 1024 will be the old front and the rest will be 
 newly-read elements (assuming that the stream had enough data). If n == 
 0, this instructs the stream to add any number of elements at its own 
 discretion.

I don't see a clear need for the two to be separate. Could they fold into 
popFront(n, m) meaning shiftFront(n); appendToFront(m) ? Nullary popFront() 
discards all and loads any number it pleases.
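A minimal sketch of a buffered input range with the quoted primitives plus the folded `popFront(n, m)`, assuming an in-memory array stands in for the underlying stream. `BufferedRange` is a hypothetical name. When the tail of the buffer runs out, the current front is slid to the start (or into a larger buffer), which is the data movement debated further down.

```d
import std.algorithm.comparison : min;

/// Hypothetical buffered input range of T[] following the quoted
/// shiftFront / appendToFront primitives.
struct BufferedRange(T)
{
    private const(T)[] source; // unread part of the stand-in stream
    private T[] buf;           // buffer owned by the range
    private size_t lo, hi;     // front is buf[lo .. hi]

    this(const(T)[] source, size_t capacity)
    {
        this.source = source;
        buf = new T[capacity];
        appendToFront(capacity);
    }

    @property bool empty() const { return lo == hi && source.length == 0; }
    @property T[] front() { return buf[lo .. hi]; }

    /// Discards the first n elements of front.
    void shiftFront(size_t n)
    {
        assert(n <= hi - lo);
        lo += n;
    }

    /// Makes up to n more elements available after the current front.
    void appendToFront(size_t n)
    {
        n = min(n, source.length);
        if (hi + n > buf.length)
        {
            // No room at the tail: move front to the start of a (possibly
            // larger) buffer. A forward copy is safe because dest starts
            // at or before buf[lo].
            immutable keep = hi - lo;
            T[] dest = keep + n > buf.length ? new T[keep + n] : buf;
            foreach (i; 0 .. keep)
                dest[i] = buf[lo + i];
            buf = dest;
            lo = 0;
            hi = keep;
        }
        buf[hi .. hi + n] = source[0 .. n];
        hi += n;
        source = source[n .. $];
    }

    /// The folded primitive suggested in the reply.
    void popFront(size_t n, size_t m)
    {
        shiftFront(n);
        appendToFront(m);
    }

    /// Nullary form: discard everything, then refill as much as fits.
    void popFront()
    {
        popFront(hi - lo, buf.length);
    }
}
```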

 This is it. I like many things about this design, although I still fear 
 some fatal flaw may be found with it.

 With these primitives a lot of good operating operating on buffered 
 streams can be written efficiently. The range is allowed to reuse data 
 in its buffers (unless that would contradict language invariants, e.g. 
 if T is invariant), so if client code wants to stash away parts of the 
 input, it needs to make a copy.

Some users would benefit if they could just pass in a buffer and say fill'er 
up.

 One great thing is that buffered ranges as defined above play very well 
 with both ranges and built-in arrays - two quintessential parts of D. I 
 look at this and say, this all makes sense. For example the design 
 could be generalized to operate on some random-access range other than 
 the built-in array, but then I'm thinking, unless some advantage comes 
 about, why not giving T[] a little special status? Probably everyone 
 thinks of contiguous memory when thinking buffers, so here 
 generalization may be excessive (albeit meaningful).

Contiguous, yes. But I'd rather see front() exposing, say, a circular buffer so 
that appendToFront(n) reallocates only when n > buf.length.

-- 
Tomek



Re: buffered input

2011-02-05 Thread Tomek Sowiński
Tomek Sowiński napisał:

 Contiguous, yes. But I'd rather see front() exposing, say, a circular buffer 
 so that appendToFront(n) 
 reallocates only when n > buf.length.

I meant: when n + front.length > buf.length.

-- 
Tomek



Re: buffered input

2011-02-05 Thread Tomek Sowiński
Andrei Alexandrescu napisał:

 Transparent buffering sounds sensible but in fact it robs you of 
 important capabilities. It essentially forces you to use grammars with 
 lookahead 1 for all input operations. Being able to peek forward into 
 the stream without committing to read from it allows you to e.g. do 
 operations like does this stream start with a specific word etc. As soon

Broken sentence?



Re: buffered input

2011-02-05 Thread Tomek Sowiński
Andrei Alexandrescu napisał:

  I don't see a clear need for the two to be separate. Could they fold
  into popFront(n, m) meaning shiftFront(n); appendToFront(m) ? Nullary
  popFront() discards all and loads any number it pleases.  
 
 I think combining the two into one hurts usability as often you want to 
 do one without the other.

OK, but if you go this way, what would popFront() do?

  Some users would benefit if they could just pass in a buffer and say
  fill'er up.  
 
 Correct. That observation applies to unbuffered input as well.

Right.

  Contiguous, yes. But I'd rather see front() exposing, say, a circular
  buffer so that appendToFront(n) reallocates only when n >
  buf.length.
   
 
 I think circularity is an implementation detail that is poor as a 
 client-side abstraction.

I fear efficiency will get abstracted out. Say this is my internal buffer 
(pipes indicate front() slice):

[ooo|oo|oo]

Now I do appendToFront(3) -- how do you expose the expected front() without 
moving data?

-- 
Tomek



Re: buffered input

2011-02-05 Thread Tomek Sowiński
Jean Crystof napisał:

 I find this discussion interesting. There's one idea for an application I'd 
 like to try at some point. Basically a facebook chat thingie, but with richer 
 gaming features. The expected audience will be 10 - 100K simultaneous clients 
 connecting to a single server. Not sure if DOM or SAX will be better. After 
 seeing the Tango's XML benchmarks I was convinced that the implementation 
 platform will be D1/Tango, but now it looks like Phobos is also getting 
 there, probably even outperforming Tango by a clear margin.

Thanks for having faith ;-)

 Since even looking at Tango's documentation has intellectual property 
 problems and likely causes taint, I could make an independent benchmark 
 comparing the two and their interfaces later. But I probably need to avoid 
 going into too much detail, otherwise the Phobos developers wouldn't be able 
 to read it without changing their license. 

That would be helpful.

 From what I've read so far, the proposed design looks very much like what 
 Tango has now in their I/O framework. But probably Phobos's TLS default and 
 immutable strings improve multithreaded performance even more.

Well, immutability doesn't help much because a buffer must be written to.

Speaking of multithreading, I was thinking of an implementation where an 
internal thread is doing I/O. It loads data in front of the current front() 
slice, as much as the internal buffer can hold. The motivation is to overlap 
content processing and I/O operations so that less time is spent in total. 
There is some interaction overhead, though: locking, and syncing caches so that 
cores see the same buffer.
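For comparison, `std.parallelism` (David Simcha's module reviewed further down) offers this overlap for generic input ranges through `TaskPool.asyncBuf`, which evaluates elements into buffers on a worker thread while the consumer reads the previous buffer. A minimal sketch; `consumeWithPrefetch` is hypothetical and a computed range stands in for real I/O:

```d
import std.algorithm.iteration : map;
import std.parallelism : taskPool;
import std.range : iota;

/// Hypothetical consumer: a pool thread evaluates the next elements of
/// `produced` ahead of time. The buffer size of 10 is an arbitrary choice.
long consumeWithPrefetch()
{
    auto produced = iota(0, 100).map!(i => i * 2); // stand-in for slow I/O
    long total = 0;
    foreach (x; taskPool.asyncBuf(produced, 10))
        total += x; // processing overlaps with filling the next buffer
    return total;
}
```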

-- 
Tomek



Re: buffered input

2011-02-05 Thread Tomek Sowiński
Andrei Alexandrescu napisał:

  I fear efficiency will get abstracted out. Say this is my internal buffer 
  (pipes indicate front() slice):
 
  [ooo|oo|oo]
 
  Now I do appendToFront(3) -- how do you expose the expected front() without 
  moving data?  
 
 You do end up moving data, but proportionally little if the buffer is 
 large enough.

It still matters for frequent big munches. I'd like a minimum-memory option if 
that's necessary.

-- 
Tomek



High performance XML parser

2011-02-04 Thread Tomek Sowiński
I am now intensely gathering information on how to go about creating a 
high-performance parser, as it quickly became clear that my old one won't 
deliver. And if anything is clear, it is that memory is the key.

One way is the slicing approach mentioned on this NG, notably used by RapidXML. 
I already contacted Marcin (the author) to ensure that using solutions inspired 
by his lib is OK with him; it is. But I don't think I'll go this way. One 
reason is, surprisingly, performance. RapidXML cannot start parsing until the 
entire document is loaded and ready as a random-access string. Then it's 
blazingly fast but the time for I/O has already elapsed. Besides, as Marcin 
himself said, we need a 100% W3C-compliant implementation and RapidXML isn't 
one.

I think a much more fertile approach is to operate on a forward range, perhaps 
assuming buffered input. That way I can start parsing as soon as the first 
buffer gets filled. Not to mention that the end result will use much less 
memory. Plenty of the XML data stream is indents, spaces, and markup -- there's 
no reason to copy all this into memory.

To sum up, I believe memory use and overlapping I/O latency with parsing effort 
are pivotal.

Please comment on this.

-- 
Tomek



Re: David Simcha's std.parallelism

2011-02-04 Thread Tomek Sowiński
dsimcha napisał:

 I could move it over to github, though I'll wait to do that until I get 
 a little more comfortable with Git.  I had never used Git before until 
 Phobos switched to it.  In the mean time, to remind, the code is at:
 
 http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/std_parallelism.d
 
 The docs are at:
 
 http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html

Please run the docs through a spell-checker; there are a few typos:

asyncBuf() - for ecample
stop() - waitied
lazyMap() - Parameters;

But I think it's good overall. These primitives are in demand.

-- 
Tomek



Re: High performance XML parser

2011-02-04 Thread Tomek Sowiński
Michel Fortin napisał:

 I agree it's important, especially when receiving XML over the network, 
 but I also think it's important to be able to support 
 slicing. Imagine all the memory you could save by just making slices of 
 a memory-mapped file.
 
 The difficulty is to support both models: the input range model which 
 requires copying the strings and the slicing model where you're just 
 taking slices of a string.

These are valid concerns. Yet in the overwhelming majority of cases XML documents 
come from the hard drive and the network -- these are the places we need to drill. 
I fear that trying to cover every remote use case will render the library 
incomprehensible.

-- 
Tomek



Re: High performance XML parser

2011-02-04 Thread Tomek Sowiński
Steven Schveighoffer napisał:

 Here is how I would approach it (without doing any research).
 
 First, we need a buffered I/O system where you can easily access and  
 manipulate the buffer.  I have proposed one a few months ago in this NG.
 
 Second, I'd implement the XML lib as a range where front() gives you an  
 XMLNode.  If the XMLNode is an element, it will have eager access to the  
 element tag, and lazy access to the attributes and the sub-nodes.  Each  
 XMLNode will provide a forward range for the child nodes.
 
 Thus you can skip whole elements in the stream by popFront'ing a range,  
 and dive deeper via accessing the nodes of the range.
 
 I'm unsure how well this will work, or if you can accomplish all of it  
 without reallocation (in particular, you may need to store the element  
 information, maybe via a specialized member function?).

Heh, yesterday when I couldn't sleep I was sketching the design. I converged on 
pretty much the same concept, so your comment is reassuring :).

The design I'm thinking is that the node iterator will own a buffer. One 
consequence is that the fields of the current node will point to the buffer 
akin to foreach(line; File.byLine), so in order to lift the input the user will 
have to dup (or process the node in-place). As new nodes will be overwritten on 
the same piece of memory, an important trait of the design emerges: cache 
intensity. Because of XML namespaces I think it is necessary for the buffer to 
contain the current node plus all its parents. Namespaces are the technical 
reason but having access to the path all the way to the root node is of value, 
regardless. This suggests mark-release memory management. The buffer will have 
to be long enough to fit the deepest tag sequence: theoretically infinite, not 
that much in practice. Like I said, the buffer will be owned by the iterator so 
probably deterministic deallocation is possible when the processing is done.

One drawback is that you won't know you're dealing with a well-formed DOM until 
the closing tag comes. If it doesn't, it'll of course throw, but the malformed 
DOM may already have been digested. So providing some rollback possibility is 
up to the user.

-- 
Tomek



Re: High performance XML parser

2011-02-04 Thread Tomek Sowiński
 Steven Schveighoffer napisał:
 
  Here is how I would approach it (without doing any research).
  
  First, we need a buffered I/O system where you can easily access and  
  manipulate the buffer.  I have proposed one a few months ago in this NG.
  
  Second, I'd implement the XML lib as a range where front() gives you an  
  XMLNode.  If the XMLNode is an element, it will have eager access to the  
  element tag, and lazy access to the attributes and the sub-nodes.  Each  
  XMLNode will provide a forward range for the child nodes.
  
  Thus you can skip whole elements in the stream by popFront'ing a range,  
  and dive deeper via accessing the nodes of the range.
  
  I'm unsure how well this will work, or if you can accomplish all of it  
  without reallocation (in particular, you may need to store the element  
  information, maybe via a specialized member function?).
 
 Heh, yesterday when I couldn't sleep I was sketching the design. I converged 
 to a pretty much same concept, so your comment is reassuring :).
 
 The design I'm thinking is that the node iterator will own a buffer. One 
 consequence is that the fields of the current node will point to the buffer 
 akin to foreach(line; File.byLine), so in order to lift the input the user 
 will have to dup (or process the node in-place). As new nodes will be 
 overwritten on the same piece of memory, an important trait of the design 
 emerges: cache intensity. Because of XML namespaces I think it is necessary 
 for the buffer to contain the current node plus all its parents. Namespaces 
 are the technical reason but having access to the path all the way to the 
 root node is of value, regardless. This suggests mark-release memory 
 management. The buffer will have to be long enough to fit the deepest tag 
 sequence: theoretically infinite, not that much in practice. Like I said, the 
 buffer will be owned by the iterator so probably deterministic deallocation 
 is possible when the processing is done.
 
 One drawback is that you won't know you're dealing with a well-formed DOM 
 until the closing tag comes. If it doesn't, it'll of course throw, but the 
 malformed DOM may already have been digested. So providing some rollback 
 possibility is up to the user.
 
Oh, and the direction of iteration (deeper/farther) will of course be 
controllable in the fashion you presented.

-- 
Tomek



Re: A monitor for every object

2011-02-04 Thread Tomek Sowiński
Steven Schveighoffer napisał:

 D's monitors are lazily created, so there should be no issue with resource  
 allocation.  If you don't ever lock an object instance, it's not going to  
 consume any resources.  Most of the time the extra word isn't noticed  
 because the memory size of a class is usually not exactly a power of 2.

Except when you put'em in an array. Could happen.

 D also allows you to replace its monitor with a custom monitor object  
 (i.e. core.sync.Mutex) so you can have more control over the mutex, assign  
 the same mutex to multiple objects, use conditions, etc.  It's much more  
 flexible than Java or C# IMO.

I didn't know, thx. Where is it documented?
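For reference, druntime's `core.sync.mutex.Mutex` has a constructor taking an `Object` that installs the new mutex as that object's monitor, so `synchronized (obj)` and explicit `lock()`/`unlock()` calls go through the same lock. A minimal sketch; `Account` and `deposit` are hypothetical:

```d
import core.sync.mutex : Mutex;

/// Hypothetical example class.
class Account
{
    int balance;
}

void deposit(Account a, int amount)
{
    // Locks a's monitor, which after `new Mutex(a)` is that Mutex.
    synchronized (a)
    {
        a.balance += amount;
    }
}
```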

-- 
Tomek



Re: A monitor for every object

2011-02-04 Thread Tomek Sowiński
Tomek Sowiński napisał:

  D's monitors are lazily created, so there should be no issue with resource  
  allocation.  If you don't ever lock an object instance, it's not going to  
  consume any resources.  Most of the time the extra word isn't noticed  
  because the memory size of a class is usually not exactly a power of 2.  
 
 Except when you put'em in an array. Could happen.

Sorry, for some reason I thought the mutex is on the stack.

-- 
Tomek



Re: std.xml should just go

2011-02-03 Thread Tomek Sowiński
Andrei Alexandrescu napisał:

  Is anyone tasked with a replacement yet? I had to write an XML parser at 
  some point. It's plenty of work bringing up to industrial quality, so I'd 
  have to know that before I dive in.  
 
 Nobody that I know of. If you want to discuss design here while working 
 on it, that would be great.

Alright, I'm game. I'll assemble something discussable.

 I could think of a few high-level requirements:

My requirements are similar. (if I don't comment below, then I agree)

 * works with input ranges so we can plug it in with any source
 
 * works with all UTF widths (statically selectable)
 
 * avoids where possible memory allocation (perhaps by offering 
 incremental access a la joiner())

What do you mean by incremental access? A lazy range? It's obvious for the lexer, 
but on a higher level? Not sure if I can start traversing the DOM until the 
closing tag comes (if at all)... A lazy range of tags defined in the global 
scope seems possible, though.

 * avoids often-called delegates in favor of alias functions

What use case of delegates are you talking about?

 * is familiar in concept to people who've used today's successful XML 
 libraries

-- 
Tomek



Re: std.xml should just go

2011-02-03 Thread Tomek Sowiński
Jonathan M Davis napisał:

 I think that at least a couple of people have said that they have the
 beginnings of a replacement, but I don't believe that anyone has stepped up
 to say that they'll actually complete and propose a module for inclusion in
 Phobos.

Wimps ;-)

 So, std.xml is still very much up in the air, and Tango has set a very high
 bar with regards to speed. And while we may not be able to match Tango for
 speed - especially at first - we'd definitely like to have an xml solution
 that's close. And that's not necessarily going to be easy - especially since
 we're inevitably going to want a range-based solution. And while ranges can
 be quite efficient, it can also be easy to make them inefficient if you're
 not careful.

Speaking of Tango, may I look at it? I remember that beef over the first 
datetime and it gives me shivers...

-- 
Tomek



Re: std.xml should just go

2011-02-03 Thread Tomek Sowiński
Daniel Gibson napisał:

 They can claim whatever they want.. if Tomek says he only looked at the
 documentation (for an idea how a good interface for a XML lib may look like)
 they can hardly prove anything.

One remark: I haven't even looked at the docs. That's why I was asking "may I 
look?".

-- 
Tomek



Re: std.xml should just go

2011-02-03 Thread Tomek Sowiński
spir spir napisał:

  You probably shouldn't look at the source.
  I dunno about the interface (documentation) - it's certainly not illegal to
  take inspiration from it, but maybe then people will again claim that source
  was stolen.. but when you claim that you haven't looked at the source it may
  be ok..
 
  Maybe a clean-room approach is possible: Somebody else looks at the source
  and documents what it does and how it does that (without copying anything)
  and you could use that documentation for your own code.
  If you don't want to clone it but have questions about how they did
  something specific you could just ask here and (hopefully) someone looks it
  up and explains it to you.  
 
 Mamma mia! In what world are we supposed to live!?

My thoughts exactly. I mean, as soon as Jonathan mentioned Tango's XML, I 
knee-jerkingly got paranoid and asked about legality of even reading about it 
to stay clear. I only hope having heard about it is legal.

-- 
Tomek



Max length of a LOC: poll results (Was: On 80 columns...)

2011-01-31 Thread Tomek Sowiński
Tomek Sowiński napisał:

 Actually that's a splendid idea. Let's take it easy. Regardless of that silly 
 beef I'm really curious what distribution will emerge.
 
 What is your preferred *maximum* length for a line of D code? (please reply 
 with a number only)
 
Alright, I'm wrapping up this toy study. Two things before the numbers come:

 - A few respondents gave 2 numbers, one "reasonable", the other "if I really 
have to". I took the latter (larger) number as I was after maximum length, 
something usable as a setting for a repository hook.
 - 2 respondents said "no limit". I excluded them from computations, although 
it's a valid answer. 1 respondent answered "1 mole", which I also excluded as a 
22-order-of-magnitude outlier.

 lengths = c(80, 80, 110, 120, 80, 80, 100, 100, 120, 110, 90)
 summary(lengths)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  80.00   80.00  100.00   97.27  110.00  120.00 

 sd(lengths)  # standard deviation
[1] 16.18080

 quantile(lengths, c(.1, .25, .5, .75, .9))
10% 25% 50% 75% 90% 
 80  80 100 110 120 

 library(moments)
 skewness(lengths)  # take with a grain of salt, little data
[1] 0.1645005

 length(lengths) # count
[1] 11

-- 
Tomek



Re: Max length of a LOC: poll results (Was: On 80 columns...)

2011-01-31 Thread Tomek Sowiński
Tomek Sowiński napisał:

 Alright, I'm wrapping up this toy study. Two things before the numbers come:
 
  - A few respondents gave 2 numbers, one "reasonable", the other "if I really 
 have to". I took the latter (larger) number as I was after maximum length, 
 something usable as a setting for a repository hook.
  - 2 respondents said "no limit". I excluded them from computations, although 
 it's a valid answer. 1 respondent answered "1 mole", which I also excluded as 
 a 22-order-of-magnitude outlier.

Steven came in late with his datapoint, so once again:

 lengths = c(80, 80, 110, 120, 80, 80, 100, 100, 120, 110, 90, 80)
 
 summary(lengths)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  80.00   80.00   95.00   95.83  110.00  120.00 
 
 sd(lengths)  # standard deviation
[1] 16.21354
 
 quantile(lengths, c(.1, .25, .5, .75, .9))
10% 25% 50% 75% 90% 
 80  80  95 110 119 
 
 skewness(lengths)  # take with a grain of salt, little data
[1] 0.3121957
 
 length(lengths) # count
[1] 12

-- 
Tomek



Re: Image Resizing by Seam Carving (Was: On 80 columns should (not) be enough for everyone)

2011-01-31 Thread Tomek Sowiński
Nick Sabalausky napisał:

  Now, what we need is the audio-equivalent of this:
  http://www.youtube.com/watch?v=6NcIJXTlugc  
 
 That's really cool, and seems so obvious in retrospect.

There's a D implementation:

http://dsource.org/projects/seamzgood

but it's abandoned.

-- 
Tomek



Re: d-programming-language.org

2011-01-30 Thread Tomek Sowiński
Andrei Alexandrescu napisał:

 In agreement with Walter, I removed the Digitalmars reference. The 
 message is simple - D has long become an entity independent from the 
 company that created it. (However, this makes the page header look 
 different and probably less visually appealing.)

The header's D should be in red. It's become a bit of a community crest and it 
fits the color scheme like a glove.

-- 
Tomek



(Was: On 80 columns should (not) be enough for everyone)

2011-01-30 Thread Tomek Sowiński
Andrej Mitrovic napisał:

 If you really want to set up a column limit that *everyone* has to abide by, 
 then make a poll to see what everyone can agree on.

Actually that's a splendid idea. Let's take it easy. Regardless of that silly 
beef I'm really curious what distribution will emerge.

What is your preferred *maximum* length for a line of D code? (please reply 
with a number only)

-- 
Tomek



Re: (Was: On 80 columns should (not) be enough for everyone)

2011-01-30 Thread Tomek Sowiński
Tomek Sowiński napisał:

 What is your preferred *maximum* length for a line of D code? (please reply 
 with a number only)

120.

-- 
Tomek



Re: On 80 columns should (not) be enough for everyone

2011-01-30 Thread Tomek Sowiński
Sean Kelly napisał:

 Print text doesn't have indentation levels though.  Assuming a 4 character 
 indent, the smallest indentation level for code in a D member function is 8 
 characters.  Add a nested conditional and code is starting 16 characters in, 
 which when wrapped at 80 characters begins to look like a newspaper column.  
 I wrap all my comments at 79 characters, but allow code to spill as far as 
 110 (which is the number of columns on an 8.5x11 piece of paper in landscape 
 mode).

Yeah. If counted without indents, 90 characters would probably suffice, but 
with them it's at least 120 so that nested code doesn't get stifled.

And I'm programming with a proportional font -- far more readable than a 
monospace one.

-- 
Tomek



Re: (Was: On 80 columns should (not) be enough for everyone)

2011-01-30 Thread Tomek Sowiński
Walter Bright napisał:

  What is your preferred *maximum* length for a line of D code? (please reply 
  with a number only)  
 
 6.022e+23

That's a whole mole of code! ;-)

-- 
Tomek



Re: General unicode category

2011-01-30 Thread Tomek Sowiński
spir spir napisał:

 DUnicode has such functionality: https://bitbucket.org/stephan/dunicode/src
 Watch inside unicodedata.d, search for general category.

Thanks. Any word on moving some of it into Phobos? It's jarring to see a 
Unicode-compliant language have so few tools to work with the standard.

-- 
Tomek



Re: Decision on container design

2011-01-29 Thread Tomek Sowiński
Michel Fortin napisał:

  Is there anything implementation specific in the outer struct that provides
  ref semantics to Impl? If not, Container could be generic, parametrized by
  Impl type.  
 
 You could provide an implementation-specific version of some functions 
 as an optimization. For instance there is no need to create the Impl 
 when asking for the length, if the pointer is null, length is zero. 
 Typically, const function can be implemented in the outward container 
 with a shortcut checking for null.

I think the reference struct can still be orthogonal to the container.

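import std.range : hasLength;

struct Ref(Impl)
{
    private Impl* _impl;

    // Lazily allocate the implementation on first access.
    ref Impl impl() @property
    {
        if (!_impl) _impl = new Impl;
        return *_impl;
    }

    static if (hasLength!Impl)
    {
        // Shortcut: a container that was never allocated is empty,
        // so asking for its length does not allocate.
        auto length() @property
        {
            return _impl ? _impl.length : 0;
        }
    }

    alias impl this;
}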

Reusability lightens the burden of the container's author (less fuss for user 
implementations) and somewhat standardizes containers as they all must exhibit 
a certain API with certain semantics to be able to fit into Ref.

The downside is that the syntax for the most common case (ref semantics) is a 
little noisier than for value-like behavior.

-- 
Tomek



Re: Decision on container design

2011-01-29 Thread Tomek Sowiński
Michel Fortin napisał:

 As for the case of Appender... personally in the case above I'd be 
 tempted to use Appender.Impl directly (value semantics) and make fill 
 take a 'ref'. There's no point in having an extra heap allocation, 
 especially if you're calling test() in a loop or if there's a good 
 chance fill() has nothing to append to it.

Or take an output range.
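A minimal sketch of that alternative, assuming fill() only needs to append ints; fill and squares are hypothetical names. Taking any output range leaves the choice of storage to the caller, so no container has to be allocated up front.

```d
import std.array : appender;
import std.range.primitives : isOutputRange, put;

// Hypothetical fill(): appends the squares of 1 .. n to whatever output
// range the caller passes in; it never allocates storage itself.
void fill(R)(ref R sink, int n)
    if (isOutputRange!(R, int))
{
    foreach (i; 1 .. n + 1)
        put(sink, i * i);
}

// The caller picks the storage; here an Appender over int[].
int[] squares(int n)
{
    auto app = appender!(int[])();
    fill(app, n);
    return app.data;
}
```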

-- 
Tomek



Re: Decision on container design

2011-01-29 Thread Tomek Sowiński
bearophile napisał:

 This page:
 http://www.jroller.com/scolebourne/entry/the_next_big_jvm_language1
 
 A quotation:
 
 3) Everything is a monitor. In Java and the JVM, every object is a monitor, 
 meaning that you can synchronize on any
 object. This is incredibly wasteful at the JVM level. Senior JVM guys have 
 indicated large percentage improvements
 in JVM space and performance if we removed the requirement that every object 
 can be synchronized on. (Instead, you
 would have specific classes like Java 5 Lock)
 
 I have read similar comments in various other places.
 
 What about creating a @nomonitor annotation, for D2 classes to not create a 
 monitor for specific classes annotated
 with it? This may reduce some class overhead.

Better to just remove it; it's not used often. Besides, there are different 
kinds of locks, and one size doesn't fit all.

-- 
Tomek



Re: Suggestion: New D front page

2011-01-29 Thread Tomek Sowiński
Christopher Bergqvist napisał:

 Hi!
 
 I have been putting some free time into creating a design skeleton for a
 new http://www.d-programming-language.org/ front page:
 http://digitalpoetry.se/D%20website/D%20overview%20design.png
 
 My main concern is presenting newcomers with an inspiring and relevant first
 impression of D. I think there is lots to gain by having a more alive front
 page not based on Ddoc (the rest of the site could still be based on it).
 
 I have not attempted adding any visual style to the design myself since its
 not one of my strengths. It should be made to fit better with the overall
 theme of d-programming-language.org (although IMO it's currently a bit too
 dark and foreboding).
 
 I must confess to being heavily inspired by http://ooc-lang.org and
 http://cobra-language.com.
 
 As creating this would take a significant time investment, I suggest that
 some more complex sections of the page could be released after the initial
 version. I have some background in web development but have been almost
 exclusively doing professional C++ games development during the last 4
 years. I would not mind putting some more work into this but am also hopeful
 that some others in the D community desire to contribute.
 
 Constructive feedback with a minimum of bikeshedding is welcome.
 (Please avoid discussions about specific textual content for now, its just
 placeholders).

Believe it or not, there was a time when the D page welcomed users with 
beautiful exemplary code, but over time it got pushed out by quotes, current 
status, news, etc. Looking back, that may have been the reason why I didn't 
say "oh... um... NEXT!" and stayed with D :)

I think we need to go back to the roots.

-- 
Tomek



Re: assert(object) fails to adhere to the principle of least surprise

2011-01-29 Thread Tomek Sowiński
Bernard Helyer napisał:

 If I do
 
 if (object) {
 ...
 }
 
 What happens is fairly obvious, and is equivalent to
 
 if (object !is null) {
 }
 
 However, if I do
 
 auto object = new Object();
 assert(object);
 
 What I expect to happen is
 
 assert(object !is null);
 
 Just as in the above example. What happens however is the program seg 
 faults. Why? Because it turns out what DMD turns it (silently) into is
 
 object.checkInvariants();  // Whatever it's called.
 
 This is bad enough, however it gets pants-on-head stupid as *object is 
 not checked for null*. I think the silent rewrite is bad design, but not 
 checking for null is so stupid, so obvious to anyone who actually uses 
 the language, I can't believe it's existed for so long. The fact that
 
 assert(object);
 
 and
 
 import std.exception;
 enforce(object);
 
 do different things boggles my mind. One must write
 
 assert(object !is null);
 
 or
 
 assert(!!object);
 
 and every day it's like a giant stabbing pain. A stupid wrong headed 
 design that makes my experience with D _worse_. Just expose a method for 
 checking the invariant explicitly, and don't do this silent rewrite 
 bullshit. Any chance of getting a change of behaviour?
 
 FWIW, GDC doesn't do the rewrite, and SDC (the compiler I'm working on 
 github.com/bhelyer/sdc) won't either. 

http://d.puremagic.com/issues/show_bug.cgi?id=796

Vote up ;)

-- 
Tomek



Re: structs vs classes

2011-01-29 Thread Tomek Sowiński
Jim napisał:

 I'm only discussing the heap/stack difference.

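Classes with value semantics would be prone to the slicing problem: copying a 
derived object by value into a base-class variable would silently cut off the 
derived part.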

-- 
Tomek



Re: structs vs classes

2011-01-29 Thread Tomek Sowiński
Matthias Walter napisał:

 That is of course a difference, but no argument. The reason is that you
 can decide whether you want to allocate a class on the stack:
 
 http://www.digitalmars.com/d/2.0/memory.html#stackclass

AFAIR scope classes are to be banished from the language. There's emplace 
instead.

http://digitalmars.com/d/2.0/phobos/std_conv.html#emplace
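A minimal sketch of emplace standing in for a scope class, assuming a toy Point class (hypothetical, as is sumOfStackPoint). The instance lives in a stack buffer sized with __traits(classInstanceSize), so the GC heap is never touched. A class with a destructor would additionally need an explicit destroy() before the buffer goes out of scope; std.typecons.scoped packages the same idea more conveniently.

```d
import std.conv : emplace;

// Hypothetical example class.
class Point
{
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }
    int sum() const { return x + y; }
}

// Constructs a Point inside a stack buffer instead of on the GC heap.
int sumOfStackPoint(int x, int y)
{
    // Enough size_t words to hold the instance; size_t alignment is
    // sufficient for a class that only contains ints.
    enum words = (__traits(classInstanceSize, Point) + size_t.sizeof - 1)
        / size_t.sizeof;
    size_t[words] buf;
    Point p = emplace!Point(buf[], x, y); // runs the constructor in place
    return p.sum(); // p must not be used once buf goes out of scope
}
```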

-- 
Tomek



Re: Suggestion: New D front page

2011-01-29 Thread Tomek Sowiński
Russel Winder napisał:

 I think the current page style looks fine, actually I like it and do not
 consider it dark and foreboding (*).  This is not though a vote for
 not changing if there is something that is going to be more appealing to
 a wider range of programmers.
 
 (*) Or maybe I am just depressed and it fits with the sense of doom and
 despondency ;-)

You're not depressed, just subconsciously keen on prolonging your eye-sight ;-)

Let's blend Chris' dynamic layout with David's toned color scheme, shall we?

-- 
Tomek



Re: Nested function declarations

2011-01-29 Thread Tomek Sowiński
Tomek Sowiński napisał:

  What is the purpose of nested function declarations in D? Is it a good idea 
  to just disallow them?  
 
 1. Helper functions don't clutter the namespace.
 2. Nested functions can access the outer function's stack frame.

OK, I just noticed you asked about declarations, not nested functions in 
general.

They're useful for testing:

import std.traits : ReturnType;

unittest {
    int foo();
    static assert (is(ReturnType!foo == int));
}

-- 
Tomek



Re: How can you read and understand the source of *naryFun in functional.d?

2011-01-29 Thread Tomek Sowiński
Tom napisał:

 I am learning D for some time. I come from background of C, C# and Python.
 When I saw the ways to use std.algorithem's functions, I have noticed that the
 input lambda's can be writen as strings. Somewhat like the pythonic exec. I
 went to the source of this feature in functional.d
 (https://github.com/D-Programming-Language/phobos/blob/master/std/functional.d;).
 The functions unaryFun and binaryFun. Is there a way I can read them and
 understand them easily? or maybe I missed something?

The standard library implementation must cater for a lot of corner-cases. But 
the essence is this:

template binaryFun(string expr) {
    auto binaryFun(T, U)(T a, U b) {
        return mixin(expr);
    }
}

unittest {
    assert (binaryFun!"a+b"(1,2) == 3);
    assert (binaryFun!"a-b"(1,2) == -1);
}

The magic happens at the mixin line. A string mixin takes code in string form 
(an expression here; statements work in statement position) and compiles it in 
the context of the function, so a and b refer to its parameters. Unlike 
Python's exec, the string must be known at compile time.

-- 
Tomek



General unicode category

2011-01-29 Thread Tomek Sowiński
How can I get the general unicode category (Lu, Nd, Pc, etc.) of a dchar? 
std.uni contains barely anything useful.

-- 
Tomek


Re: Decision on container design

2011-01-28 Thread Tomek Sowiński
Michel Fortin napisał:

 We already argument this over and over in the past. First, I totally 
 acknowledge that C++ style containers have a problem: they make it 
 easier to copy the content than pass it by reference. On the other side 
 of the spectrum, I think that class semantics makes it too easy to have 
 null dereferences, it's easy to get lost when you have a container of 
 containers.
 
 I have some experience with containers having class-style semantics: in 
 Objective-C, I ended up creating a set of macro-like functions which I 
 use to initialize containers whenever I use them in case they are null. 
 And I had to do more of these utility functions to handle a particular 
 data structure of mine which is a dictionary of arrays of objects. In 
 C++, I'd have declared this as a map<string, vector<Object> > and
 be done with it; no need for special care initializing each vector, so 
 much easier than in Objective-C.
 
 I agree that defining structs to have reference semantics as you have 
 done is complicated. But I like the lazy initialization, and we have a 
 precedent for that with AAs (ideally, AAs would be a compatible 
 container too). Can't we just use the GC instead of reference counting? 
 I'd make things much easier. Here is a implementation:
 
   struct Container
   {
   struct Impl { ... }
 
   private Impl* _impl;
   ref Impl impl() @property
   {
   if (!impl) impl = new Impl;
   return *impl;
   }
   
   alias impl this;
   }
 
 I also believe reference semantics are not to be used everywhere, even 
 though they're good most of the time. I'd like to have a way to bypass 
 it and get a value-semantic container. With the above, it's easy as 
 long as you keep Container.Impl public:
 
   void main() {
   Container  lazyHeapAllocatedContainer;
   Container.Impl stackAllocatedContainer;
   }
 
   void MyObject {
   Container.Impl listOfObjects;
   }

Is there anything implementation specific in the outer struct that provides ref 
semantics to Impl? If not, Container could be generic, parametrized by Impl 
type.

Overall, I think a value-like implementation inside a reference wrapper is a 
clear-cut idiom, bringing order to otherwise messy struct-implemented reference 
semantics. Do you know of an existing collection library that exploits this 
idea?

-- 
Tomek



Re: dlist for phobos

2011-01-27 Thread Tomek Sowiński
Andrei Alexandrescu napisał:

 ref returns should be guaranteed to never escape.

Should meaning they're not guaranteed now? I'm curious in what scenarios they 
escape.

-- 
Tomek



Re: dlist for phobos

2011-01-27 Thread Tomek Sowiński
Andrei Alexandrescu napisał:

 On 1/27/11 4:48 PM, Tomek Sowiński wrote:
  Andrei Alexandrescu napisał:
 
  ref returns should be guaranteed to never escape.
 
  Should meaning they're not guaranteed now? I'm curious in what scenarios 
  they escape.
 
 Any function can take the address of a reference (either a ref parameter 
 or the result of another function) and squirrel it away.

Jeez... I must've had some brain-warp not to think of r.front :-)

But is just banning taking the addresses of ref parameters and return values 
going to solve the problem? Sounds deceptively simple...
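A minimal illustration of Andrei's point, with hypothetical names: a function receives an int by ref, takes its address, and stores it in a module-level variable. Nothing in its signature tells the caller that the reference escapes; taking &r.front on a range whose front returns by ref escapes in exactly the same way.

```d
// Module-level pointer that outlives any single call.
int* stash;

// Takes the address of a ref parameter and keeps it; the signature gives
// the caller no hint that the reference escapes. Marked @system: this is
// unsafe code.
void squirrelAway(ref int x) @system
{
    stash = &x;
}

// Demonstrates the escape with a heap int, so the stored pointer stays valid.
bool referenceEscaped()
{
    int* heapInt = new int;
    squirrelAway(*heapInt);
    return stash is heapInt;
}
```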

-- 
Tomek



Re: immutable

2011-01-27 Thread Tomek Sowiński
Trass3r napisał:

  But thank you for the answer, I have filed the bug.
 
 Rats, I've filed one too ;)
 http://d.puremagic.com/issues/show_bug.cgi?id=5492

We found it over a year ago :)
http://d.puremagic.com/issues/show_bug.cgi?id=3534

-- 
Tomek



Re: Is D still alive?

2011-01-27 Thread Tomek Sowiński
Walter Bright napisał:

 bearophile wrote:
  Walter:
  
  The reason that took so long was that few people were using DbC
  effectively, so it was a low priority. I originally had high hopes that DbC
  would produce dramatic improvements in code quality, but the real world
  results were disappointing.
  
  After many years and many failed hopes, I think there is no silver bullet in
  programming, so maybe nothing is able to produce dramatic improvements in
  code quality.
  
  But even if this is true, some things are able to improve coding a bit, like
  unit testing, a well semantically defined language, syntax coloring, quick
  compile-run cycles, OOP for certain kinds of programs, DbC, and so on. Each
  of such things improve the situation only a little, but such improvements
  pile up and most programmers when have tried them don't want to go back to
  miss those things.
 
 Unit testing has produced a dramatic improvement in coding.

Yes, it's big. Funny that it's not really a technical change but a cultural one 
-- D leaves even the most stone-age programmers no excuse not to test their 
code.
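For readers new to D, a minimal illustration of how low the barrier is: a unittest block sits right next to the code it checks and needs no framework. euclidGcd is a made-up example function.

```d
/// Greatest common divisor via Euclid's algorithm (hypothetical example).
uint euclidGcd(uint a, uint b)
{
    while (b != 0)
    {
        immutable t = b;
        b = a % b;
        a = t;
    }
    return a;
}

// The test lives next to the function and runs when compiled with -unittest.
unittest
{
    assert(euclidGcd(12, 18) == 6);
    assert(euclidGcd(7, 0) == 7);
}
```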

-- 
Tomek



Re: Is D still alive?

2011-01-26 Thread Tomek Sowiński
Steven Schveighoffer napisał:

  Adam Ruppe and Piotr Szturmaj have recently been working on some database
  stuff. See the recent thread Can your programming language do this?
 
 I have ignored that thread (I sometimes just ignore threads because they  
 start out uninteresting, or become uninteresting, and then I miss out on  
 some good stuff!)
 
 I'll have to take a look, D2 really does need a DB interface -- badly.

That and networking. I can help with the latter as I've done a bit of network 
programming, but I don't know the current state of affairs (is somebody already 
working on it?) or whether Phobos needs another soul on board.

  I would say it is not ready for prime-time yet.  It has a way to go, but
  some have managed to build pretty impressive applications from it.  So  
  it
  would depend on your application.
 
 
  Personally, I think that even though D still has some things to be worked
  out, I think it's *still* far better than any of the other more mature
  languages.
 
 It all seems really good until you hit an issue that cannot be worked  
 around -- like a compiler error or a misdesigned feature.  I call these  
 'mercy' problems, because you are then at the complete mercy of someone  
 else.  If you have a deadline, or have a complete stoppage in work, you  
 really have little choice but to move onto another language or abandon the  
 project.  Dcollections sat idle for about a year because of a problem like  
 this.

Yeah, ditto for QuantLibD. I just spent too much time on a test project trying 
to isolate dmd and phobos bugs to submit something meaningful to bugzilla and 
too little time coding. Not to mention that sometimes it was really hard to 
know what the language *should* do because of outdated documentation. But maybe 
the storm has passed and I should try serious work in D again?

 [snip]
 
 BTW, I plan to write a semi-professional project in D2 in the near future,  
 but I'm 1) willing to take the risks 2) have no deadline and 3) not  
 depending on this project for a living.

Sheer curiosity: what will the project be about?

-- 
Tomek



Re: Showing unittest in documentation (Was Re: std.unittests [updated] for review)

2011-01-24 Thread Tomek Sowiński
Steven Schveighoffer napisał:

 BTW I consider this a very important topic. We have _plenty_ of
 examples that don't work and are not mechanically verifiable. The
 reasons range from minor typos to language changes to implementation
 limitations. Generally this is what they call documentation rot.
 This is terrible PR for the language.

 Changing ddoc to recognize documentation unittests would fix this
 matter once and forever.

 Last but not least, the  separators for code samples are awful
 because no editor recognizes them for anything - they confuse the
 hell out of Emacs for one thing.
 
 This only makes sense if:
 
 1. The unit test immediately follows the item being documented 2. The
 unit test *only* tests that item.
 
 The second one could be pretty annoying.  Consider cases where several
 functions interact (I've seen this many times on Microsoft's
 Documentation), and it makes sense to make one example that covers all
 of them.  Having them 'testable' means creating several identical unit
 tests.
 
 One way to easily fix this is to allow an additional parameter to the
 comment:
 
 /**
 Example(Foo.foo(int), Foo.bar(int)):
 */
 unittest
 {
 auto foo = new Foo;
 foo.foo(5);
 foo.bar(6);
 assert(foo.toString() == "bazunga!");
 }
 
 The above means, copy the example to both Foo.foo(int) and
 Foo.bar(int)
 
 An alternative that is more verbose, but probably more understandable:
 
 /**
 Example:
 Covers Foo.foo(int)
 Covers Foo.bar(int)
 */
 
 Of course, a lack of target just means it applies to the item just
 documented.

Although well-intentioned, it's just... too much. The original idea is very 
compelling without add-ons.

Often the interacting functions are members of the same class or at
least same module, so it's enough to place the unittest appropriately.
To cover remaining cases an artificial declaration may be introduced.

/// Uses of Foo.foo(int) and Foo.bar(int)
struct foo_and_bar_examples;

/// Example:
unittest { ... }

Both functions would simply link to the artificial symbol in their
ddocs.

 One other thing, using writefln is considered bad form in unit tests
 (you want *no* output if the unit test works).  But many examples
 might want to demonstrate how e.g. an object interacts with
 writefln.  Any suggestions? The assert line above is not very pretty
 for example...

I was thinking of mockFile.writefln(obj) but not sure if std.stdio can
handle it.
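One way to keep such an example free of console output, assuming the point is to show how an object formats: write into an in-memory sink with std.format.formattedWrite and assert on the result. Foo and render are hypothetical names; this sidesteps std.stdio rather than mocking a File.

```d
import std.array : appender;
import std.format : formattedWrite;

// Hypothetical type whose formatted output an example wants to show.
struct Foo
{
    int a, b;
}

// Formats a value into an in-memory buffer instead of stdout.
string render(T)(T value)
{
    auto sink = appender!string();
    sink.formattedWrite("%s", value);
    return sink.data;
}
```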

--
Tomek



Re: std.unittests [updated] for review

2011-01-24 Thread Tomek Sowiński
On 2011-01-24, at 06:34:49,
Jonathan M Davis jmdavisp...@gmx.com wrote:

 In case you didn't know, I have a set of unit test helper functions which 
 have 
 been being reviewed for possible inclusion in phobos. Here's an update.
 
 Most recent code: http://is.gd/F1OHat
 
 Okay. I took the previous suggestions into consideration and adjusted the 
 code a 
 bit more. However, most of the changes are to the documentation (though there 
 are some changes to the code). Some of the code duplication was removed, and 
 the 
 way that some of the assertPred functions' errors are formatted has been 
 altered 
 so that values line up vertically, making them easier to compare.

That's a solid improvement, thanks.

 The big change 
 is the docs though. There's now a fake version of assertPred at the top with 
 an 
 overall description for assertPred followed by the individual versions with 
 as 
 little documentation as seemed appropriate while still getting all of the 
 necessary information across. A couple of the functions still have 
 irritatingly 
 long example sections, but anything less wouldn't get the functionality 
 across.

I'm not sure...

Examples:

assertPred!"+"(7, 5, 12);
assertPred!"-"(7, 5, 2);
assertPred!"*"(7, 5, 35);
assertPred!"/"(7, 5, 1);
assertPred!"%"(7, 5, 2);
assertPred!"^^"(7, 5, 16_807);
assertPred!"&"(7, 5, 5);
assertPred!"|"(7, 5, 7);
assertPred!"^"(7, 5, 2);
assertPred!"<<"(7, 1, 14);
assertPred!">>"(7, 1, 3);
assertPred!">>>"(-7, 1, 2_147_483_644);
assertPred!"~"("hello ", "world", "hello world");

assert(collectExceptionMsg(assertPred!"+"(7, 5, 11)) ==
       "assertPred!\"+\" failed: [7] + [5]:\n" ~
       "[12] (actual)\n" ~
       "[11] (expected).");

assert(collectExceptionMsg(assertPred!"/"(11, 2, 6, "It failed!")) ==
       "assertPred!\"/\" failed: [11] / [2]:\n" ~
       "[5] (actual)\n" ~
       "[6] (expected): It failed!");

Picking only one or two from the above would be enough to get it. It's the 
description that ought to explain the function's behavior in all cases; 
examples are for jump-starting the user to action.


Oh, one more thing. Previously you asked me why a generic collectThrown is 
useful and I forgot to answer. One use is the same as collectExceptionMsg() 
without being tied to the msg property.

auto e = collectThrown!MyException(expr);
assert(e);
assert(e.errorCode == expectedCode);
assert(cast(MyCauseException) e.next);

I'm not proposing to yank collectExceptionMsg or assertThrown in favor of 
collectThrown, they're useful idioms. But having also collectThrown (a generic 
replacement for existing collectException) would definitely be of value.
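For comparison, current Phobos's std.exception.collectException takes the exception type as a template argument, which covers this use: it returns the caught exception, or null if nothing was thrown. A minimal sketch; MyException, errorCode, mayFail and failureCode are hypothetical names.

```d
import std.exception : collectException;

// Hypothetical exception carrying an error code.
class MyException : Exception
{
    int errorCode;

    this(int code)
    {
        super("operation failed");
        errorCode = code;
    }
}

// Hypothetical operation that fails for any non-zero code.
void mayFail(int code)
{
    if (code != 0)
        throw new MyException(code);
}

// Returns the error code of the thrown MyException, or 0 if none was thrown.
int failureCode(int code)
{
    auto e = collectException!MyException(mayFail(code));
    return e is null ? 0 : e.errorCode;
}
```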

 In any case. Here's the updated code. Review away. Andrei set the vote 
 deadline 
 for February 7th, at which point, if it passes majority vote, then it will go 
 into Phobos. The number of functions is small enough now (thanks to having 
 consolidated most of them into the fantastically versatile assertPred) that 
 it 
 looks like it will likely go in std.exception if the vote passes rather than 
 becoming a new module. So, the std.unittests title has now become a bit of a 
 misnomer, but that's what I've been calling it, so it seemed appropriate to 
 continue to label it that way in the thread's title.

Good luck!

-- 
Tomek



Re: Ad hoc ranges

2011-01-22 Thread Tomek Sowiński
Andrei Alexandrescu napisał:

 On 1/21/11 7:35 PM, Tomek Sowiński wrote:
  Andrei Alexandrescu napisał:
 
  Like I said, anything that doesn't bother to expose range-interfaced 
  iterators and is not performance critical is
  considered a target for ad hoc ranges. Working with non-D libraries, or 
  libraries ported to D but preserving
  mother-language idioms. Tasks like traversing a tree of GUI widgets, or 
  business specific objects where defining
  proper ranges rarely happens and is use-case driven in practice. I expect 
  they could be of some use in unittesting
  as mock input. Vaguely related: educational -- ad hoc ranges read almost 
  like a for loop so the learning curve for
  ranges in general is eased off.
 
  Adding them to Phobos is an interesting idea. We need to evaluate their 
  worth, though.
 
  Everybody: if you could write up a one-liner like range(empty, popFront, 
  front), what would you use it for?
 
  How about a singleton range - a range with exactly one element. It could
  be done with repeat(x, 1) but let's try it with your function as a
  warm-up exercise.
 
  If x is nullable, range(x, x=null, x); it destroys x, though. Otherwise the 
  state must be held separately on the
  stack.
 
  bool empty;
  auto r = range(empty, empty=true, x);
 
  So repeat(x, 1) wins this one. I think such nuggets can better be expressed 
  as a degenerate case of existing
  facilities. I envision ad hoc ranges at places where no iteration is 
  defined and a one-off range struct doesn't
  pay. Like database-backed entities which don't conform to any clear-cut 
  data structure, but if you squint you see
  it's sort of a tree, and you may just be able to e.g. walk through children 
  recursively fetching only active ones
  from DB, traverse columns of interest, and dump their content to a grid 
  component which takes an arbitrary range of
  values. And all this can be wrapped in std.parallelism to overlap DB round 
  trips.
 
 I think the challenge here is to figure out where to store the state. 
 The idiom makes it difficult for the delegates to communicate state to 
 one another.

On the stack; for loops have done it for years.

-- 
Tomek



Re: Python's partition

2011-01-22 Thread Tomek Sowiński
Andrei Alexandrescu napisał:

 Looking through Python's string functions 
 (http://docs.python.org/release/2.5.2/lib/string-methods.html) I noticed 
 partition():
 
 partition(sep)
  Split the string at the first occurrence of sep, and return a 
 3-tuple containing the part before the separator, the separator itself, 
 and the part after the separator. If the separator is not found, return 
 a 3-tuple containing the string itself, followed by two empty strings. 
 New in version 2.5.
 
 Right now we find find and findSkip; partition would be a great 
 complement, and can be implemented for all forward ranges.
 
 One question is naming - partition() is not good for us because 
 std.algorithm.partition implements Hoare's in-place partition algorithm. 
 How should we call the function?

Instead of a one-shot function, would a lazy range of pre-hit-post troikas be 
possible? That'd rhyme nicely with RegexMatch. In fact, the match(string, 
string) overload is still free...
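A minimal sketch of such a lazy range, restricted to string haystacks and a non-empty string needle; Matches and Piece are hypothetical names, not Phobos API. Each element is a pre/hit/post triple, and popFront resumes the search in the post part, much like iterating over regex matches.

```d
import std.algorithm.searching : find;
import std.typecons : Tuple;

// One match: the text before it, the match itself, and everything after.
alias Piece = Tuple!(string, "pre", string, "hit", string, "post");

// Hypothetical lazy range yielding a Piece for every occurrence of needle.
struct Matches
{
    private string rest, needle;
    private Piece current;
    private bool done;

    this(string haystack, string needle)
    {
        assert(needle.length > 0, "an empty needle would never advance");
        rest = haystack;
        this.needle = needle;
        advance();
    }

    // Search the unconsumed tail for the next occurrence.
    private void advance()
    {
        auto found = find(rest, needle);
        if (found.length == 0)
        {
            done = true;
            return;
        }
        auto pre = rest[0 .. rest.length - found.length];
        auto post = found[needle.length .. $];
        current = Piece(pre, found[0 .. needle.length], post);
        rest = post; // continue searching after this match
    }

    @property bool empty() const { return done; }
    @property Piece front() { return current; }
    void popFront() { advance(); }
}
```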

-- 
Tomek



Re: replaceFirst, findPieces, and takeExactly

2011-01-22 Thread Tomek Sowiński
Andrei Alexandrescu napisał:

 On 1/22/11 5:14 PM, Nick Sabalausky wrote:
  Andrei Alexandrescuseewebsiteforem...@erdani.org  wrote in message
  news:ihfm34$jvb$1...@digitalmars.com...
  On 1/22/11 4:16 PM, bearophile wrote:
  Andrei:
 
  Back then people said that STL's find() is better than D's find()
  because the former returns
  an iterator that can be combined with either the first iterator to get
  the portion before the match, or with the last iterator to get the
  portion starting at the match. D's find() only gives you the portion
  after the match.
 
  There's a HUGE problem here. This equivalence is sometimes true, but
  surely not always true:
  more powerful != better
 
 
  That function allows you to pick a determined number of elements from a
  range, assuming the range is never shorter than that. That sounds a bit
  obscure, but plays a pivotal role in findParts() (which is the name I
  settled on for the equivalent of Python's partition()):
 
  trisect is way better than findParts :-) And it's a single word with
  no uppercase letters in the middle.
 
  There is still time until the next release. Votes for trisect?
 
 
  vote--
 
  findParts is the sort of thing that once you read what it does just
  *once*, it immediately becomes both obvious and easy to remember. But
  trisect is 1. scary, 2. I'd never remember it, and 3. Whenever I'd come
  across it, I'd never remember what it meant.  Those are paricularly bad
  since I know right now I'm going to find it an incredibly useful function:
  There's already been too many times I've written this mess and felt dirty
  about it:
 
  auto result = find(str, delim);
  auto firstPart = str[0..$-result.length];
 
  So I'm thrilled to see this function being added.
 
 Yes, I'm absolutely in agreement with the naming (and thrilled too). I 
 imagine a putative user looking through std.algorithm (let's see... 
 what find functions are out there?). That makes findPieces easy to get 
 to, whereas trisect would be oddly situated in the alphabetic list and 
 oddly named enough to be virtually undiscoverable.

I'm a tad less thrilled, but not because of the name. I'd still rather see a 
lazy range of pre-hit-post tuples. Am I the only one who sees findParts as a 
pattern-free variation of RegexMatch that accepts all element types, not just 
char? Then even the name comes naturally -- match.

-- 
Tomek



Re: Ad hoc ranges

2011-01-21 Thread Tomek Sowiński
Jonathan M Davis napisał:

  I don't know a terser way to get a full-fledged range. It comes at a cost,
  though. Lazy parameters are just sugar over delegates, so it's not exactly
  Usain Bolt**... And you can't return it because by bug or by design lazy
  parameters (unlike vanilla delegates) don't work like closures. Still,
  even with the overhead and limitations the idiom is remarkably useful,
  especially in face of range-unfriendly libraries from outside D realm.
  
  Enjoy.  
 
 What types of stuff do you need ad-hoc ranges for? What's the use case? I've 
 never actually needed such a thing. I'm curious. If it's really something 
 that's 
 likely to be generally useful, then a function similar to what you're 
 suggesting 
 probably should be added to std.range.

Like I said, anything that doesn't bother to expose range-interfaced iterators 
and is not performance critical is considered a target for ad hoc ranges. 
Working with non-D libraries, or libraries ported to D but preserving 
mother-language idioms. Tasks like traversing a tree of GUI widgets, or 
business specific objects where defining proper ranges rarely happens and is 
use-case driven in practice. I expect they could be of some use in unittesting 
as mock input. Vaguely related: educational -- ad hoc ranges read almost like a 
for loop so the learning curve for ranges in general is eased off.

Adding them to Phobos is an interesting idea. We need to evaluate their worth, 
though.

Everybody: if you could write up a one-liner like range(empty, popFront, 
front), what would you use it for?

-- 
Tomek



Re: renamepalooza time

2011-01-21 Thread Tomek Sowiński
Jonathan M Davis napisał:

   These should be expanded a bit and camelCased:
  LS:lineSep, lineSeparator
  PS:paragraphSep, paragraphSeparator  
  
  Isn't there a rule that constants all fully uppercase?  
 
 That would be typical in C++ or Java, but that's not the case in D. Phobos 
 certainly doesn't work that way in general, and Andrei doesn't want it to. 
 The 
 reasoning is that constants are so common in D (likely due to CTFE) that 
 you'd 
 have variables all over the place which were in all caps, and it would get 
 really annoying.

Right on.

 So, no. There is no rule in D that constants should be fully 
 uppercase.

So if not uppercase, what is the convention for constants then? And, to 
hair-split more, what is a constant to begin with? Would e.g. a big immutable 
configuration tree structure fall into that bucket? Or a logger object?

-- 
Tomek



Re: Ad hoc ranges

2011-01-21 Thread Tomek Sowiński
Andrei Alexandrescu napisał:

  Like I said, anything that doesn't bother to expose range-interfaced 
  iterators and is not performance critical is
  considered a target for ad hoc ranges. Working with non-D libraries, or 
  libraries ported to D but preserving
  mother-language idioms. Tasks like traversing a tree of GUI widgets, or 
  business specific objects where defining
  proper ranges rarely happens and is use-case driven in practice. I expect 
  they could be of some use in unittesting
  as mock input. Vaguely related: educational -- ad hoc ranges read almost 
  like a for loop so the learning curve for
  ranges in general is eased off.
 
  Adding them to Phobos is an interesting idea. We need to evaluate their 
  worth, though.
 
  Everybody: if you could write up a one-liner like range(empty, popFront, 
  front), what would you use it for?  
 
 How about a singleton range - a range with exactly one element. It could 
 be done with repeat(x, 1) but let's try it with your function as a 
 warm-up exercise.

If x is nullable, range(x, x=null, x); it destroys x, though. Otherwise the 
state must be held separately on the stack.

bool empty;
auto r = range(empty, empty=true, x);

So repeat(x, 1) wins this one. I think such nuggets can better be expressed as 
a degenerate case of existing facilities. I envision ad hoc ranges at places 
where no iteration is defined and a one-off range struct doesn't pay. Like 
database-backed entities which don't conform to any clear-cut data structure, 
but if you squint you see it's sort of a tree, and you may just be able to e.g. 
walk through children recursively fetching only active ones from DB, traverse 
columns of interest, and dump their content to a grid component which takes an 
arbitrary range of values. And all this can be wrapped in std.parallelism to 
overlap DB round trips.

-- 
Tomek



Ad hoc ranges

2011-01-20 Thread Tomek Sowiński
In my own work I often found myself needing to write up a range just to, 
e.g., feed it into an algorithm. Problem is, defining even the simplest range -- 
a one-pass input range -- is verbose enough to make this (correct) approach 
unprofitable.

This is how I went about the problem:

auto range(T, Whatever)(lazy bool _empty, lazy Whatever _popFront, lazy T _front)
{
    struct AdHocRange
    {
        @property bool empty() { return _empty(); }
        void popFront() { _popFront(); }
        @property T front() { return _front(); }
    }
    return AdHocRange();
}

--- example ---

try { ... }
catch(Throwable t)
{
    auto r = range(t is null, t = t.next, t);

    // process exception chain...
}
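For comparison, this is roughly what the exception-chain walk above costs when written as a conventional one-off range struct, i.e. the boilerplate range() is meant to save. A sketch assuming present-day D and Phobos; ThrowableChain and chainMessages are illustrative names, not part of the original post.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.range.primitives : isInputRange;

// Hand-written one-off input range over a Throwable chain (illustrative name).
struct ThrowableChain
{
    Throwable current;  // null once the chain is exhausted

    @property bool empty() const { return current is null; }
    @property Throwable front() { return current; }
    void popFront() { current = current.next; }
}

static assert(isInputRange!ThrowableChain);

// Collects the message of every link, outermost first.
string[] chainMessages(Throwable head)
{
    return ThrowableChain(head).map!(t => t.msg).array;
}
```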

I don't know a terser way to get a full-fledged range. It comes at a cost, 
though. Lazy parameters are just sugar over delegates, so it's not exactly 
Usain Bolt**... And you can't return it because by bug or by design lazy 
parameters (unlike vanilla delegates) don't work like closures. Still, even 
with the overhead and limitations, the idiom is remarkably useful, especially in 
the face of range-unfriendly libraries from outside the D realm.

Enjoy.

-- 
Tomek

** Of course, there exists a somewhat more verbose compile-time variant of the 
idiom I presented.


Re: Ad hoc ranges

2011-01-20 Thread Tomek Sowiński
bearophile napisał:

 I am not sure, but I think Andrei has deprecated the lazy attribute.

Yes, but AFAIR in favor of implicit conversions of expressions to parameterless 
delegates, which strengthens my little idiom.

-- 
Tomek



Re: Implicit delegate conversions

2011-01-17 Thread Tomek Sowiński
Steven Schveighoffer napisał:

 I think this is one place where D can improve by vast amounts without a  
 lot of effort (no change in code generation, just in implicit casting).   

Yeah, my thoughts exactly. And bumping into a signature mismatch has gotten 
really likely.

 I've brought this up, and contributed to one bugzilla report requesting  
 contravariant delegates (which was denied by Walter).

Why was it denied? (or just point me to the bug, pls)

-- 
Tomek



Re: repeat

2011-01-17 Thread Tomek Sowiński
Andrei Alexandrescu napisał:

 std.range has a function repeat that repeats one value forever. For 
 example, repeat(42) is an infinite range containing 42, 42, 42,...
 
 The same module also has a function replicate that repeats one value a 
 specific number of times. In fact, replicate can be expressed as an 
 overload of repeat, so that's what I just did (not committed yet): 
 repeat(42, 100) repeats 42 one hundred times, repeat(42) repeats 42 
 forever. I'll put replicate on the deprecation chute.
 
 So far so good. Now, string has its own repeat. repeat("abc", 2) returns 
 the string "abcabc".
 
 I want to generalize the functionality in string's repeat and move it 
 outside std.string. There is an obvious semantic clash here. If you say 
 repeat("abc", 3), did you mean one string "abcabcabc" or three strings 
 "abc", "abc", and "abc"?
 
 So we need distinct names for the functions. One repeats one value, the 
 other repeats a range. Moreover, I'm thinking sometimes you want to 
 repeat a range lazily, i.e. instead of producing abcabc just return a 
 range that looks like it.
 
 Ideas for a good naming scheme are welcome.

Overload cycle and call it a day?
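For reference, a sketch of how present-day Phobos separates the three meanings (an assumption about later library versions, not something settled in this thread): repeat for copies of one value, std.array.replicate for eager concatenation, and cycle plus take for the lazy variant. The wrapper names are hypothetical.

```d
import std.algorithm.comparison : equal;
import std.array : array, replicate;
import std.range : cycle, repeat, take;
import std.range.primitives : walkLength;

// n separate copies of one value, e.g. ["abc", "abc", "abc"]
string[] copiesOf(string s, size_t n) { return repeat(s, n).array; }

// one eagerly built string, e.g. "abcabcabc"
string concatenated(string s, size_t n) { return replicate(s, n); }

// the same characters produced lazily; walkLength counts code points because
// cycle() over a string iterates decoded dchars, not code units
auto lazilyConcatenated(string s, size_t n) { return cycle(s).take(n * s.walkLength); }
```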

-- 
Tomek



Re: repeat

2011-01-17 Thread Tomek Sowiński
Andrei Alexandrescu napisał:

 On 1/17/11 1:53 PM, Tomek Sowiński wrote:
  Andrei Alexandrescu napisał:
 
  std.range has a function repeat that repeats one value forever. For
  example, repeat(42) is an infinite range containing 42, 42, 42,...
 
  The same module also has a function replicate that repeats one value a
  specific number of times. In fact, replicate can be expressed as an
  overload of repeat, so that's what I just did (not committed yet):
  repeat(42, 100) repeats 42 one hundred times, repeat(42) repeats 42
  forever. I'll put replicate on the deprecation chute.
 
  So far so good. Now, string has its own repeat. repeat("abc", 2) returns
  the string "abcabc".
 
  I want to generalize the functionality in string's repeat and move it
  outside std.string. There is an obvious semantic clash here. If you say
  repeat("abc", 3), did you mean one string "abcabcabc" or three strings
  "abc", "abc", and "abc"?
 
  So we need distinct names for the functions. One repeats one value, the
  other repeats a range. Moreover, I'm thinking sometimes you want to
  repeat a range lazily, i.e. instead of producing abcabc just return a
  range that looks like it.
 
  Ideas for a good naming scheme are welcome.
 
  Overload cycle and call it a day?
 
 cycle(r, n) already has a meaning: cycle r for a maximum total of n 
 elements.

Now I'm confused. The docs say it's an initial index...
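Concretely, assuming present-day Phobos keeps that meaning, the second argument to cycle() on a random-access range selects the starting element rather than a count; bounding the result is left to take(). fromIndex is a hypothetical wrapper.

```d
import std.algorithm.comparison : equal;
import std.range : cycle, take;

// cycle(input, index) starts at input[index] and wraps around forever;
// take(count) is what makes it finite. fromIndex is a hypothetical name.
auto fromIndex(int[] input, size_t index, size_t count)
{
    return cycle(input, index).take(count);
}
```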

-- 
Tomek



Re: repeat

2011-01-17 Thread Tomek Sowiński
Andrei Alexandrescu napisał:

 On 1/17/11 2:14 PM, Tomek Sowiński wrote:
  Andrei Alexandrescu napisał:
 
  On 1/17/11 1:53 PM, Tomek Sowiński wrote:
  Andrei Alexandrescu napisał:
 
  std.range has a function repeat that repeats one value forever. For
  example, repeat(42) is an infinite range containing 42, 42, 42,...
 
  The same module also has a function replicate that repeats one value a
  specific number of times. In fact, replicate can be expressed as an
  overload of repeat, so that's what I just did (not committed yet):
  repeat(42, 100) repeats 42 one hundred times, repeat(42) repeats 42
  forever. I'll put replicate on the deprecation chute.
 
  So far so good. Now, string has its own repeat. repeat("abc", 2) returns
  the string "abcabc".
 
  I want to generalize the functionality in string's repeat and move it
  outside std.string. There is an obvious semantic clash here. If you say
  repeat("abc", 3), did you mean one string "abcabcabc" or three strings
  "abc", "abc", and "abc"?
 
  So we need distinct names for the functions. One repeats one value, the
  other repeats a range. Moreover, I'm thinking sometimes you want to
  repeat a range lazily, i.e. instead of producing abcabc just return a
  range that looks like it.
 
  Ideas for a good naming scheme are welcome.
 
  Overload cycle and call it a day?
 
  cycle(r, n) already has a meaning: cycle r for a maximum total of n
  elements.
 
  Now I'm confused. The docs say it's an initial index...
 
 Sorry, my bad. You're right. Still, cycle(r, n) has a meaning distinct 
 from what we might need.

I don't think the initial index is really useful (even the authors confirm it by 
not bothering to unittest it :-)). My idea is to dump it in favor of popFrontN 
(provide a method on Cycle and let the stand-alone popFrontN statically 
recognize it). Bounding an infinite range is much more frequent.

Or, if you're really not keen on the idea, introduce cycleN.

 Essentially I'm looking for a name for the 
 function array(take(cycle(range), n * range.length)). That's what 
 std.string.repeat does currently.

With the above cycleN(range, n * range.length).array() doesn't look that bad. 
What are the use-cases that you want a separate name?
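A sketch of the suggestion, assuming present-day Phobos: cycleN is the hypothetical name proposed above, written here as a thin cycle-plus-take wrapper, and dropping elements from a plain cycle reproduces what the initial index did.

```d
import std.algorithm.comparison : equal;
import std.array : array;
import std.range : cycle, drop, take;

// Hypothetical cycleN: cycle r, but stop after n elements in total.
auto cycleN(R)(R r, size_t n)
{
    return cycle(r).take(n);
}
```

With it, Andrei's array(take(cycle(range), n * range.length)) becomes cycleN(range, n * range.length).array, and cycle(r).drop(k) stands in for the initial index.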

-- 
Tomek



Implicit delegate conversions

2011-01-15 Thread Tomek Sowiński
The profusion of D's attributes has made delegate signature mismatches all too 
likely, so one must resort to casts too often, e.g. with callbacks.

const(short)[] delegate(immutable(int)*) dg1;
immutable(short)[] delegate(const(int)*) pure nothrow @safe dg2;
dg1 = dg2;  // fails (if *any* of storage classes or types don't match)

This problem is nothing new. It has been popping up in discussions and bugzilla 
but was never addressed entirely.
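Until such conversions are allowed, the cast-free workaround is a forwarding wrapper whose signature matches the target exactly; passing immutable(int)* where const(int)* is expected, and returning immutable(short)[] as const(short)[], are then ordinary implicit conversions inside its body. A sketch assuming present-day D, using the dg1/dg2 types above; adapt and table are hypothetical names.

```d
alias Dg1 = const(short)[] delegate(immutable(int)*);
alias Dg2 = immutable(short)[] delegate(const(int)*) pure nothrow @safe;

// Hypothetical adapter: a nested function with exactly Dg1's signature that
// forwards to dg2. Returning its address makes the frame (holding dg2) a closure.
Dg1 adapt(Dg2 dg2)
{
    const(short)[] forward(immutable(int)* p) { return dg2(p); }
    return &forward;
}

// Sample data for building a Dg2 (illustrative only).
immutable short[] table = [10, 20, 30];
```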

The sketch of the conversion rules:
dg2 is implicitly convertible to dg1 if
 - dg2 could override dg1 if they were class methods, bar polymorphic return 
type covariance; OR
 - each of dg2's arguments is implicitly convertible from, and binary-equivalent 
to, dg1's respective argument, and dg2's return type is implicitly convertible 
to, and binary-equivalent to, dg1's return type.

The overarching thought is that signature types of both delegates should be 
indistinguishable in compiled binaries to rule out polymorphism** as it 
involves vtable pointer shifting. In the type system, however, the assigned 
delegate may have looser but compatible argument types (note: overloading 
problems don't apply to delegates), a tighter return type, or covariant 
attributes. The "if they were class methods" contortion is my attempt to ease 
the implementation -- some compiler code may be reused (I may be wrong).

Please find holes.

-- 
Tomek

** It works with C# delegates, though. Does anyone know how they do it?


Re: std.unittests for (final?) review [Update]

2011-01-11 Thread Tomek Sowiński
Jonathan M Davis napisał:

 On Monday, January 10, 2011 13:48:50 Tomek Sowiński wrote:
  Jonathan M Davis napisał:
   I followed Andrei's suggestion and merged most of the functions into a
   highly flexible assertPred. I also renamed the functions as suggested
   and attempted to fully document everything with fully functional
   examples instead of examples using types or functions which don't
   actually exist.
  
  Did you zip the right file? I still see things like nameFunc and
  assertPlease.
 
 ??? Those are supposed to be there. All examples are tested in the unit tests 
 exactly as they are.

I just thought "instead of examples using types or functions which don't 
actually exist" meant that well-known Phobos functions would be used.

  On the whole the examples are too long. It's daunting that I can't see the
  docs for *one* function without scrolling. Please give them a solid
  haircut -- max 10 lines with a median of 5. The descriptions are also
  watered down by over-explanatory writing.
 
 Perhaps. If I cut down on the examples though, the usage wouldn't be as
 clear. The idea was to be thorough. Andrei wanted better examples, so I
 gave better examples.

Not sure if longer means better.

 However, it is a bit of a balancing act, and I may have put too many 
 in. It's debatable. Nick's suggestion of a main description before each 
 individual overload would help with that.

I agree. Perhaps a synopsis for the whole module like in std.variant would help 
too.

   So, now there's just assertThrown, assertNotThrown, collectExceptionMsg,
   and assertPred (though there are eight different overloads of
   assertPred). So, review away.
  
  Some suggestions:
  
  assertPred:
  Try putting expected in front; uniform call syntax can then set it apart
  from the operands:
  assertPred!"%"(7, 5, 2); // old
  2.assertPred!"%"(7, 5); // new
 
 I really don't see any value to this.
 
 1. You can't do that with assert, and assertPred is essentially supposed to
 be a fancy assert.
 
 2. A number of assertPred overloads don't even have an expected, so it would
 be inconsistent.
 
 3. People already are annoyed enough that the operator doesn't end up between
 the arguments. Putting the result on the left-hand side of the operator like
 that would make it that much more confusing.

OK, I understand.

  assertNotThrown: chain the original exception with AssertError as its
  cause? Oh, this one badly needs a real-life example.
 
 I suppose that chaining it would be a good idea. I didn't think of that. But
 if you want examples, it's used in the unit tests in this very module, and I
 used it heavily in std.datetime.

I meant a real-life example in the documentation. People may often ask 
themselves how it differs from !assertThrown().

  assertThrown: I'd rather see generified collectException (call it
  collectThrown?). assertThrown may stay as a convenience wrapper, though.
 
 ??? I don't get what you're trying for here. assertThrown isn't trying to
 collect exceptions at all. It's testing whether the given exception was
 thrown like it's supposed to be for the given function call. If it was, then
 the assertion succeeded. If it wasn't, then an AssertError is thrown. Just
 like assert.

I mean that collectException currently doesn't have a parametrized catch block 
like assertThrown does. If it did, the latter could come down to:

void assertThrown(T : Throwable = Exception, F)
                 (lazy F funcToCall, string msg = null,
                  string file = __FILE__, size_t line = __LINE__)
{
    T e = collectThrown!T(funcToCall);
    if (e is null)
        throw new AssertError(...);
}

Shortening assertThrown's implementation is a bonus; the main gain is a better 
collectThrown().
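As it happens, present-day Phobos gives std.exception.collectException exactly this kind of template parameter (collectException!T). A sketch of the proposed shape under that assumption; assertThrownSketch is a hypothetical name chosen so it does not clash with the real assertThrown.

```d
import core.exception : AssertError;
import std.exception : collectException;
import std.format : format;

// Hypothetical thin wrapper: assertThrown expressed via a catch-parameterized
// collector. collectException!T returns the caught T, or null if none was thrown.
void assertThrownSketch(T : Throwable = Exception, E)(lazy E expression,
    string msg = null, string file = __FILE__, size_t line = __LINE__)
{
    if (collectException!T(expression) is null)
        throw new AssertError(
            format("assertThrown failed: no %s was thrown%s",
                   T.stringof, msg.length ? ": " ~ msg : "."),
            file, line);
}
```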


Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

2011-01-11 Thread Tomek Sowiński
Andrei Alexandrescu napisał:

 I've been thinking on how to better deal with Unicode strings. Currently 
 strings are formally bidirectional ranges with a surreptitious random 
 access interface. The random access interface accesses the support of 
 the string, which is understood to hold data in a variable-encoded 
 format. For as long as the programmer understands this relationship, 
 code for string manipulation can be written with relative ease. However, 
 there is still room for writing wrong code that looks legit.
 
 Sometimes the best way to tackle a hairy reality is to invite it to the 
 negotiation table and offer it promotion to first-class abstraction 
 status. Along that vein I was thinking of defining a new range: 
 VLERange, i.e. Variable Length Encoding Range. Such a range would have 
 the power somewhere in between bidirectional and random access.
 
 The primitives offered would include empty, access to front and back, 
 popFront and popBack (just like BidirectionalRange), and in addition 
 properties typical of random access ranges: indexing, slicing, and 
 length.

For some compression schemes, implementing *back is troublesome, if not impossible...

 Note that the result of the indexing operator is not the same as 
 the element type of the range, as it only represents the unit of encoding.

It's worth mentioning explicitly -- a VLERange is dually typed. That's 
important for searching: statically check whether the original and encoded 
types match and, if so, perform a fast search directly on the encoded elements. 
I think an important feature of a VLERange should be dropping itself down to an 
encoded-typed range, so that front and back return raw data.

Dual typing will also affect foreach -- in the general case you'd want to choose 
whether to decode or not by typing the element.
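In D the element type written in foreach already selects between the two views of a string, and std.string.representation drops a string down to its raw encoded bytes. A small sketch assuming present-day Phobos; the function names are illustrative.

```d
import std.string : representation;

// char as the foreach element type walks raw UTF-8 code units...
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (char c; s) ++n;
    return n;
}

// ...while dchar makes the same loop decode code points.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s) ++n;
    return n;
}

// The encoded view: the same bytes typed as immutable(ubyte)[].
immutable(ubyte)[] rawBytes(string s) { return s.representation; }
```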

I can't stop thinking that VLERange is a two-piece bikini that makes a bare 
random-access range safe to look at, and that you can take off once the partners 
trust each other, rather than a limited random-access probing facility spanning 
the void between front and back.

 In addition to these (and connecting the two), a VLERange would offer 
 two additional primitives:
 
 1. size_t stepSize(size_t offset) gives the length of the step needed to 
 skip to the next element.
 
 2. size_t backstepSize(size_t offset) gives the size of the _backward_ 
 step that goes to the previous element.
 
 In both cases, offset is assumed to be at the beginning of a logical 
 element of the range.

So when I move the spinner on an iPod, I get catapulted into position with the 
raw-data opIndex, and from there I try to work my way to the next frame to start 
playback. Sounds promising.
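For UTF-8 the two proposed primitives come down to inspecting lead and continuation bytes. A sketch as free functions over const(char)[], assuming well-formed input and offsets that sit on element boundaries, as the proposal stipulates; stepSize and backstepSize follow the proposed names, while countElements is a hypothetical helper that walks front to back using only stepSize.

```d
// Length in code units of the element starting at offset (a lead byte).
size_t stepSize(const(char)[] s, size_t offset)
{
    immutable b = cast(ubyte) s[offset];
    if (b < 0x80) return 1;  // 0xxxxxxx: ASCII
    if (b < 0xE0) return 2;  // 110xxxxx
    if (b < 0xF0) return 3;  // 1110xxxx
    return 4;                // 11110xxx
}

// Length in code units of the element ending just before offset (offset > 0).
size_t backstepSize(const(char)[] s, size_t offset)
{
    size_t n = 1;
    while ((s[offset - n] & 0xC0) == 0x80)  // skip continuation bytes 10xxxxxx
        ++n;
    return n;
}

// Hypothetical helper: counts elements using nothing but stepSize.
size_t countElements(const(char)[] s)
{
    size_t count, i;
    while (i < s.length)
    {
        i += stepSize(s, i);
        ++count;
    }
    return count;
}
```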

 I suspect that a lot of functions in std.string can be written without 
 Unicode-specific knowledge just by relying on such an interface. 
 Moreover, algorithms can be generalized to other structures that use 
 variable-length encoding, such as those used in data compression. (In 
 that case, the support would be a bit array and the encoded type would 
 be ubyte.)

I agree, acknowledging encoding/compression as a general direction will bring 
substantial benefits.

 Writing to such ranges is not addressed by this design. Ideas are welcome.

Yeah, we can address outputting later, that's fair.

 Adding VLERange would legitimize strings and would clarify their 
 handling, at the cost of adding one additional concept that needs to be 
 minded. Is the trade-off worthwhile?

Well, the only way to find out is to try it. My advice: VLERanges originated as 
a solution to the string problem, so start with a non-string incarnation. Having 
at least two plugs (one, we know, is string) that fit the same socket will spur 
confidence in the abstraction. 
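One candidate for such a non-string plug is unsigned LEB128, the variable-length integer encoding used in DWARF and WebAssembly: each byte carries 7 payload bits and a set high bit means more bytes follow. A sketch of the same stepSize/backstepSize pair over a ubyte buffer, with a decoder for the element at an offset; all three function names are hypothetical.

```d
// Bytes in the LEB128 value starting at offset.
size_t lebStepSize(const(ubyte)[] data, size_t offset)
{
    size_t n = 1;
    while (data[offset + n - 1] & 0x80)  // high bit set: another byte follows
        ++n;
    return n;
}

// Bytes in the value ending just before offset (offset > 0). data[offset - 1]
// is a final byte (high bit clear); walk back over preceding high-bit bytes.
size_t lebBackstepSize(const(ubyte)[] data, size_t offset)
{
    size_t n = 1;
    while (offset > n && (data[offset - n - 1] & 0x80))
        ++n;
    return n;
}

// Decodes the value starting at offset, least significant 7-bit group first.
ulong lebDecodeAt(const(ubyte)[] data, size_t offset)
{
    ulong value;
    uint shift;
    foreach (b; data[offset .. offset + lebStepSize(data, offset)])
    {
        value |= cast(ulong)(b & 0x7F) << shift;
        shift += 7;
    }
    return value;
}
```

The buffer [0x01, 0xAC, 0x02, 0x80, 0x80, 0x01] holds 1, 300 and 16384 at offsets 0, 1 and 3.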

-- 
Tomek



Re: std.unittests for (final?) review [Update]

2011-01-10 Thread Tomek Sowiński
Jonathan M Davis napisał:

 I followed Andrei's suggestion and merged most of the functions into a highly
 flexible assertPred. I also renamed the functions as suggested and attempted
 to fully document everything with fully functional examples instead of
 examples using types or functions which don't actually exist.

Did you zip the right file? I still see things like nameFunc and assertPlease.

On the whole the examples are too long. It's daunting that I can't see the docs 
for *one* function without scrolling. Please give them a solid haircut -- max 10 
lines with a median of 5. The descriptions are also watered down by 
over-explanatory writing.

 So, now there's just assertThrown, assertNotThrown, collectExceptionMsg, and
 assertPred (though there are eight different overloads of assertPred). So,
 review away.

Some suggestions:

assertPred:
Try putting expected in front; uniform call syntax can then set it apart from 
the operands:
assertPred!"%"(7, 5, 2); // old
2.assertPred!"%"(7, 5); // new

assertNotThrown: chain the original exception with AssertError as its cause?
Oh, this one badly needs a real-life example.
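A sketch of the chaining idea, assuming present-day druntime, where AssertError has a constructor that takes the next throwable in the chain; assertNotThrownSketch is a hypothetical name.

```d
import core.exception : AssertError;

// Hypothetical sketch: if expression throws a T, fail with an AssertError that
// keeps the original throwable as its .next, so its message and trace survive.
void assertNotThrownSketch(T : Throwable = Exception, E)(lazy E expression,
    string msg = null, string file = __FILE__, size_t line = __LINE__)
{
    try
    {
        expression();
    }
    catch (T t)
    {
        throw new AssertError("assertNotThrown failed: " ~ T.stringof ~ " was thrown"
            ~ (msg.length ? ": " ~ msg : "."), file, line, t);
    }
}
```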

assertThrown: I'd rather see generified collectException (call it 
collectThrown?). assertThrown may stay as a convenience wrapper, though.


Looking at the code I'm seeing the same cancerous coding style std.datetime 
suffered from (to a lesser extent, I admit).

For instance, this routine:

if(result != expected)
{
    if(msg.empty)
    {
        throw new AssertError(format(`assertPred!%s failed: [%s] %s [%s]: actual [%s], expected [%s].`,
                                     op,
                                     lhs,
                                     op,
                                     rhs,
                                     result,
                                     expected),
                              file,
                              line);
    }
    else
    {
        throw new AssertError(format(`assertPred!%s failed: [%s] %s [%s]: actual [%s], expected [%s]: %s`,
                                     op,
                                     lhs,
                                     op,
                                     rhs,
                                     result,
                                     expected,
                                     msg),
                              file,
                              line);
    }
}

can be easily compressed to:

enforce(result == expected, new AssertError(
    format("[%s] %s [%s] failed: actual [%s], expected [%s]", lhs, op, rhs, result, expected)
        ~ (msg.empty ? "." : ": " ~ msg),
    file, line));

BTW, actual and expected should be on separate lines, directly under each other, 
for eye-diffing (it does wonders for long input):
format("[%s] %s [%s] failed:\n[%s] - actual\n[%s] - expected" ~ ...

Another example:

{
    bool thrown = false;
    try
        assertNotThrown!AssertError(throwEx(new AssertError("It's an AssertError", __FILE__, __LINE__)), "It's a message");
    catch(AssertError)
        thrown = true;

    assert(thrown);
}

can be:

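// A try block ending in assert(false) would not work here: the AssertError from
// assert(false) is itself caught by catch(AssertError), so the check could never
// fail. Letting the module's own assertThrown do the catching avoids that trap.
assertThrown!AssertError(
    assertNotThrown!AssertError(throwEx(new AssertError("It's an AssertError", __FILE__, __LINE__)), "It's a message"));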

and you don't have to introduce a new scope every time.

Not to mention that such routines recur in your code with small variations, 
so abstracting out private helpers may pay off. Fixing such readability bugs 
is essential for a standard library module.
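One possible shape for such a private helper, building the failure text once with actual and expected on consecutive lines as suggested above. predFailureMsg is a hypothetical name, and op is passed at run time here only to keep the sketch short.

```d
import std.format : format;

// Hypothetical helper: one place that formats assertPred-style failures,
// with actual and expected stacked for eye-diffing and msg appended if given.
private string predFailureMsg(L, R, A, X)(string op, L lhs, R rhs,
                                          A actual, X expected, string msg)
{
    return format("[%s] %s [%s] failed:\n[%s] - actual\n[%s] - expected",
                  lhs, op, rhs, actual, expected)
        ~ (msg.length ? "\n" ~ msg : "");
}
```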

On the bright side, I do appreciate the thoroughness and extent of unittests in 
this module. Is coverage 100%?

 From the sounds of it, if this code gets voted in, it'll be going into 
 std.exception.

Please don't rush the adoption. This module, albeit useful, still needs work.

-- 
Tomek


