Graham Fawcett wrote:
On Tue, 04 May 2010 09:09:29 -0700, Andrei Alexandrescu wrote:

Graham Fawcett wrote:
On Mon, 03 May 2010 16:01:30 -0700, Andrei Alexandrescu wrote:

Graham Fawcett wrote:
The fact that libxml2/libxslt support not only XML parsing and DOM
building, but also XSLT, XPath, XPointer, XInclude, RelaxNG, etc.,
means that any homegrown library will be hard-pressed to cover the
same range of tools and features.

There are too many half-baked XML libraries in the world. No
disrespect intended to opticron or anyone else; it just doesn't make
a lot of sense to reinvent such a complex wheel (and believing that
XML processing isn't complex is a sure sign that your homegrown
library's design is incomplete!).

Graham

I think what we need for the standard library is to take a solid XML
library licensed generously and adapt it to work with arbitrary
ranges.

By "adapt" do you mean writing a wrapper for an existing library, or
translating the source code of the library into D?

What constitutes a "generous license" in this context? (For what it's
worth, libxml2 is under the MIT License.)

Graham

We'd need to modify the code. I haven't looked into available XML
libraries so I don't know which would be eligible.

I think I understand your motivations: this is the standard library, and
so you want to minimize dependencies. But from a maintenance
perspective, it seems a bad idea to translate a complex library into D
code that few people will actively maintain -- whereas writing a
wrapper (and introducing a library dependency) would keep the codebase
small, let you share maintenance costs with the third-party library's
developers, and (arguably) increase the stability and quality of the
stdlib?

I am not pushing for libxml2 as The Answer. I'm just questioning the
motivation to translate other people's code to D, when the D platform
excels at library integration. (Although I agree with your suggestion
to borrow inspiration/code from Boost for datetime and other features;
that's different, since Boost cannot feasibly be wrapped.)

Best,
Graham

My concern is purely technical - a library we just link to would force a number of choices, such as input representation (e.g. arrays of char). Ideally we should be able to change the library to accept any compatible range of any compatible characters.

As a simple example, consider std.algorithm.levenshteinDistance. There are plenty of good implementations, and initially I just wrote one almost identical to the Web lore. Later, however, I needed to compute Levenshtein distances between strings stored in lists (tries, actually). That didn't work, because the implementation at that time used random access s[i] and t[i] all over the place. But it wasn't difficult to change the algorithm to work with forward ranges. So now we have one of the few Levenshtein distance implementations that work with inputs other than arrays. In particular, we work correctly with UTF inputs without needing to copy the input, something that I haven't seen anywhere else. If you google for ``levenshtein utf'' Google will even think the query has a typo. Search results include an OCaml implementation that copies the input (http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#OCaml) and a Ruby implementation that also copies the input (http://rubyforge.org/frs/?group_id=2080&release_id=7389). By using the range abstraction, we get to support UTF Levenshtein without significant additional implementation effort - the code is very similar to the one using indices throughout.
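
To make the point about forward-only access concrete, here is a minimal sketch in Python. It is not the Phobos implementation. The names levenshtein, LinkedList and Node are hypothetical and exist only for this illustration. The assumption is that the first input is walked once and the second can be re-iterated from the start, which is roughly what a forward range offers. No element is ever fetched by index.

```python
class Node:
    """Cell of a hypothetical singly linked list: no indexing, only 'next'."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


class LinkedList:
    """Minimal forward-only container; each iteration restarts at the head."""

    def __init__(self, items=()):
        self.head = None
        for item in reversed(list(items)):
            self.head = Node(item, self.head)

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.value
            node = node.next


def levenshtein(s, t):
    """Edit distance using only forward iteration over s and t.

    s is walked once; t is walked once per element of s, so t must be
    re-iterable (the rough analogue of a forward range that can be saved).
    """
    # Row for the empty prefix of s: the distance to a prefix of t is its length.
    prev = [0]
    for _ in t:
        prev.append(len(prev))

    for i, a in enumerate(s, start=1):
        cur = [i]  # distance from s[:i] to the empty prefix of t
        for j, b in enumerate(t, start=1):
            cur.append(min(
                prev[j] + 1,              # delete a
                cur[j - 1] + 1,           # insert b
                prev[j - 1] + (a != b),   # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]
```

The same function accepts a built-in string or the linked list above, for example levenshtein(LinkedList("kitten"), "sitting"). Only the row buffers are indexed, never the inputs, which is the property that lets an implementation of this shape run over lists or over characters decoded on the fly.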



Andrei
