XML vs. SQL hmm.

It's worth recalling *one of* the rationales behind XML: when bytes were expensive, machine-to-machine communication, especially across company boundaries (read: EDI), couldn't afford to be self-documenting. Huge binders of ANSI EDI specifications were required to correctly parse the trickles of ASCII characters coming across X.400 VANs.

EDI consultants made their living on the fact that no two specification binders were written the same.

XML, and our world of cheap bytes, puts those spec binders in the actual document itself, so EDI consultants are less valuable.

Now, in the 'good ol' days', we didn't store random-access data in EDI files. I'm not sure what the benefit of doing that with XML files would be today.

Just showing my age and opinions.

Matthew

http://www.redmac.ca - Getting Canadians their Macintosh accessories
http://www.justaddanoccasion.com - Great gift ideas, featuring smoked salmon


On Feb 4, 2004, at 9:24 AM, Chris Devers wrote:

I can give a longer reply later, but it's my birthday and I'm about to go
out for a late breakfast & a movie :)



On Wed, 4 Feb 2004, Bill Stephenson wrote:


Chris, you've almost convinced me, but I have to ask, is it really so
inefficient to search through one directory with 5000 sub-directories
to find one that matches the (user) name you're looking for? Isn't that
what Perl is supposed to be good at?

I'll turn the question around and let you try answering it yourself:


Which do you think will be faster, traversing a directory tree, scanning
the contents of N-thousand trees & files looking for something, or
grepping the contents of one file looking for what you want? I.e., which
is likely to be faster -- this --


$ find /var/lib/xmldb/app1/ -type f -print0 | xargs -0 grep 'foo'

-- or this --

$ grep 'foo' /var/lib/csvdb/app1/record.csv

?

My hunch is that the second approach will be *way* faster almost always.

Now of course that's a biased example, and you could do a lot to speed up
the first approach by pruning the tree that you're digging in. But I
still think the basic point holds: if you keep the data in one
well-organized file, that's likely to be faster in almost every case.
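
For anyone who would rather measure than take my hunch on faith, here is a
rough sketch of the same comparison. It is in Python only so that it runs with
nothing but the standard library. The layout (xmldb/<bucket>/user<i>.txt next
to a single record.csv) and all the function names are made up for
illustration, and the timings will vary a lot with filesystem and cache state.

```python
import os
import tempfile
import time


def build_fixtures(root, n_records=2000):
    """Write the same records twice: one small file per record, and one CSV."""
    tree = os.path.join(root, "xmldb")
    csv_path = os.path.join(root, "record.csv")
    with open(csv_path, "w") as csv_file:
        for i in range(n_records):
            line = f"user{i},{'foo' if i % 500 == 0 else 'bar'}\n"
            csv_file.write(line)
            # Hypothetical layout: xmldb/<bucket>/user<i>.txt
            bucket = os.path.join(tree, str(i % 50))
            os.makedirs(bucket, exist_ok=True)
            with open(os.path.join(bucket, f"user{i}.txt"), "w") as f:
                f.write(line)
    return tree, csv_path


def scan_tree(tree, needle):
    """Rough equivalent of `find tree -type f | xargs grep needle`."""
    hits = []
    for dirpath, _dirs, files in os.walk(tree):
        for name in files:
            with open(os.path.join(dirpath, name)) as f:
                hits.extend(line for line in f if needle in line)
    return sorted(hits)


def scan_file(path, needle):
    """Rough equivalent of `grep needle path`."""
    with open(path) as f:
        return sorted(line for line in f if needle in line)


def compare(n_records=2000):
    """Return (tree_hits, file_hits, tree_seconds, file_seconds)."""
    with tempfile.TemporaryDirectory() as root:
        tree, csv_path = build_fixtures(root, n_records)
        t0 = time.perf_counter()
        tree_hits = scan_tree(tree, "foo")
        t1 = time.perf_counter()
        file_hits = scan_file(csv_path, "foo")
        t2 = time.perf_counter()
    return tree_hits, file_hits, t1 - t0, t2 - t1


if __name__ == "__main__":
    tree_hits, file_hits, tree_s, file_s = compare()
    print(f"tree scan: {len(tree_hits)} hits in {tree_s:.4f}s")
    print(f"file scan: {len(file_hits)} hits in {file_s:.4f}s")
```

Both scans return the same matching lines; the only thing that differs is how
many files have to be opened along the way.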


If, as a lot of people say, MySQL is basically just a SQL interface to
your filesystem, then MySQL is closer than you might think to the basic
"grep /pattern/ file" approach. My hunch is that most XML solutions, even
very good ones, are structurally going to be more similar to the bigger
"find /path | xargs grep" approach.


I'm willing to be proven wrong here, but I am reasonably sure about this.

If you had 100,000 directories you could alphabetize and place them in
sub-directories that would hold, on average, fewer than the 5000
mentioned above.

What you're describing is essentially a B-tree index, and a lot of databases
already have mechanisms like this built in, working invisibly out of the box.
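
As a small illustration of letting the database do that bookkeeping, here is a
sketch using SQLite through Python's standard sqlite3 module (SQLite stores its
indexes as B-trees). The table, column, and index names are invented for the
example, and the exact wording of the query plan differs between SQLite
versions, so the code only checks whether the index is mentioned.

```python
import sqlite3


def build_db(n_users=100_000):
    """Create an in-memory table of hypothetical users, with no index yet."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        ((f"user{i}", f"user{i}@example.com") for i in range(n_users)),
    )
    return conn


def query_plan(conn, name):
    """Ask SQLite how it intends to run the lookup, without running it."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT email FROM users WHERE name = ?", (name,)
    ).fetchall()
    # The last column of each row is the human-readable plan detail.
    return " ".join(row[-1] for row in rows)


def add_name_index(conn):
    """The whole 'alphabetize into sub-directories' step, as one statement."""
    conn.execute("CREATE INDEX idx_users_name ON users (name)")


if __name__ == "__main__":
    conn = build_db()
    print("before:", query_plan(conn, "user4242"))
    add_name_index(conn)
    print("after: ", query_plan(conn, "user4242"))
```

Before the index exists the plan is a full scan of the table; afterwards it is
a search that uses idx_users_name, and the application code never has to know
how the tree is laid out.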


Would you rather be re-implementing fundamental algorithms in your app, or
can you trust some database vendor to have already done the work for you,
so that you can just say "index fields A, B, and C", and you can get on
with the application-specific bits of your work?


Put yet another, snarkier way, if you'd rather be doing the basic search &
retrieval stuff by hand, why aren't you using C/C++ instead of Perl? :)


Really though, I do think a framework like this should be easiest:

* Data managed by some kind of database, even a toy one like MySQL or a
  pseudo one like SQLite.

* A thin layer of Perl to insert & retrieve your data

* A template engine like Template Toolkit or HTML::Template to, if you
  choose, wrap your data in XML syntax for exchange with others, and/or
  to present your data through a web interface if that's what you need.


Honestly, the template engine could be the hardest part here, and it
really isn't that hard (that's why they exist -- to hide the hard bits for
you :). Keeping data in a database & retrieving it with something like
Perl/DBI, or the database access libraries for Python, PHP, Java, etc, is
all a Solved Problem. All that's left to do is read the docs :)
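
To make the three layers concrete, here is a minimal sketch of the same shape.
It is written in Python with the standard sqlite3 and string.Template modules
rather than Perl/DBI and Template Toolkit, purely so it runs without installing
anything. The users table, the function names, and the XML element names are
all assumptions made up for the example.

```python
import sqlite3
from string import Template
from xml.sax.saxutils import escape

# Layer 3: templates that wrap rows in XML for exchange with others.
ROW = Template("  <user><name>$name</name><email>$email</email></user>")
DOC = Template("<users>\n$rows\n</users>")


def open_db():
    """Layer 1: the database (in memory here; a file path works the same)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY, email TEXT)")
    return conn


def add_user(conn, name, email):
    """Layer 2: thin insert wrapper; placeholders handle the quoting."""
    conn.execute("INSERT INTO users VALUES (?, ?)", (name, email))


def get_users(conn):
    """Layer 2: thin retrieval wrapper."""
    return conn.execute(
        "SELECT name, email FROM users ORDER BY name"
    ).fetchall()


def to_xml(rows):
    """Layer 3: render rows through the templates, escaping &, < and >."""
    body = "\n".join(
        ROW.substitute(name=escape(name), email=escape(email))
        for name, email in rows
    )
    return DOC.substitute(rows=body)


if __name__ == "__main__":
    conn = open_db()
    add_user(conn, "bill", "bill@example.com")
    add_user(conn, "chris", "chris@example.com")
    print(to_xml(get_users(conn)))
```

The data lives in the database the whole time; XML only appears at the edge,
when something outside the application needs it.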


Here:

A Short Guide to DBI:
<http://www.perl.com/pub/1999/10/DBI.html>

Cooking with Perl, Part 2 (talks about SQLite)
<http://www.perl.com/pub/a/2003/09/03/perlcookbook.html>

Database Programming with Perl:
<http://www.perl.com/lpt/a/2003/10/23/databases.html>

DBI perldoc
<http://www.perldoc.com/perl5.6.1/lib/DBI.html>

Actual paper book, _Programming the Perl DBI_
<http://www.oreilly.com/catalog/perldbi/>


Go read :)





and on that note, time to go...





-- Chris Devers



