[farsiweb]about xml databases

Omid Milani Thu, 26 Sep 2002 04:48:17 -0700

As many of you may know, one of our problems in working with XML documents (especially in web applications) has been that none of the (native) XML databases supported Unicode correctly (or completely). I had tested two or three of them (open-source ones) without finding a usable one. I had also worked a bit on the sources of Xindice (the Apache Group's solution) and had given up in despair. But since this plays an essential role for web pages based on XML documents, I started playing with them again today.

This time I focused on another one, called eXist. It is still very young, but its developers seem much more active than Xindice's: many versions have been released in the one or two months I've known it. It also has problems with Unicode, but as I found out, no data corruption occurs. The problem is that when data is sent from the client to the server, a well-formedness check is performed, and documents with non-English tags are rejected (I'm not sure exactly which characters trigger this, but certainly many Persian ones do). So far, then, the only problem affects documents that use Persian tags.

That wasn't good enough for me, so I worked with it some more, and I found that eXist lets you work directly with a server process started inside your own application, rather than with the real server. Instead of keeping the real server running on its own and sending it requests, we use some of its code in our program to modify the database directly. (Of course, that process must run on the server machine and have the necessary permissions.) When documents are added this way, the well-formedness check passes without problems, and a document with Persian tags is added to the database. So, after this long story, I can give the good news: we can use eXist to store any XML document.
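XML 1.0 allows Unicode letters, including the Arabic-script letters used in Persian, in element names, so a document with Persian tags is well-formed, and a check that rejects it is simply too strict. As an illustration only (this is not eXist code), a conforming parser such as Python's standard `xml.etree.ElementTree` accepts such a document. The sample document and the helper name `is_well_formed` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document whose element names are Persian words:
# "کتاب" = book, "عنوان" = title, "نویسنده" = author.
PERSIAN_DOC = "<کتاب><عنوان>شاهنامه</عنوان><نویسنده>فردوسی</نویسنده></کتاب>"


def is_well_formed(text: str) -> bool:
    """Return True if a conforming XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(PERSIAN_DOC)
print(is_well_formed(PERSIAN_DOC))  # True
print(root.tag)                     # کتاب
print(is_well_formed("<کتاب>"))     # False: the element is never closed
```

If a database client rejects `PERSIAN_DOC` while a parser like this accepts it, the fault lies in the client's name check, not in the document.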

I also checked Xindice again (I had forgotten exactly what its problem was), and found that it cannot store documents with non-English tags at all. It also corrupts some Persian letters (the first of these is the Farsi yeh, but it wasn't the last one), so it is not usable for Persian data in any case.
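One way to catch this kind of letter corruption is to store a sample text, read it back, and compare code points. The sketch below assumes nothing about Xindice's internals: it simulates a lossy storage path by round-tripping through ISO-8859-6, an 8-bit Arabic charset that has no Persian-specific letters such as Farsi yeh (U+06CC) or keheh (U+06A9). The helpers `lossy_round_trip` and `corrupted_chars` are hypothetical; in a real test, `lossy_round_trip` would be replaced by an actual store-then-retrieve call.

```python
import unicodedata


def lossy_round_trip(text: str) -> str:
    """Hypothetical stand-in for 'store, then read back' through a path that
    is not Unicode-safe: characters missing from ISO-8859-6 become '?'."""
    return text.encode("iso8859_6", errors="replace").decode("iso8859_6")


def corrupted_chars(original: str, returned: str) -> list:
    """Describe every character that changed during the round trip."""
    problems = []
    for before, after in zip(original, returned):
        if before != after:
            problems.append(
                f"U+{ord(before):04X} {unicodedata.name(before)} -> {after!r}"
            )
    if len(original) != len(returned):
        problems.append(f"length changed: {len(original)} -> {len(returned)}")
    return problems


# "یک کتاب" ("one book"), written with escapes so the exact code points are clear.
sample = "\u06cc\u06a9 \u06a9\u062a\u0627\u0628"
for line in corrupted_chars(sample, lossy_round_trip(sample)):
    print(line)
```

Here the Farsi yeh and both keheh letters are reported as replaced by `'?'`, while the letters shared with Arabic (ت, ا, ب) survive, which matches the pattern of a system that handles Arabic but not the full Persian alphabet.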

After all this, there remained the question of performance. As I've heard, a simple test had been done on Xindice (Mehran told me about it), and judging by the results, the program seemed hopeless for real-world use. I'm not sure exactly what they did, but from what I've read here and there, to get acceptable performance from these databases on complicated XPath queries, one should define good indexes for the documents. None of the existing XML database engines can automatically find a good indexing for a given schema (or DTD); in fact, they hardly support schemas and namespaces in XML documents at all. Fortunately, both Xindice and eXist let the user define indexes for documents, so anyone who tries to use them should read about this.

Also regarding performance, I read somewhere that the maximum number of children of any node is critical, and that this number should not exceed 800 or 1000 to get acceptable results. I will test the performance of both databases as soon as possible and will let those interested know the results.
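Before loading documents, it is easy to check them against the rule of thumb about the number of children per node. The sketch below uses Python's standard `xml.etree.ElementTree` to find the element with the most direct children. The threshold `MAX_CHILDREN = 800` and the function name `max_fan_out` are assumptions based on the figure quoted above, not settings of Xindice or eXist.

```python
import xml.etree.ElementTree as ET

# Assumed threshold, taken from the 800-1000 rule of thumb in the text.
MAX_CHILDREN = 800


def max_fan_out(root: ET.Element) -> tuple:
    """Return (tag, child_count) for the element with the most direct children."""
    best_tag, best_count = root.tag, len(root)
    for elem in root.iter():  # iter() visits the root and all its descendants
        if len(elem) > best_count:
            best_tag, best_count = elem.tag, len(elem)
    return best_tag, best_count


# Hypothetical flat document: one <list> holding 1200 <item> children.
doc = "<list>" + "<item/>" * 1200 + "</list>"
tag, count = max_fan_out(ET.fromstring(doc))
if count > MAX_CHILDREN:
    print(f"<{tag}> has {count} children; consider adding intermediate levels")
```

If a document exceeds the limit, one option is to group the children under intermediate elements (for example by year or by first letter), so that no single node becomes too wide.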
