On Apr 16, 2005, at 1:17 PM, Wolfgang Hoschek wrote:
Note that "fish*~" is not a valid query expression :)

Perhaps the Lucene QueryParser should throw an exception then. Currently 1.4.3 accepts the expression as is without grumbling...

Several minor QueryParser oddities like this have turned up recently, and this is certainly one of them. The expression parses into a PrefixQuery for "fish*" and the ~ is silently dropped. I consider this a bug: it should really raise a parse exception. I've just filed it:


        http://issues.apache.org/bugzilla/show_bug.cgi?id=34486
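
Until the parser rejects such input itself, a caller could screen query text before handing it to QueryParser. The following is only a minimal sketch under stated assumptions: the class WildcardFuzzyGuard and its methods are hypothetical names, not part of Lucene, and the regular expression only approximates the query syntax (it does not handle an escaped backslash in front of the wildcard, for instance).

```java
import java.util.regex.Pattern;

// Hypothetical pre-check, not part of Lucene: flags a term that ends with an
// unescaped wildcard immediately followed by the fuzzy operator, e.g. "fish*~".
final class WildcardFuzzyGuard {

    // An unescaped '*' or '?' directly followed by '~' at the end of a term
    // (end of term = whitespace or end of input). This is a simplification of
    // the real query grammar.
    private static final Pattern WILDCARD_THEN_FUZZY =
            Pattern.compile("(?<!\\\\)[*?]~(?=\\s|$)");

    private WildcardFuzzyGuard() {}

    // Returns true if the query text contains a wildcard-then-fuzzy term.
    static boolean mixesWildcardAndFuzzy(String queryText) {
        return WILDCARD_THEN_FUZZY.matcher(queryText).find();
    }

    // Throws instead of letting the '~' be dropped silently later on.
    static void check(String queryText) {
        if (mixesWildcardAndFuzzy(queryText)) {
            throw new IllegalArgumentException(
                    "Wildcard and fuzzy operators combined in: " + queryText);
        }
    }
}
```

Calling WildcardFuzzyGuard.check(queryText) before parsing would turn the silent drop into an explicit error on the caller's side; the proper fix remains in the parser grammar.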


If you're looking for an XML DB for managing and querying large persistent data volumes, Nux/Saxon will disappoint you.

I want to store at least several hundred MB, up to gigabytes, and have this queryable with XQuery. We previously used Tamino with XPath, but our XML is not normalized well enough to make it very feasible to query. eXist, last I toyed with it, only scaled to 50MB.


OK, so Nux/Saxon is out for our uses. Any recommendations, though?

Could you avoid calling match() twice here?

That's no problem for two reasons:
1) The XQuery optimizer rewrites the query into an optimized expression tree eliminating redundancies, etc. If for some reason this isn't feasible or legal then
2) There's a smart cache between the XQuery engine and the Lucene invocation that returns results in O(1) for Lucene queries that have already been seen/processed before. It caches (queryString,result) pairs, plus parsed Lucene queries, plus the Lucene index data structure for any given string text (which currently is a simple RAMDirectory but could be whatever data structure we come up with as part of the exercise - class StringIndex or some such). This works so well that I have to disable the cache to avoid getting astronomically good figures on artificial benchmarks.
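
To make the (queryString,result) part of that cache concrete, here is a minimal sketch. It is not the actual Nux code: the class name QueryResultCache, the LRU eviction policy, and the size bound are assumptions chosen for illustration, and it covers only the result cache, not the parsed-query or index caches.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical bounded cache keyed by query string; an illustration only.
final class QueryResultCache<V> {
    private final Map<String, V> entries;
    private int misses = 0;

    QueryResultCache(final int maxEntries) {
        // accessOrder=true keeps the least recently used entry first,
        // so removeEldestEntry evicts in LRU order once the bound is exceeded.
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached result for a query string seen before (expected O(1)
    // hash lookup); otherwise runs the evaluator once and stores its result.
    // A null result is stored but treated as a miss on the next call.
    synchronized V get(String queryString, Function<String, V> evaluator) {
        V result = entries.get(queryString);
        if (result == null) {
            misses++;
            result = evaluator.apply(queryString);
            entries.put(queryString, result);
        }
        return result;
    }

    // Number of times the evaluator actually had to run.
    synchronized int misses() {
        return misses;
    }
}
```

With such a cache in front of the search call, two match() invocations with the same query string cost one real evaluation plus one hash lookup, which is also why benchmarks that repeat the same query look unrealistically fast unless the cache is disabled.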

Cool.

BTW, I have some small performance patches for FastCharStream and in various other places, but I'll hold off proposing those until our exercise is done and the real merits/drawbacks of those patches can be better assessed.

Excellent... we're always interested in performance improvements!

        Erik

