Chris Fraschetti wrote:
One cheap solution is to ask the user to enter at least 3 alphanumeric chars.
Absolutely, limiting the user's query is no problem here. I've currently implemented the Lucene JavaScript to catch a lot of user queries that could cause issues: blank queries, ? or * at the beginning of a query, etc. But I couldn't think of a way to prevent the user from doing a* while still allowing comment* from someone wanting comments or commentary. Any suggestions would be warmly welcomed.
What do you say about that?
All the best,
Sergiu
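The minimum-prefix idea discussed above could be sketched roughly as follows. This is only an illustration in plain Java: the class WildcardGuard, its methods, and the threshold of 3 are hypothetical names chosen here, not part of Lucene or of the JavaScript validator mentioned in the thread. It assumes plain whitespace-separated terms and does not understand field:term syntax.

```java
// Hypothetical helper, not part of Lucene: rejects wildcard terms whose
// literal prefix is shorter than a minimum length.
final class WildcardGuard {
    // Assumed threshold, taken from the "at least 3 chars" suggestion.
    static final int MIN_PREFIX_CHARS = 3;

    /** Counts the leading letters/digits before the first non-alphanumeric char. */
    static int literalPrefixLength(String term) {
        int n = 0;
        while (n < term.length() && Character.isLetterOrDigit(term.charAt(n))) {
            n++;
        }
        return n;
    }

    /** True if no term combines a wildcard with a too-short literal prefix. */
    static boolean isAllowed(String query) {
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return false; // blank queries are rejected outright
        }
        for (String term : trimmed.split("\\s+")) {
            boolean hasWildcard = term.indexOf('*') >= 0 || term.indexOf('?') >= 0;
            if (hasWildcard && literalPrefixLength(term) < MIN_PREFIX_CHARS) {
                return false; // e.g. "a*", "ab*", "*foo"
            }
        }
        return true;
    }
}
```

With this check, a* and *foo are rejected while comment* passes, since its literal prefix is seven characters long.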
On Mon, 4 Oct 2004 14:08:00 -0400 (EDT), Stephane James Vaucher
<[EMAIL PROTECTED]> wrote:
Ok, got it, got a small comment though.
For large wildcard queries, please note that Google does not support wildcards. Search for hell*, and there will be no correct matches with hello.
Is there a reason why you wish to allow such large queries? We might be able to find alternative ways of helping you out. No one will use a query a*. If someone does, the results would be completely meaningless (many false positives for a user). However a query like program* might be interesting to a user.
The problem with hacking term expansion is that the rules of this expansion might be hard to define (as in, maybe one should use the first terms, the most frequent terms, or even the least frequent ones, depending on your app).
sv
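To make the choice of expansion rule concrete, here is a minimal sketch of one of the policies mentioned above, "keep the most frequent terms". It is plain Java and does not touch Lucene; the class ExpansionPolicy, the method mostFrequent, and the map of term to document frequency are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical policy class, not part of Lucene.
final class ExpansionPolicy {
    /**
     * Returns up to `limit` terms that start with `prefix`, ordered by
     * descending document frequency; ties are broken alphabetically.
     */
    static List<String> mostFrequent(Map<String, Integer> docFreq, String prefix, int limit) {
        List<Map.Entry<String, Integer>> matches = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                matches.add(e);
            }
        }
        matches.sort((a, b) -> {
            int byFreq = Integer.compare(b.getValue(), a.getValue()); // descending frequency
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < matches.size() && i < limit; i++) {
            kept.add(matches.get(i).getKey());
        }
        return kept;
    }
}
```

Swapping the two arguments of Integer.compare would give the "least frequent" policy instead; which one is appropriate depends on the application, as noted above.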
On Mon, 4 Oct 2004, Chris Fraschetti wrote:
The date portion of my code works great now, no problems there, so let me thank you now for your date filter solution. But my current problem is in regards to a standalone a* query giving me the too many clauses exception.
On Mon, 4 Oct 2004 12:47:24 -0400 (EDT), Stephane James Vaucher
<[EMAIL PROTECTED]> wrote:
BTW, what's wrong with the DateFilter solution I mentioned earlier?
I've used it before (before lucene-1.4 though) without memory problems, thus I always assumed that it avoided the allocation problems with prefix queries.
sv
On Mon, 4 Oct 2004, Chris Fraschetti wrote:
Surely some folks out there have used Lucene on a large scale and have had to compensate for this somehow. Any other solutions? Morus, thank you very much for your input, and I am looking into your solution; I'm just putting my feelers out there once more.
The Lucene API documentation is very limited in its descriptions of its components. Short of digging into the code, is there a good doc somewhere out there that explains the workings of Lucene?
On Mon, 4 Oct 2004 01:57:06 -0700, Chris Fraschetti
<[EMAIL PROTECTED]> wrote:
So before I spend a significant amount of time digging into the Lucene code, can your experience with Lucene shed any light on my situation? Our current index is pretty huge, and with each increase in size I've had, I've experienced a problem like this. Without taking up too much of your time, because obviously this is my task, I thought I'd ask whether you've had any experience with this boolean clause nonsense. Of course it can be overcome, but if you know a quick hack, awesome; otherwise, no big, but off to work I go :)
-Fraschetti
---------- Forwarded message ----------
From: Morus Walter <[EMAIL PROTECTED]>
Date: Mon, 4 Oct 2004 09:01:50 +0200
Subject: Re: BooleanQuery - Too Many Clases on date range.
To: Lucene Users List <[EMAIL PROTECTED]>, Chris Fraschetti <[EMAIL PROTECTED]>
Chris Fraschetti writes:
So I decided to move my epoch date to the 20040608 date format, which fixed my boolean query problem for my current data size (approx 600,000) ....
but now as soon as I do a query like ... a*
I get the boolean error again. Google obviously can handle this query,
and I'm pretty sure Lucene can handle it.. any ideas? With or
without a date range specified I still get the TooManyClauses error.
I tried cranking the max clause count up to Integer.MAX_VALUE, but Java gave me
an out of memory error. Is this because the boolean search tried to
allocate that many clauses by default or because my query actually
needed that many clauses?
Boolean search allocates one clause for every token that has the prefix or matches the wildcard expression.
Why does it work on small indexes but not large?
Because there are fewer tokens starting with a.
Is there any way to have the parser create as many clauses as it can and then search with what it has? w/o recompiling the source?
You need to create your own version of Wildcard- and Prefix-Query that takes a maximum term number and ignores further clauses. And you need a variant of the query parser that uses these queries.
This can be done, even without recompiling lucene, but you will have to do some programming at the level of lucene queries. Shouldn't be hard, since you can use the sources as a starting point.
I guess this does not exist because the Lucene developers decided to prefer a query error over incomplete results.
Morus
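The capped expansion described above might look roughly like this in isolation. This sketch deliberately avoids Lucene's API: it models the term dictionary as a sorted set of strings and stops after a maximum number of matching terms. The class CappedPrefixExpansion, its fields, and the expand method are hypothetical; a real solution would apply the same idea inside custom variants of the prefix and wildcard query classes, starting from the Lucene sources.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Hypothetical stand-in for a capped prefix query, not part of Lucene.
final class CappedPrefixExpansion {
    final List<String> terms = new ArrayList<>();
    boolean truncated = false; // true if matching terms were dropped

    /**
     * Collects at most `maxTerms` dictionary terms starting with `prefix`,
     * in dictionary order, and records whether any were left out.
     */
    static CappedPrefixExpansion expand(SortedSet<String> dictionary, String prefix, int maxTerms) {
        CappedPrefixExpansion result = new CappedPrefixExpansion();
        // In a sorted dictionary, all terms sharing a prefix are contiguous,
        // starting at the first term >= prefix.
        for (String term : dictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // walked past the last matching term
            }
            if (result.terms.size() == maxTerms) {
                result.truncated = true; // ignore further clauses instead of failing
                break;
            }
            result.terms.add(term);
        }
        return result;
    }
}
```

Exposing the truncated flag lets the caller tell the user that results are incomplete, which addresses the trade-off between a query error and silently partial results.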
-- ___________________________________________________ Chris Fraschetti, Student CompSci System Admin University of San Francisco e [EMAIL PROTECTED] | http://meteora.cs.usfca.edu
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]