Chris Fraschetti wrote:

Absolutely, limiting the user's query is no problem here. I've
currently implemented the Lucene JavaScript to catch a lot of user
queries that could cause issues: blank queries, ? or * at the
beginning of a query, etc. But I couldn't think of a way to
prevent the user from entering a* while still allowing comment* (wanting
comments or commentary). Any suggestions would be warmly welcomed.



One cheap solution is to ask the user to enter at least 3 alphanumeric characters.
What do you say about that?
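
A minimal sketch of that rule, assuming the check runs in Java before the query string reaches the parser. The class name WildcardGuard and the constant MIN_PREFIX_CHARS are made up for illustration and are not part of Lucene. It only splits on whitespace and does not understand field syntax such as title:a*, so treat it as a starting point rather than a complete validator.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a Lucene class.
class WildcardGuard {
    // Assumed threshold: literal characters required before any wildcard.
    static final int MIN_PREFIX_CHARS = 3;

    private static final Pattern WILDCARD = Pattern.compile("[*?]");

    /** Returns true if every wildcard term starts with at least MIN_PREFIX_CHARS alphanumerics. */
    static boolean isAllowed(String query) {
        if (query == null || query.trim().isEmpty()) {
            return false; // blank queries are rejected outright
        }
        for (String term : query.trim().split("\\s+")) {
            if (!WILDCARD.matcher(term).find()) {
                continue; // terms without wildcards are always accepted
            }
            // Count the leading run of letters and digits.
            int prefixLen = 0;
            while (prefixLen < term.length()
                    && Character.isLetterOrDigit(term.charAt(prefixLen))) {
                prefixLen++;
            }
            if (prefixLen < MIN_PREFIX_CHARS) {
                return false; // covers a* as well as a leading * or ?
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String q : new String[] {"a*", "comment*", "*foo", "lucene index"}) {
            System.out.println(q + " -> " + isAllowed(q));
        }
    }
}
```

With this rule a* and *foo are rejected while comment* passes, which is the distinction Chris asked for.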


 All the best,

 Sergiu

On Mon, 4 Oct 2004 14:08:00 -0400 (EDT), Stephane James Vaucher
<[EMAIL PROTECTED]> wrote:


Ok, got it, got a small comment though.

For large wildcard queries, please note that Google does not support
wildcards. Search for hell*, and there will be no correct matches with hello.

Is there a reason why you wish to allow such large queries? We might
be able to find alternative ways of helping you out. No one will use a
query a*. If someone does, the results would be completely meaningless
(many false positives for a user). However a query like program* might be
interesting to a user.

The problem with hacking term expansion is that the rules of this
expansion might be hard to define (for example, should one use the
first terms, the most frequent terms, or even the least frequent? It
depends on your app).
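
To make the choice concrete, here is a rough sketch of the three selection rules over an in-memory map from term to document frequency. Everything in it (ExpansionPolicy, Order, select, the sample frequencies) is hypothetical; a real implementation would read terms and frequencies from the index rather than from a TreeMap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not a Lucene class.
class ExpansionPolicy {
    enum Order { FIRST, MOST_FREQUENT, LEAST_FREQUENT }

    /** Picks at most maxTerms terms that start with prefix, according to the given rule. */
    static List<String> select(TreeMap<String, Integer> docFreq, String prefix,
                               int maxTerms, Order order) {
        // Terms sharing a prefix are contiguous in sorted order, so stop at the first miss.
        List<Map.Entry<String, Integer>> candidates = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreq.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            candidates.add(e);
        }
        if (order != Order.FIRST) {
            final int sign = (order == Order.MOST_FREQUENT) ? -1 : 1;
            // Sort by frequency, breaking ties alphabetically for a stable result.
            candidates.sort((a, b) -> {
                int byFreq = sign * Integer.compare(a.getValue(), b.getValue());
                return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
            });
        }
        List<String> chosen = new ArrayList<>();
        for (Map.Entry<String, Integer> e : candidates) {
            if (chosen.size() == maxTerms) {
                break;
            }
            chosen.add(e.getKey());
        }
        return chosen;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> docFreq = new TreeMap<>();
        docFreq.put("program", 50);
        docFreq.put("programmer", 20);
        docFreq.put("programming", 80);
        docFreq.put("progress", 5);
        for (Order order : Order.values()) {
            System.out.println(order + " -> " + select(docFreq, "program", 2, order));
        }
    }
}
```

Each rule returns a different pair of terms for program* on the same data, which is why the right default depends on the application.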

sv

On Mon, 4 Oct 2004, Chris Fraschetti wrote:



The date portion of my code works great now, no problems there, so
let me thank you now for your date filter solution. But my current
problem concerns a standalone a* query giving me the TooManyClauses
exception.


On Mon, 4 Oct 2004 12:47:24 -0400 (EDT), Stephane James Vaucher
<[EMAIL PROTECTED]> wrote:


BTW, what's wrong with the DateFilter solution I mentioned earlier?

I've used it before (before lucene-1.4 though) without memory problems,
thus I always assumed that it avoided the allocation problems with prefix
queries.

sv



On Mon, 4 Oct 2004, Chris Fraschetti wrote:



Surely some folks out there have used Lucene on a large scale and have
had to compensate for this somehow. Any other solutions? Morus, thank
you very much for your input. I am looking into your solution,
just putting my feelers out there once more.

The Lucene API documentation is very limited in its descriptions of its
components. Short of digging into the code, is there a good doc
somewhere out there that explains the workings of Lucene?


On Mon, 4 Oct 2004 01:57:06 -0700, Chris Fraschetti
<[EMAIL PROTECTED]> wrote:


So before I spend a significant amount of time digging into the Lucene
code, does your experience with Lucene shed any light on my
situation? Our current index is pretty huge, and with each
increase in size I've had, I've experienced a problem like this.
Without taking up too much of your time, because obviously this is my
task, I thought I'd ask whether you've had any experience with this
boolean clause nonsense. Of course it can be overcome, but if you
know a quick hack, awesome. Otherwise, no big deal, but off to work I go
:)

-Fraschetti


---------- Forwarded message ----------
From: Morus Walter <[EMAIL PROTECTED]>
Date: Mon, 4 Oct 2004 09:01:50 +0200
Subject: Re: BooleanQuery - Too Many Clases on date range.
To: Lucene Users List <[EMAIL PROTECTED]>, Chris Fraschetti <[EMAIL PROTECTED]>

Chris Fraschetti writes:


So I decided to move my epoch date to the 20040608 date format, which fixed
my boolean query problem for my current data size (approx.
600,000).

But now as soon as I do a query like a*
I get the boolean error again. Google obviously can handle this query,
and I'm pretty sure Lucene can handle it. Any ideas? With or
without a date range specified, I still get the TooManyClauses error.




I tried cranking the max clause count up to Integer.MAX_VALUE, but Java gave me
an out-of-memory error. Is this because the boolean search tried to
allocate that many clauses by default or because my query actually
needed that many clauses?


The boolean search allocates a clause for every token that has the prefix
or matches the wildcard expression.



Why does it work on small indexes but not
large?


Because there are fewer tokens starting with a.



Is there any way to have the parser create as many clauses as
it can and then search with what it has? w/o recompiling the source?



You need to create your own version of Wildcard- and Prefix-Query
that takes a maximum term number and ignores further clauses.
And you need a variant of the query parser that uses these queries.

This can be done, even without recompiling lucene, but you will have to
do some programming at the level of lucene queries.
Shouldn't be hard, since you can use the sources as a starting point.
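
As a rough illustration of the idea (not actual Lucene code), the sketch below models the term dictionary as a sorted set and expands a prefix into at most maxTerms clauses. The names CappedPrefixExpansion, TooManyTermsException and expand are invented for this example. The truncate flag switches between failing with an error and silently ignoring further terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of capped prefix expansion; it does not use the Lucene API.
class CappedPrefixExpansion {
    /** Stand-in for the error a strict expansion raises; not Lucene's exception class. */
    static class TooManyTermsException extends RuntimeException {
        TooManyTermsException(String message) {
            super(message);
        }
    }

    /**
     * Collects the terms starting with prefix, one clause per term.
     * Once maxTerms is reached, either stops (truncate == true) or throws.
     */
    static List<String> expand(TreeSet<String> terms, String prefix,
                               int maxTerms, boolean truncate) {
        List<String> clauses = new ArrayList<>();
        // Matching terms are contiguous in sorted order, starting at the prefix itself.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (clauses.size() == maxTerms) {
                if (truncate) {
                    break; // ignore further clauses, accepting incomplete results
                }
                throw new TooManyTermsException(
                        "more than " + maxTerms + " terms match " + prefix + "*");
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>();
        for (int i = 0; i < 10; i++) {
            terms.add("a" + i);
        }
        terms.add("b");
        System.out.println(expand(terms, "a", 3, true));
    }
}
```

The sketch also shows why index size matters: the number of clauses equals the number of distinct terms sharing the prefix, so a larger index with more terms under a* hits the cap sooner.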

I guess this does not exist because the Lucene developers decided to prefer
a query error over incomplete results.

Morus


--
___________________________________________________
Chris Fraschetti, Student CompSci System Admin
University of San Francisco
e [EMAIL PROTECTED] | http://meteora.cs.usfca.edu







---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

















