Re: Search results excerpt similar to Google

2005-01-27 Thread Jason Polites
I think they do a proximity result based on keyword matches.  So... If you 
search for "lucene" and the document returned has this word at the very 
start and the very end of the document, then you will see the two sentences 
(sequences of words) surrounding the two keyword matches, one from the start 
of the document and one from the end.

How you determine which words from the result to include in the summary is 
up to you.  The problem with this is that in Lucene-land you have to store 
the content of the document inside the index verbatim (so you can get 
arbitrary portions of it out).  This means your index will be larger than it 
really needs to be.

I usually just store the first 255 characters in the index and use this as a 
summary.  It's not as good as Google, but it seems to work ok.
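
The two approaches described above (fragments around keyword matches, or just the leading characters) could be sketched roughly as follows. This is plain string handling on the stored text, not a Lucene API; the class and method names (ExcerptSketch, fragments, leadingSummary) are made up for illustration, and the window size is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: builds excerpts from stored text.
class ExcerptSketch {

    // Case-insensitive search that keeps offsets valid for the original text.
    static int indexOfIgnoreCase(String text, String keyword, int from) {
        for (int i = from; i + keyword.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, keyword, 0, keyword.length())) {
                return i;
            }
        }
        return -1;
    }

    // Returns up to maxFragments windows of text, each reaching `context`
    // characters to either side of a keyword match.
    static List<String> fragments(String text, String keyword, int context, int maxFragments) {
        List<String> result = new ArrayList<>();
        if (keyword.isEmpty()) {
            return result;
        }
        int from = 0;
        while (result.size() < maxFragments) {
            int hit = indexOfIgnoreCase(text, keyword, from);
            if (hit < 0) {
                break;
            }
            int start = Math.max(from, hit - context); // never reach back into the previous window
            int end = Math.min(text.length(), hit + keyword.length() + context);
            result.add(text.substring(start, end).trim());
            from = end; // continue after this window so fragments do not overlap
        }
        return result;
    }

    // The cheaper alternative: keep only the first `limit` characters.
    static String leadingSummary(String text, int limit) {
        return text.length() <= limit ? text : text.substring(0, limit);
    }
}
```

Calling fragments(text, "lucene", 60, 2) would give at most two windows to join with "..." for display; a real implementation would probably snap the windows to word or sentence boundaries.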

- Original Message - 
From: "Ben" <[EMAIL PROTECTED]>
To: "Lucene" 
Sent: Friday, January 28, 2005 5:08 PM
Subject: Search results excerpt similar to Google


Hi
Is it hard to implement a function that displays the search results
excerpts similar to Google?
Is it just string manipulation, or is there some logic behind it? I
like their excerpts.
Thanks
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




Re: google mini? who needs it when Lucene is there

2005-01-27 Thread David Spencer
Xiaohong Yang (Sharon) wrote:
> Hi,
>
> I agree that Google mini is quite expensive.  It might be similar to
> the desktop version in quality.  Anyone knows google's ratio of index
> to text?   Is it true that Lucene's index is about 500 times the
> original text size (not including image size)?  I don't have one
> installed, so I cannot measure.

500:1 for Lucene?  I don't think so.

In my wikipedia search engine the data in the MySQL DB I index from is 
approx 1.0 GB (sum of lengths of title and body), while the Lucene index 
of just these 2 fields is 250MB, thus in this case the Lucene index is 
25% of the corpus size.

> Best,
>
> Sharon
>
> jian chen <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I was searching using google and just found that there was a new
> > feature called "google mini". Initially I thought it was another free
> > service for small companies. Then I realized that it costs quite some
> > money ($4,995) for the hardware and software. (I guess the proprietary
> > software costs a whole lot more than actual hardware.)
> >
> > The "nice" feature is that, you can only index up to 50,000 documents
> > with this price. If you need to index more, sorry, send in the
> > check...
> >
> > It seems to me that any small biz will be ripped off if they install
> > this google mini thing, compared to using Lucene to implement a easy
> > to use search software, which could search up to whatever number of
> > documents you could image.
> >
> > I hope the lucene project could get exposed more to the enterprise so
> > that people know that they have not only cheaper but more importantly,
> > BETTER alternatives.
> >
> > Jian


Search results excerpt similar to Google

2005-01-27 Thread Ben
Hi

Is it hard to implement a function that displays the search results
excerpts similar to Google?

Is it just string manipulation, or is there some logic behind it? I
like their excerpts.

Thanks




Re: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there

2005-01-27 Thread David Spencer
Jason Polites wrote:
> I think everyone agrees that this would be a very neat application of 
> opensource technology like Lucene... however (opens drawer, pulls out 
> devil's advocate hat, places on head)... there are several complexities 
> here not addressed by Lucene (et al.).  Not because Lucene isn't damn 
> fantastic, just because it's not its job.
>
> One of the big ones is security.  Enterprise search is no good if it 
> doesn't match up with the authentication and authorization paradigms 
> existing in the organisation.  How useful is it to return a whole bunch 
> of search results for documents to which you don't have access? Not to 
> mention the issues around whether you are even authorized to know it 
> exists.

I was gonna mention this - you beat me to the punch.  I suspect that 
LDAP/JNDI integration is a start, but you need hooks for an arbitrary 
auth plugin. And once we address this it might be the case that a user 
has to *log in* to the search server.  We have Verity where I work and 
this is all the case, along w/ the fact that a sale seems to involve 
mandatory consulting work (not that that's bad, but if you're trying to 
ship a shrink wrapped search engine in a box then this is an issue).
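
The "hooks for an arbitrary auth plugin" idea might look something like the sketch below: the search server asks a pluggable checker about every hit before showing it. The interface and class names (AccessChecker, SecureResultsSketch) are hypothetical, and an LDAP/JNDI-backed implementation is only assumed to sit behind canRead; nothing here is an existing Lucene or Nutch API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plugin hook; an LDAP/JNDI-backed class could implement it.
interface AccessChecker {
    boolean canRead(String user, String documentId);
}

class SecureResultsSketch {

    // Drops every hit the user may not read, so that neither the content
    // nor the existence of a restricted document shows up in the results.
    static List<String> visibleHits(List<String> hitIds, String user, AccessChecker checker) {
        List<String> visible = new ArrayList<>();
        for (String id : hitIds) {
            if (checker.canRead(user, id)) {
                visible.add(id);
            }
        }
        return visible;
    }
}
```

Filtering after the search is the simplest design but it makes hit counts and paging harder to get right; pushing the restriction into the query itself is the usual alternative.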

> The other prickly one is file types.  It's all well and good to index 
> HTML, XML and text but when you start looking at PDF, MS Office (OLE 
> docs, PSTs, Outlook MSG files, MS Project files etc), Lotus Notes 
> databases etc etc, things begin to look less simple and far less elegant 
> than a nice clean lucene rackmount.  Sure there are great projects like 
> Apache POI but they still have a bit of a way to go before they 
> mature to a point of really solving these problems.  After which time 
> Microsoft will probably be rolling out Longhorn and everyone may need to 
> start from scratch.
Also need http://jcifs.samba.org/ so you can spider windows file shares.
> This is not to say that it's not a great idea, but as with most great 
> ideas the challenge is not the formation of the idea, but its 
> implementation.
Indeed.
> I think a great first step would be to start developing good, reliable, 
> opensource extensions to Lucene which strive to solve some of these issues.
>
> end rant.
- Original Message - From: "Otis Gospodnetic" 
<[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Friday, January 28, 2005 12:40 PM
Subject: Re: rackmount lucene/nutch - Re: google mini? who needs it when 
Lucene is there


I discuss this with myself a lot inside my head... :)
Seriously, I agree with Erik.  I think this is a business opportunity.
How many people are hating me now and going "shh"?  Raise your
hands!
Otis
--- David Spencer <[EMAIL PROTECTED]> wrote:
This reminds me, has anyone every discussed something similar:
- rackmount server ( or for coolness factor, that mini mac)
- web i/f for config/control
- of course the server would have the following s/w:
-- web server
-- lucene / nutch
Part of the work here I think is having a decent web i/f to configure
the thing and to customize the L&F of the search results.

jian chen wrote:
> Hi,
>
> I was searching using google and just found that there was a new
> feature called "google mini". Initially I thought it was another
free
> service for small companies. Then I realized that it costs quite
some
> money ($4,995) for the hardware and software. (I guess the
proprietary
> software costs a whole lot more than actual hardware.)
>
> The "nice" feature is that, you can only index up to 50,000
documents
> with this price. If you need to index more, sorry, send in the
> check...
>
> It seems to me that any small biz will be ripped off if they
install
> this google mini thing, compared to using Lucene to implement a
easy
> to use search software, which could search up to whatever number of
> documents you could image.
>
> I hope the lucene project could get exposed more to the enterprise
so
> that people know that they have not only cheaper but more
importantly,
> BETTER alternatives.
>
> Jian
>
>


Re: google mini? who needs it when Lucene is there

2005-01-27 Thread jian chen
Overall, even if google mini gives a lot of cool features compared to
a bare-bones lucene project, what good is it with the 50,000-document
limit? It is useless with that limit. That is just their way of trying
to turn it into another cash cow.

Jian


On Thu, 27 Jan 2005 17:45:03 -0800 (PST), Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> 500 times the original data?  Not true! :)
> 
> Otis
> 
> --- "Xiaohong Yang (Sharon)" <[EMAIL PROTECTED]> wrote:
> 
> > Hi,
> >
> > I agree that Google mini is quite expensive.  It might be similar to
> > the desktop version in quality.  Anyone knows google's ratio of index
> > to text?   Is it true that Lucene's index is about 500 times the
> > original text size (not including image size)?  I don't have one
> > installed, so I cannot measure.
> >
> > Best,
> >
> > Sharon
> >
> > jian chen <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I was searching using google and just found that there was a new
> > feature called "google mini". Initially I thought it was another free
> > service for small companies. Then I realized that it costs quite some
> > money ($4,995) for the hardware and software. (I guess the
> > proprietary
> > software costs a whole lot more than actual hardware.)
> >
> > The "nice" feature is that, you can only index up to 50,000 documents
> > with this price. If you need to index more, sorry, send in the
> > check...
> >
> > It seems to me that any small biz will be ripped off if they install
> > this google mini thing, compared to using Lucene to implement a easy
> > to use search software, which could search up to whatever number of
> > documents you could image.
> >
> > I hope the lucene project could get exposed more to the enterprise so
> > that people know that they have not only cheaper but more
> > importantly,
> > BETTER alternatives.
> >
> > Jian
> >




Re: query term frequency

2005-01-27 Thread Jonathan Lasko
No, the number of occurrences of a term in a Query.

Jonathan
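
For what it's worth, counting how often each term occurs in the query text itself can be done outside Lucene. The sketch below is an illustration only: whitespace splitting and lowercasing stand in for whatever analyzer the application really uses, and the class name QueryTermCountSketch is made up.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class QueryTermCountSketch {

    // Counts the occurrences of each term in a query string, keeping the
    // order in which the terms first appear.
    static Map<String, Integer> termCounts(String query) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String term : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue; // an empty query yields one empty token
            }
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }
}
```

Query syntax such as field prefixes, quotes or boolean operators is not handled here; those would need to be stripped or parsed first.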

Quoting David Spencer <[EMAIL PROTECTED]>:

> Jonathan Lasko wrote:
> 
> > What do I call to get the term frequencies for terms in the Query?  I 
> > can't seem to find it in the Javadoc...
> 
> Do you mean the # of docs that have a term?
> 
>
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#docFreq(org.apache.lucene.index.Term)
> > Thanks.
> > 
> > Jonathan
> > 






Re: LuceneReader.delete (term t) Failure ?

2005-01-27 Thread Erik Hatcher
Could you work up a self-contained RAMDirectory-using example that 
demonstrates this issue?

Erik
On Jan 27, 2005, at 9:10 PM, <[EMAIL PROTECTED]> wrote:
Erik,
I am using the keyword field
doc.add(Field.Keyword("uid", pathRelToArea));
anything else I can check on ?
thanks
atul
PS we worked together for Darden project

From: Erik Hatcher <[EMAIL PROTECTED]>
Date: 2005/01/27 Thu PM 07:46:40 EST
To: "Lucene Users List" 
Subject: Re: LuceneReader.delete (term t) Failure ?
How did you index the "uid" field?  Field.Keyword?  If not, that may 
be
the problem in that the field was analyzed.  For a key field like 
this,
it needs to be unanalyzed/untokenized.

Erik
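
To illustrate why an analyzed key field breaks deletion by exact term: once the value is tokenized, the full path no longer exists as a single term. The sketch below uses a stand-in tokenizer (lowercase, split on anything that is not a letter or digit), which is an assumption and not the behaviour of any particular Lucene analyzer. It shows the failure mode in general and does not by itself explain a case where Field.Keyword really was used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class KeyFieldSketch {

    // Stand-in for an analyzed field: the value is broken into several terms.
    static List<String> analyze(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"));
    }

    // Stand-in for an untokenized (keyword) field: the whole value is one term.
    static List<String> keyword(String value) {
        return Arrays.asList(value);
    }

    // Deleting by term can only work if the exact term is in the index.
    static boolean exactTermPresent(List<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }
}
```

With the path "xxx-yyy/foo.txt", the tokenized form contains pieces such as "xxx" and "yyy" but not the full string, so a Term built from the full path finds nothing; the keyword form contains exactly that string.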
On Jan 27, 2005, at 6:21 PM, <[EMAIL PROTECTED]> wrote:
Hi,
I am trying to delete a document from Lucene index using:
 Term aTerm = new Term( "uid", path );
 aReader.delete( aTerm );
 aReader.close();
If the variable path="xxx/foo.txt" then I am able to delete the
document.
However, if path variable has "-" in the string, the delete method
does not work
  e.g. path="xxx-yyy/foo.txt"  // Does Not work!!
Can I get around this problem.  I cannot subsitute minus character
with '.' as
it has other implications.
is this a bug ? I am using Lucene 1.4-final version.
Thanks for the help
Atul


Re: Re: LuceneReader.delete (term t) Failure ?

2005-01-27 Thread akedar
Erik,

I am using the keyword field:
doc.add(Field.Keyword("uid", pathRelToArea));
Is there anything else I can check?

thanks
atul 

PS we worked together for Darden project 


> 
> From: Erik Hatcher <[EMAIL PROTECTED]>
> Date: 2005/01/27 Thu PM 07:46:40 EST
> To: "Lucene Users List" 
> Subject: Re: LuceneReader.delete (term t) Failure ?
> 
> How did you index the "uid" field?  Field.Keyword?  If not, that may be 
> the problem in that the field was analyzed.  For a key field like this, 
> it needs to be unanalyzed/untokenized.
> 
>   Erik
> 
> On Jan 27, 2005, at 6:21 PM, <[EMAIL PROTECTED]> wrote:
> 
> > Hi,
> >
> > I am trying to delete a document from Lucene index using:
> >
> >  Term aTerm = new Term( "uid", path );
> >  aReader.delete( aTerm );
> >  aReader.close();
> >
> > If the variable path="xxx/foo.txt" then I am able to delete the 
> > document.
> >
> > However, if path variable has "-" in the string, the delete method 
> > does not work
> >
> >   e.g. path="xxx-yyy/foo.txt"  // Does Not work!!
> >
> >
> > Can I get around this problem.  I cannot subsitute minus character 
> > with '.' as
> > it has other implications.
> >
> > is this a bug ? I am using Lucene 1.4-final version.
> >
> > Thanks for the help
> > Atul
> >
> >





Re: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there

2005-01-27 Thread Jason Polites
I think everyone agrees that this would be a very neat application of 
opensource technology like Lucene... however (opens drawer, pulls out 
devil's advocate hat, places on head)... there are several complexities here 
not addressed by Lucene (et al.).  Not because Lucene isn't damn fantastic, 
just because it's not its job.

One of the big ones is security.  Enterprise search is no good if it doesn't 
match up with the authentication and authorization paradigms existing in the 
organisation.  How useful is it to return a whole bunch of search results 
for documents to which you don't have access? Not to mention the issues 
around whether you are even authorized to know it exists.

The other prickly one is file types.  It's all well and good to index HTML, 
XML and text but when you start looking at PDF, MS Office (OLE docs, PSTs, 
Outlook MSG files, MS Project files etc), Lotus Notes databases etc etc, 
things begin to look less simple and far less elegant than a nice clean 
lucene rackmount.  Sure there are great projects like Apache POI but they 
still have a bit of a way to go before they mature to a point of really 
solving these problems.  After which time Microsoft will probably be rolling 
out Longhorn and everyone may need to start from scratch.

This is not to say that it's not a great idea, but as with most great ideas 
the challenge is not the formation of the idea, but its implementation.

I think a great first step would be to start developing good, reliable, 
opensource extensions to Lucene which strive to solve some of these issues.

end rant.
- Original Message - 
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Friday, January 28, 2005 12:40 PM
Subject: Re: rackmount lucene/nutch - Re: google mini? who needs it when 
Lucene is there


I discuss this with myself a lot inside my head... :)
Seriously, I agree with Erik.  I think this is a business opportunity.
How many people are hating me now and going "shh"?  Raise your
hands!
Otis
--- David Spencer <[EMAIL PROTECTED]> wrote:
This reminds me, has anyone every discussed something similar:
- rackmount server ( or for coolness factor, that mini mac)
- web i/f for config/control
- of course the server would have the following s/w:
-- web server
-- lucene / nutch
Part of the work here I think is having a decent web i/f to configure
the thing and to customize the L&F of the search results.

jian chen wrote:
> Hi,
>
> I was searching using google and just found that there was a new
> feature called "google mini". Initially I thought it was another
free
> service for small companies. Then I realized that it costs quite
some
> money ($4,995) for the hardware and software. (I guess the
proprietary
> software costs a whole lot more than actual hardware.)
>
> The "nice" feature is that, you can only index up to 50,000
documents
> with this price. If you need to index more, sorry, send in the
> check...
>
> It seems to me that any small biz will be ripped off if they
install
> this google mini thing, compared to using Lucene to implement a
easy
> to use search software, which could search up to whatever number of
> documents you could image.
>
> I hope the lucene project could get exposed more to the enterprise
so
> that people know that they have not only cheaper but more
importantly,
> BETTER alternatives.
>
> Jian
>
>


Re: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there

2005-01-27 Thread Chris Lamprecht
As they say, nothing lasts forever ;)

I like the idea.  If a project like this gets going, I think I'd be
interested in helping.

The Google mini looks very well done (they have two demos on the web
page).  For $5000, it's probably a very good solution for many
businesses.  If the demos are accurate, it seems like you almost
literally plug it in, configure a few things using the web interface,
and you're in business.   Demos are at
http://www.google.com/enterprise/mini/product_tours_demos.html

-chris

On Thu, 27 Jan 2005 17:40:53 -0800 (PST), Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> I discuss this with myself a lot inside my head... :)
> Seriously, I agree with Erik.  I think this is a business opportunity.
> How many people are hating me now and going "shh"?  Raise your
> hands!
> 
> Otis
> 
> --- David Spencer <[EMAIL PROTECTED]> wrote:
> 
> > This reminds me, has anyone every discussed something similar:
> >
> > - rackmount server ( or for coolness factor, that mini mac)
> > - web i/f for config/control
> >
> > - of course the server would have the following s/w:
> > -- web server
> > -- lucene / nutch
> >
> > Part of the work here I think is having a decent web i/f to configure
> >
> > the thing and to customize the L&F of the search results.
> >
> >
> >
> > jian chen wrote:
> > > Hi,
> > >
> > > I was searching using google and just found that there was a new
> > > feature called "google mini". Initially I thought it was another
> > free
> > > service for small companies. Then I realized that it costs quite
> > some
> > > money ($4,995) for the hardware and software. (I guess the
> > proprietary
> > > software costs a whole lot more than actual hardware.)
> > >
> > > The "nice" feature is that, you can only index up to 50,000
> > documents
> > > with this price. If you need to index more, sorry, send in the
> > > check...
> > >
> > > It seems to me that any small biz will be ripped off if they
> > install
> > > this google mini thing, compared to using Lucene to implement a
> > easy
> > > to use search software, which could search up to whatever number of
> > > documents you could image.
> > >
> > > I hope the lucene project could get exposed more to the enterprise
> > so
> > > that people know that they have not only cheaper but more
> > importantly,
> > > BETTER alternatives.
> > >
> > > Jian
> > >
> > >




Re: Reloading an index

2005-01-27 Thread Chris Hostetter

: processes ended.  If you're under linux, try running the 'lsof'
: command to see if there are any handles to files marked "(deleted)".

: > Searcher, the old Searcher is closed and nulled, but I
: > still see about twice the amount of memory in use well
: > after the original searcher has been closed.   Is
: > there something else I can do to get this memory
: > reclaimed?  Should I explicitly call garbarge
: > collection?  Any ideas?

In addition to the previous advice, keep in mind that depending on the
implementation of your JVM, it may never actually "free" memory back to
the OS.  And even the JVMs that can, only do it after a GC which results
in a ratio of unused/used memory that they deem worthy of freeing (usually
based on tuning parameters).

assuming you are using a Sun JVM, take a look at...

http://java.sun.com/docs/hotspot/gc1.4.2/index.html

...and search for MinHeapFreeRatio and MaxHeapFreeRatio
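
One way to see what the JVM itself thinks it is using, as opposed to what the OS reports for the process, is to print the heap figures from java.lang.Runtime before and after a GC request. This is only a rough sketch; the class name HeapReportSketch is made up, and System.gc() is a hint that the JVM is free to ignore.

```java
class HeapReportSketch {

    // Heap currently occupied by objects (live or not yet collected).
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // used = occupied heap, committed = heap obtained from the OS,
    // max = upper limit the heap may grow to.
    static String report() {
        Runtime rt = Runtime.getRuntime();
        return "used=" + usedBytes()
                + " committed=" + rt.totalMemory()
                + " max=" + rt.maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("before gc: " + report());
        System.gc(); // a request only; collection is not guaranteed
        System.out.println("after gc:  " + report());
    }
}
```

If "used" drops after the old Searcher is closed but "committed" stays high, the memory has been reclaimed inside the JVM and simply not handed back to the OS, which is where the two ratio settings mentioned above come in.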


-Hoss





Re: google mini? who needs it when Lucene is there

2005-01-27 Thread Otis Gospodnetic
500 times the original data?  Not true! :)

Otis

--- "Xiaohong Yang (Sharon)" <[EMAIL PROTECTED]> wrote:

> Hi,
>  
> I agree that Google mini is quite expensive.  It might be similar to
> the desktop version in quality.  Anyone knows google's ratio of index
> to text?   Is it true that Lucene's index is about 500 times the
> original text size (not including image size)?  I don't have one
> installed, so I cannot measure.
>  
> Best,
>  
> Sharon
> 
> jian chen <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> I was searching using google and just found that there was a new
> feature called "google mini". Initially I thought it was another free
> service for small companies. Then I realized that it costs quite some
> money ($4,995) for the hardware and software. (I guess the
> proprietary
> software costs a whole lot more than actual hardware.)
> 
> The "nice" feature is that, you can only index up to 50,000 documents
> with this price. If you need to index more, sorry, send in the
> check...
> 
> It seems to me that any small biz will be ripped off if they install
> this google mini thing, compared to using Lucene to implement a easy
> to use search software, which could search up to whatever number of
> documents you could image.
> 
> I hope the lucene project could get exposed more to the enterprise so
> that people know that they have not only cheaper but more
> importantly,
> BETTER alternatives.
> 
> Jian
> 





RE: Disk space used by optimize

2005-01-27 Thread Otis Gospodnetic
Have you tried using the multifile index format?  Now I wonder if there
is actually a difference in disk space consumed by optimize() when you
use multifile and compound index format...

Otis

--- "Kauler, Leto S" <[EMAIL PROTECTED]> wrote:

> Our copy of LIA is "in the mail" ;)
> 
> Yes the final three files are: the .cfs (46.8MB), deletable (4
> bytes),
> and segments (29 bytes).
> 
> --Leto
> 
> 
> 
> > -Original Message-
> > From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
> > 
> > Hello,
> > 
> > Yes, that is how optimize works - copies all existing index 
> > segments into one unified index segment, thus optimizing it.
> > 
> > see hit #1:
> http://www.lucenebook.com/search?query=optimize+disk+space
> > 
> > However, three times the space sounds a bit too much, or I 
> > make a mistake in the book. :)
> > 
> > You said you end up with 3 files - .cfs is one of them, right?
> > 
> > Otis
> > 
> > 
> > --- "Kauler, Leto S" <[EMAIL PROTECTED]> wrote:
> > 
> > > 
> > > Just a quick question:  after writing an index and then calling 
> > > optimize(), is it normal for the index to expand to about 
> > three times 
> > > the size before finally compressing?
> > > 
> > > In our case the optimise grinds the disk, expanding the index
> into 
> > > many files of about 145MB total, before compressing down to three
> 
> > > files of about 47MB total.  That must be a lot of disk activity
> for 
> > > the people with multi-gigabyte indexes!
> > > 
> > > Regards,
> > > Leto
> 





Re: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there

2005-01-27 Thread Otis Gospodnetic
I discuss this with myself a lot inside my head... :)
Seriously, I agree with Erik.  I think this is a business opportunity.
How many people are hating me now and going "shh"?  Raise your
hands!

Otis

--- David Spencer <[EMAIL PROTECTED]> wrote:

> This reminds me, has anyone every discussed something similar:
> 
> - rackmount server ( or for coolness factor, that mini mac)
> - web i/f for config/control
> 
> - of course the server would have the following s/w:
> -- web server
> -- lucene / nutch
> 
> Part of the work here I think is having a decent web i/f to configure
> 
> the thing and to customize the L&F of the search results.
> 
> 
> 
> jian chen wrote:
> > Hi,
> > 
> > I was searching using google and just found that there was a new
> > feature called "google mini". Initially I thought it was another
> free
> > service for small companies. Then I realized that it costs quite
> some
> > money ($4,995) for the hardware and software. (I guess the
> proprietary
> > software costs a whole lot more than actual hardware.)
> > 
> > The "nice" feature is that, you can only index up to 50,000
> documents
> > with this price. If you need to index more, sorry, send in the
> > check...
> > 
> > It seems to me that any small biz will be ripped off if they
> install
> > this google mini thing, compared to using Lucene to implement a
> easy
> > to use search software, which could search up to whatever number of
> > documents you could image.
> > 
> > I hope the lucene project could get exposed more to the enterprise
> so
> > that people know that they have not only cheaper but more
> importantly,
> > BETTER alternatives.
> > 
> > Jian
> > 
> >





Re: Reloading an index

2005-01-27 Thread Chris Lamprecht
I just ran into a similar issue.  When you close an IndexSearcher, it
doesn't necessarily close the underlying IndexReader.  It depends
which constructor you used to create the IndexSearcher.  See the
constructors' javadocs or source for the details.
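
The ownership rule behind this can be shown with a toy model. The classes below (ToyReader, ToySearcher) are stand-ins, not Lucene classes, and only model who is responsible for closing the reader: a searcher that opened its own reader closes it, while a searcher handed an existing reader leaves it to the caller. As far as I understand it this mirrors the IndexSearcher constructors, but check the javadocs for your version.

```java
import java.io.Closeable;

// Toy stand-in for an index reader holding open file handles.
class ToyReader implements Closeable {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    @Override
    public void close() {
        open = false;
    }
}

// Toy stand-in for a searcher; closes the reader only if it created it.
class ToySearcher implements Closeable {
    private final ToyReader reader;
    private final boolean ownsReader;

    // The searcher opens its own reader, so it must close it.
    ToySearcher() {
        this(new ToyReader(), true);
    }

    // The caller supplies the reader and stays responsible for closing it.
    ToySearcher(ToyReader reader) {
        this(reader, false);
    }

    private ToySearcher(ToyReader reader, boolean ownsReader) {
        this.reader = reader;
        this.ownsReader = ownsReader;
    }

    ToyReader reader() {
        return reader;
    }

    @Override
    public void close() {
        if (ownsReader) {
            reader.close();
        }
    }
}
```

In the second case, forgetting to close the reader yourself is what leaves the file handles (and the disk space of deleted files) held until the process exits.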

In my case, we were updating and optimizing the index from another
process, and reopening IndexSearchers.  We would eventually run out of
disk space because it was leaving open file handles to deleted files,
so the disk space was never being made available, until the JVM
processes ended.  If you're under linux, try running the 'lsof'
command to see if there are any handles to files marked "(deleted)".

-Chris

On Thu, 27 Jan 2005 08:28:30 -0800 (PST), Greg Gershman
<[EMAIL PROTECTED]> wrote:
> I have an index that is frequently updated.  When
> indexing is completed, an event triggers a new
> Searcher to be opened.  When the new Searcher is
> opened, incoming searches are redirected to the new
> Searcher, the old Searcher is closed and nulled, but I
> still see about twice the amount of memory in use well
> after the original searcher has been closed.   Is
> there something else I can do to get this memory
> reclaimed?  Should I explicitly call garbarge
> collection?  Any ideas?
> 
> Thanks.
> 
> Greg Gershman
> 




Re: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there

2005-01-27 Thread Erik Hatcher
I've often said that there is a business to be had in packaging up 
Lucene (and now Nutch) into a cute little box with user friendly 
management software to search your intranet.  SearchBlox is already 
there (except they don't include the box).

I really hope that an application like SearchBlox/Zilverline can be 
created as part of the Lucene project itself, replacing the sad demos 
that currently ship with Lucene.  I've got so many things on my plate 
that I don't foresee myself getting to this as soon as I'd like, but I 
would most definitely support and contribute what time I could to such 
an effort.  If the web UI used Tapestry, I'd be very inclined to dig in 
hardcore to it.  Any other web UI technology would likely turn me off.  
One of these days I'll Tapestry-ify Nutch just for grins and submit it 
as a replacement for the JSPs.

And I'm even more sold on it if Mac Minis are involved!  :)
Erik
On Jan 27, 2005, at 7:16 PM, David Spencer wrote:
This reminds me, has anyone every discussed something similar:
- rackmount server ( or for coolness factor, that mini mac)
- web i/f for config/control
- of course the server would have the following s/w:
-- web server
-- lucene / nutch
Part of the work here I think is having a decent web i/f to configure 
the thing and to customize the L&F of the search results.


jian chen wrote:
Hi,
I was searching using google and just found that there was a new
feature called "google mini". Initially I thought it was another free
service for small companies. Then I realized that it costs quite some
money ($4,995) for the hardware and software. (I guess the proprietary
software costs a whole lot more than actual hardware.)
The "nice" feature is that, you can only index up to 50,000 documents
with this price. If you need to index more, sorry, send in the
check...
It seems to me that any small biz will be ripped off if they install
this google mini thing, compared to using Lucene to implement
easy-to-use search software, which could search up to whatever number
of documents you could imagine.
I hope the lucene project could get exposed more to the enterprise so
that people know that they have not only cheaper but more importantly,
BETTER alternatives.
Jian


Re: text highlighting

2005-01-27 Thread Youngho Cho
Thanks for your reply.

I now use QueryParser instead of TermQuery,
and everything works well!

Thanks.

Youngho

- Original Message - 
From: "mark harwood" <[EMAIL PROTECTED]>
To: 
Sent: Thursday, January 27, 2005 7:05 PM
Subject: Re: text highlighting


> >>sometimes the returned String is empty.
> >>Is the code analyzer-dependent?
> 
> When highlighter.getBestFragments returns nothing,
> this is because no match was found for the query
> terms in the TokenStream supplied.
> This is nearly always because of Analyzer issues.
> Check the post-analysis tokens produced for the query
> and the tokens produced in the TokenStream passed to
> the highlighter. The highlighter simply looks for
> matches in the two sources of terms and uses the token
> offsets to select the best sections of the supplied
> text.
> 
> Cheers
> Mark
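
A rough illustration of the point above, in plain Java rather than the real
Highlighter API: the toy analyze() below is a made-up stand-in for an
Analyzer, and bestFragment() is a hypothetical helper. It only shows how
token offsets pick a section of the text, and why a query term that was not
run through the same analysis finds nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class FragmentSketch {
    /** A token plus its character offsets in the original text. */
    static final class Tok {
        final String term;
        final int start;
        final int end;

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Toy analyzer (assumption): lower-cased runs of letters/digits, with offsets. */
    static List<Tok> analyze(String text) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            out.add(new Tok(text.substring(start, i).toLowerCase(Locale.ROOT), start, i));
        }
        return out;
    }

    /** Returns the text around the first matching token, or "" if no token matches. */
    static String bestFragment(String text, Set<String> queryTerms, int context) {
        for (Tok t : analyze(text)) {
            if (queryTerms.contains(t.term)) {
                int from = Math.max(0, t.start - context);
                int to = Math.min(text.length(), t.end + context);
                return text.substring(from, to);
            }
        }
        return "";
    }
}
```

With the query term "lucene" the text "Apache Lucene is a search library"
yields a fragment; with the unanalyzed term "Lucene" it yields the empty
string, which is the symptom described in the question.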

Re: LuceneReader.delete (term t) Failure ?

2005-01-27 Thread Erik Hatcher
How did you index the "uid" field?  Field.Keyword?  If not, that may be 
the problem in that the field was analyzed.  For a key field like this, 
it needs to be unanalyzed/untokenized.
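
To make the idea concrete, here is a small plain-Java sketch (not Lucene
code). The tokenized() method is a purely hypothetical analyzer that splits
on '-' and whitespace; real analyzers behave differently. The point is only
that a delete-by-term matches a document when the exact term string is in
the index, which is guaranteed for an untokenized key field.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class KeyFieldSketch {
    /** Hypothetical analyzer: splits on '-' and whitespace. Real analyzers differ. */
    static Set<String> tokenized(String value) {
        return new HashSet<>(Arrays.asList(value.split("[-\\s]+")));
    }

    /** Untokenized ("keyword") field: the whole value is the one and only term. */
    static Set<String> keyword(String value) {
        return Collections.singleton(value);
    }

    /** A delete-by-term only hits a document whose terms contain the exact string. */
    static boolean wouldDelete(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }
}
```

Under this toy tokenizer "xxx/foo.txt" survives as one term, while
"xxx-yyy/foo.txt" becomes "xxx" and "yyy/foo.txt", so a delete on the full
path finds nothing unless the field was indexed as a keyword.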

Erik
On Jan 27, 2005, at 6:21 PM, <[EMAIL PROTECTED]> wrote:
Hi,
I am trying to delete a document from Lucene index using:
 Term aTerm = new Term( "uid", path );
 aReader.delete( aTerm );
 aReader.close();
If the variable path="xxx/foo.txt" then I am able to delete the 
document.

However, if the path variable has "-" in the string, the delete method 
does not work

  e.g. path="xxx-yyy/foo.txt"  // Does Not work!!
Can I get around this problem?  I cannot substitute '.' for the minus 
character, as that has other implications.

Is this a bug? I am using the Lucene 1.4-final version.
Thanks for the help
Atul


RE: google mini? who needs it when Lucene is there

2005-01-27 Thread Luke Francl
I disagree. Most small companies don't have an IT staff capable of implementing 
a custom search engine using Lucene for less than $5,000. Nutch might make this 
possible, but compared to a plug-in-and-go solution like the Google mini, it 
still would probably cost a significant amount of money.
 
Getting Lucene/Nutch to the point where it is possible to easily install it on 
a computer and administrate its settings in a user-friendly way is a great 
goal, though.
 
Regards,
Luke Francl



From: jian chen [mailto:[EMAIL PROTECTED]
Sent: Thu 1/27/2005 5:44 PM
To: Lucene Users List
Subject: google mini? who needs it when Lucene is there

It seems to me that any small biz will be ripped off if they install
this google mini thing, compared to using Lucene to implement
easy-to-use search software, which could search up to whatever number
of documents you could imagine.



Re: google mini? who needs it when Lucene is there

2005-01-27 Thread John Wang
I think Google mini also includes crawling and a server wrapper. So it
is not entirely a 1-to-1 comparison.

Of course, extending Lucene to have those features is not at all
difficult anyway.

-John


On Thu, 27 Jan 2005 16:04:54 -0800 (PST), Xiaohong Yang (Sharon)
<[EMAIL PROTECTED]> wrote:
> Hi,
> 
> I agree that Google mini is quite expensive.  It might be similar to the 
> desktop version in quality.  Does anyone know Google's ratio of index to text?   
> Is it true that Lucene's index is about 500 times the original text size (not 
> including image size)?  I don't have one installed, so I cannot measure.
> 
> Best,
> 
> Sharon
> 
> jian chen <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> I was searching using google and just found that there was a new
> feature called "google mini". Initially I thought it was another free
> service for small companies. Then I realized that it costs quite some
> money ($4,995) for the hardware and software. (I guess the proprietary
> software costs a whole lot more than actual hardware.)
> 
> The "nice" feature is that, you can only index up to 50,000 documents
> with this price. If you need to index more, sorry, send in the
> check...
> 
> It seems to me that any small biz will be ripped off if they install
> this google mini thing, compared to using Lucene to implement
> easy-to-use search software, which could search up to whatever number
> of documents you could imagine.
> 
> I hope the lucene project could get exposed more to the enterprise so
> that people know that they have not only cheaper but more importantly,
> BETTER alternatives.
> 
> Jian
> 



rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there

2005-01-27 Thread David Spencer
This reminds me, has anyone ever discussed something similar:
- rackmount server ( or for coolness factor, that mini mac)
- web i/f for config/control
- of course the server would have the following s/w:
-- web server
-- lucene / nutch
Part of the work here I think is having a decent web i/f to configure 
the thing and to customize the L&F of the search results.


jian chen wrote:
Hi,
I was searching using google and just found that there was a new
feature called "google mini". Initially I thought it was another free
service for small companies. Then I realized that it costs quite some
money ($4,995) for the hardware and software. (I guess the proprietary
software costs a whole lot more than actual hardware.)
The "nice" feature is that, you can only index up to 50,000 documents
with this price. If you need to index more, sorry, send in the
check...
It seems to me that any small biz will be ripped off if they install
this google mini thing, compared to using Lucene to implement
easy-to-use search software, which could search up to whatever number
of documents you could imagine.
I hope the lucene project could get exposed more to the enterprise so
that people know that they have not only cheaper but more importantly,
BETTER alternatives.
Jian


RE: Disk space used by optimize

2005-01-27 Thread Kauler, Leto S
Our copy of LIA is "in the mail" ;)

Yes the final three files are: the .cfs (46.8MB), deletable (4 bytes),
and segments (29 bytes).

--Leto



> -Original Message-
> From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
> 
> Hello,
> 
> Yes, that is how optimize works - copies all existing index 
> segments into one unified index segment, thus optimizing it.
> 
> see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space
> 
> However, three times the space sounds a bit too much, or I 
> made a mistake in the book. :)
> 
> You said you end up with 3 files - .cfs is one of them, right?
> 
> Otis
> 
> 
> --- "Kauler, Leto S" <[EMAIL PROTECTED]> wrote:
> 
> > 
> > Just a quick question:  after writing an index and then calling 
> > optimize(), is it normal for the index to expand to about three times 
> > the size before finally compressing?
> > 
> > In our case the optimise grinds the disk, expanding the index into 
> > many files of about 145MB total, before compressing down to three 
> > files of about 47MB total.  That must be a lot of disk activity for 
> > the people with multi-gigabyte indexes!
> > 
> > Regards,
> > Leto

CONFIDENTIALITY NOTICE AND DISCLAIMER

Information in this transmission is intended only for the person(s) to whom it 
is addressed and may contain privileged and/or confidential information. If you 
are not the intended recipient, any disclosure, copying or dissemination of the 
information is unauthorised and you should delete/destroy all copies and notify 
the sender. No liability is accepted for any unauthorised use of the 
information contained in this transmission.

This disclaimer has been automatically added.




Re: google mini? who needs it when Lucene is there

2005-01-27 Thread Xiaohong Yang (Sharon)
Hi,
 
I agree that Google mini is quite expensive.  It might be similar to the 
desktop version in quality.  Does anyone know Google's ratio of index to text?   
Is it true that Lucene's index is about 500 times the original text size (not 
including image size)?  I don't have one installed, so I cannot measure.
 
Best,
 
Sharon

jian chen <[EMAIL PROTECTED]> wrote:
Hi,

I was searching using google and just found that there was a new
feature called "google mini". Initially I thought it was another free
service for small companies. Then I realized that it costs quite some
money ($4,995) for the hardware and software. (I guess the proprietary
software costs a whole lot more than actual hardware.)

The "nice" feature is that, you can only index up to 50,000 documents
with this price. If you need to index more, sorry, send in the
check...

It seems to me that any small biz will be ripped off if they install
this google mini thing, compared to using Lucene to implement
easy-to-use search software, which could search up to whatever number
of documents you could imagine.

I hope the lucene project could get exposed more to the enterprise so
that people know that they have not only cheaper but more importantly,
BETTER alternatives.

Jian




Re: Disk space used by optimize

2005-01-27 Thread Otis Gospodnetic
Hello,

Yes, that is how optimize works - copies all existing index segments
into one unified index segment, thus optimizing it.

see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space

However, three times the space sounds a bit too much, or I made a
mistake in the book. :)

You said you end up with 3 files - .cfs is one of them, right?
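
For what it is worth, the figures in the question can be checked with quick
arithmetic. The sketch below rests on an assumption that is not confirmed
anywhere in this thread: that with the compound file format an optimize can
briefly hold three copies of the data at once (the old segments, the merged
segment as separate files, and the new .cfs), and two copies otherwise. The
class and method names are made up for illustration.

```java
class OptimizeSpaceSketch {
    /** Ratio between the largest size seen during optimize and the final size. */
    static double peakToFinalRatio(double peakMb, double finalMb) {
        return peakMb / finalMb;
    }

    /**
     * Rough peak estimate under the stated assumption: old segments plus the
     * merged segment, plus one more copy while the compound file is written.
     */
    static double estimatedPeakMb(double finalMb, boolean compoundFile) {
        int copies = compoundFile ? 3 : 2;
        return copies * finalMb;
    }
}
```

145 MB / 47 MB is about 3.1, and 3 x 46.8 MB is about 140 MB, so the
reported peak is at least consistent with that assumption.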

Otis


--- "Kauler, Leto S" <[EMAIL PROTECTED]> wrote:

> 
> Just a quick question:  after writing an index and then calling
> optimize(), is it normal for the index to expand to about three times
> the size before finally compressing?
> 
> In our case the optimise grinds the disk, expanding the index into
> many files of about 145MB total, before compressing down to three
> files of about 47MB total.  That must be a lot of disk activity for
> the people with multi-gigabyte indexes!
> 
> Regards,
> Leto



google mini? who needs it when Lucene is there

2005-01-27 Thread jian chen
Hi,

I was searching using google and just found that there was a new
feature called "google mini". Initially I thought it was another free
service for small companies. Then I realized that it costs quite some
money ($4,995) for the hardware and software. (I guess the proprietary
software costs a whole lot more than actual hardware.)

The "nice" feature is that, you can only index up to 50,000 documents
with this price. If you need to index more, sorry, send in the
check...

It seems to me that any small biz will be ripped off if they install
this google mini thing, compared to using Lucene to implement
easy-to-use search software, which could search up to whatever number
of documents you could imagine.

I hope the lucene project could get exposed more to the enterprise so
that people know that they have not only cheaper but more importantly,
BETTER alternatives.

Jian




LuceneReader.delete (term t) Failure ?

2005-01-27 Thread akedar
Hi,

I am trying to delete a document from Lucene index using:

 Term aTerm = new Term( "uid", path );
 aReader.delete( aTerm );
 aReader.close();

If the variable path="xxx/foo.txt" then I am able to delete the document.  

However, if path variable has "-" in the string, the delete method does not work

  e.g. path="xxx-yyy/foo.txt"  // Does Not work!!


Can I get around this problem?  I cannot substitute '.' for the minus 
character, as that has other implications.  

Is this a bug? I am using the Lucene 1.4-final version.

Thanks for the help
Atul





Disk space used by optimize

2005-01-27 Thread Kauler, Leto S

Just a quick question:  after writing an index and then calling
optimize(), is it normal for the index to expand to about three times
the size before finally compressing?

In our case the optimise grinds the disk, expanding the index into many
files of about 145MB total, before compressing down to three files of
about 47MB total.  That must be a lot of disk activity for the people
with multi-gigabyte indexes!

Regards,
Leto




Re: query term frequency

2005-01-27 Thread David Spencer
Jonathan Lasko wrote:
What do I call to get the term frequencies for terms in the Query?  I 
can't seem to find it in the Javadoc...
Do you mean the # of docs that have a term?
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#docFreq(org.apache.lucene.index.Term)
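
In case the distinction is unclear: document frequency counts documents, not
occurrences. A toy in-memory version (plain Java with hypothetical names, not
the IndexReader implementation) could look like this:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DocFreqSketch {
    // term -> ids of the documents that contain it at least once
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Records that the given document contains the given terms. */
    void add(int docId, String... terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Number of documents containing the term; repeats inside one document count once. */
    int docFreq(String term) {
        Set<Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }
}
```

A document that mentions "lucene" twice still adds only one to
docFreq("lucene").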
Thanks.
Jonathan


query term frequency

2005-01-27 Thread Jonathan Lasko
What do I call to get the term frequencies for terms in the Query?  I 
can't seem to find it in the Javadoc...
Thanks.

Jonathan


Re: Sort Performance Problems across large dataset

2005-01-27 Thread Doug Cutting
Peter Hollas wrote:
Currently we can issue a simple search query and expect a response back 
in about 0.2 seconds (~3,000 results) with the Lucene index that we have 
built. Lucene gives a much more predictable and faster average query 
time than using standard fulltext indexing with MySQL. This however 
returns results in score order, and not alphabetically.

To sort the resultset into alphabetical order, we added the species 
names as a separate keyword field, and sorted using it whilst querying. 
This solution works fine, but is unacceptable since a query that returns 
thousands of results can take upwards of 30 seconds to sort them.
Are you using a Lucene Sort?  If you reuse the same IndexReader (or 
IndexSearcher) then perhaps the first query specifying a Sort will take 
30 seconds (although that's much slower than I'd expect), but subsequent 
searches that sort on the same field should be nearly as fast as results 
sorted by score.
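
A minimal sketch of why reuse matters, assuming the sort values are loaded
once per reader and then kept. FakeReader and sortKeys() are hypothetical
stand-ins written in plain Java; this is not Lucene's own caching code.

```java
import java.util.Map;
import java.util.WeakHashMap;

class SortCacheSketch {
    /** Stand-in for an open index reader holding one sort value per document. */
    static final class FakeReader {
        final String[] speciesByDoc;

        FakeReader(String[] speciesByDoc) {
            this.speciesByDoc = speciesByDoc;
        }
    }

    // One entry per reader instance; the entry can go away with its reader.
    private final Map<FakeReader, String[]> cache = new WeakHashMap<>();

    /** Counts how often the expensive load actually ran. */
    int loads = 0;

    /** Loads the sort keys on first use for a reader, then serves them from memory. */
    synchronized String[] sortKeys(FakeReader reader) {
        String[] keys = cache.get(reader);
        if (keys == null) {
            keys = reader.speciesByDoc.clone(); // the slow part in a real index
            loads++;
            cache.put(reader, keys);
        }
        return keys;
    }
}
```

Opening a new reader for every query would pay the load cost every time,
which is the case to avoid.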

Doug


Re: Opening up one large index takes 940M or memory?

2005-01-27 Thread Doug Cutting
Kevin A. Burton wrote:
Is there any way to reduce this footprint?  The index is fully 
optimized... I'm willing to take a performance hit if necessary.  Is 
this documented anywhere?
You can increase TermInfosWriter.indexInterval.  You'll need to re-write 
the .tii file for this to take effect.  The simplest way to do this is 
to use IndexWriter.addIndexes(), adding your index to a new, empty, 
directory.  This will of course take a while for a 60GB index...

Doubling TermInfosWriter.indexInterval should halve the Term memory usage 
and double the time required to look up terms in the dictionary.  With 
an index this large the latter is probably not an issue, since 
processing term frequency and proximity data probably overwhelmingly 
dominates search performance.
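
The trade-off can be seen in a toy model, assuming only every
indexInterval-th term of a sorted dictionary is kept in memory and the rest
is scanned. This is plain Java with invented names, not the actual .tii/.tis
code.

```java
import java.util.ArrayList;
import java.util.List;

class TermIndexSketch {
    private final String[] sortedTerms;                       // stands in for the on-disk dictionary
    private final List<Integer> indexed = new ArrayList<>();  // positions kept in memory
    private final int interval;

    /** Number of dictionary entries scanned by the most recent lookup. */
    int lastScanLength;

    TermIndexSketch(String[] sortedTerms, int interval) {
        this.sortedTerms = sortedTerms;
        this.interval = interval;
        for (int i = 0; i < sortedTerms.length; i += interval) {
            indexed.add(i);
        }
    }

    int inMemoryEntries() {
        return indexed.size();
    }

    /** Returns the position of the term in the dictionary, or -1 if absent. */
    int lookup(String term) {
        // Binary search the in-memory entries for the last indexed term <= target.
        int lo = 0;
        int hi = indexed.size() - 1;
        int block = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTerms[indexed.get(mid)].compareTo(term) <= 0) {
                block = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        lastScanLength = 0;
        if (block < 0) {
            return -1;
        }
        // Linear scan within the block: at most `interval` entries.
        int start = indexed.get(block);
        int end = Math.min(sortedTerms.length, start + interval);
        for (int i = start; i < end; i++) {
            lastScanLength++;
            int cmp = sortedTerms[i].compareTo(term);
            if (cmp == 0) {
                return i;
            }
            if (cmp > 0) {
                break;
            }
        }
        return -1;
    }
}
```

For 1024 terms, going from an interval of 128 to 256 halves the in-memory
entries and doubles the longest scan.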

Perhaps we should make this public by adding an IndexWriter method?
Also, you can list the size of your .tii file by using the main() from 
CompoundFileReader.

Doug


Re: XML index

2005-01-27 Thread Otis Gospodnetic
Hello Karl,

Grab the source code for Lucene in Action, it's got code that parses
and indexes XML with DOM and SAX.  You can see the coverage of that
stuff here: 
http://lucenebook.com/search?query=indexing+XML+section%3A7*
I haven't used kXML, but I imagine the LIA code should get you going
quickly, and you are free to adapt the code to work with kXML.
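
As a rough idea of the SAX approach (this is not the LIA code, and it uses
the JDK's built-in SAX parser rather than kXML), the sketch below collects
element names and their text as field name/value pairs. Turning those pairs
into Lucene Document fields is left out, and XmlFieldsSketch and
extractFields are made-up names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class XmlFieldsSketch {
    /** Maps each text-bearing element name to its text; repeated names overwrite. */
    static Map<String, String> extractFields(String xml) {
        final Map<String, String> fields = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0); // start collecting text for the new element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String value = text.toString().trim();
                if (!value.isEmpty()) {
                    fields.put(qName, value); // element name becomes the field name
                }
                text.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return fields;
    }
}
```

Each name/value pair would then become one field of the Document being
indexed.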

Otis

--- Karl Koch <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> I want to use kXML with Lucene to index XML files. I think it is possible
> to dynamically assign Node names as Document fields and Node texts as Text
> (after using an Analyser). 
> 
> I have seen some XML indexing in the Sandbox. Is there anybody here who has
> done something with a thin pull parser (perhaps even kXML)? Does anybody
> know of a project or some source code available which covers this topic?
> 
> Karl



RE: Index Layout Question

2005-01-27 Thread Jerry Jalenak
That's good to know.

I'm indexing on 11 fields (9 keyword, 2 text).  The documents themselves are
between 1K to 2K in size.

Is there a point at which IndexSearcher performance begins to fall off (in
terms of the number of index records)?

Jerry Jalenak
Senior Programmer / Analyst, Web Publishing
LabOne, Inc.
10101 Renner Blvd.
Lenexa, KS  66219
(913) 577-1496

[EMAIL PROTECTED]


-Original Message-
From: Ian Soboroff [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 27, 2005 10:31 AM
To: Lucene Users List
Subject: Re: Index Layout Question


"Jerry Jalenak" <[EMAIL PROTECTED]> writes:

> I am in the process of indexing about 1.5 million documents, and have
> started down the path of indexing these by month.  Each month has between
> 100,000 and 200,000 documents.  From a performance standpoint, is this the
> right approach?  This allows me to use MultiSearcher (or
> ParallelMultiSearcher), but I'm not sure if the performance gains are
> really there.  Would one monolithic index be better?

Depends on your search infrastructure.  Doug Cutting has sent out some
basic optimization guidelines on this list which should be in the
archives... simply, you need to think about how many CPUs and spindles
are involved.

1.5m documents isn't a challenge for Lucene to index or search on a
single machine with a monolithic index.  I indexed about 1.6m web
pages in 22 hours on a single machine with all data local, and search
with a single IndexSearcher was instantaneous.  We've also done some
testing with a larger collection (25m pages) and
ParallelMultiSearchers on several machines, and likewise on a fast
network haven't felt a slowdown, but we haven't actually benchmarked
it.

Ian





This transmission (and any information attached to it) may be confidential and
is intended solely for the use of the individual or entity to which it is
addressed. If you are not the intended recipient or the person responsible for
delivering the transmission to the intended recipient, be advised that you
have received this transmission in error and that any use, dissemination,
forwarding, printing, or copying of this information is strictly prohibited.
If you have received this transmission in error, please immediately notify
LabOne at the following email address: [EMAIL PROTECTED]





XML index

2005-01-27 Thread Karl Koch
Hi,

I want to use kXML with Lucene to index XML files. I think it is possible to
dynamically assign Node names as Document fields and Node texts as Text
(after using an Analyser). 

I have seen some XML indexing in the Sandbox. Is there anybody here who has
done something with a thin pull parser (perhaps even kXML)? Does anybody know
of a project or some source code available which covers this topic?

Karl

 




Re: Boosting Questions

2005-01-27 Thread Luke Shannon
Thanks Otis.

- Original Message - 
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Thursday, January 27, 2005 12:11 PM
Subject: Re: Boosting Questions


> Luke,
> 
> Boosting is only one of the factors involved in Document/Query scoring.
> Assuming that applying your boosts to Document A, or to a single field
> of Document A, increases the total score enough, then yes, Document A
> may have the highest score.  But just because you boost a single
> Document and not others, it does not mean it will emerge at the top.
> You should check out the Explanation class, which can dump all scoring
> factors in text or HTML format.
> 
> Otis
> 
> 
> --- Luke Shannon <[EMAIL PROTECTED]> wrote:
> 
> > Hi All;
> > 
> > I just want to make sure I have the right idea about boosting.
> > 
> > So if I boost a document (Document A) after I index it (let's say with a
> > boost of 2.0), Lucene will now consider this document relatively more
> > important than other documents in the index with a boost factor less than
> > 2.0. This boost factor will also be applied to all the fields in Document
> > A. Therefore, if I do a TermQuery on a field that all my documents share
> > ("title"), in the returned Hits (assuming Document A was among the
> > returned documents), Document A will score higher than other documents
> > with a lower boost factor because the "title" field in A would have been
> > boosted along with all its other fields. Correct?
> > 
> > Now if at indexing time I decided to boost a particular field, let's say
> > "address" in Document A (this is a field which all documents have), the
> > boost factor is only applied to the "address" field of Document A.
> > Nothing else is boosted by this operation. This means if a TermQuery on
> > the "address" field returns Document A along with a collection of other
> > documents, Document A will score higher than the others because of
> > boosting. Correct?
> > 
> > Thanks,
> > Thanks,
> > 
> > Luke



Re: Boosting Questions

2005-01-27 Thread Otis Gospodnetic
Luke,

Boosting is only one of the factors involved in Document/Query scoring.
Assuming that applying your boosts to Document A, or to a single field
of Document A, increases the total score enough, then yes, Document A
may have the highest score.  But just because you boost a single
Document and not others, it does not mean it will emerge at the top.
You should check out the Explanation class, which can dump all scoring
factors in text or HTML format.
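
To illustrate with made-up numbers: the toy score below multiplies a few
factors loosely modeled on the usual ones (term frequency, idf, boost, and
field length normalization). It is a simplification for illustration only,
not Lucene's exact formula, and BoostSketch is a hypothetical name.

```java
class BoostSketch {
    /** Toy score: sqrt(termFreq) * idf * boost * 1/sqrt(fieldLength). */
    static double score(int termFreq, double idf, double boost, int fieldLength) {
        double tf = Math.sqrt(termFreq);
        double lengthNorm = 1.0 / Math.sqrt(fieldLength);
        return tf * idf * boost * lengthNorm;
    }
}
```

Document A with boost 2.0, one occurrence, and a 100-term field scores
1 * 1 * 2.0 * 0.1 = 0.2. Document B with boost 1.0, four occurrences, and a
16-term field scores 2 * 1 * 1.0 * 0.25 = 0.5, so the unboosted document
still ranks first.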

Otis


--- Luke Shannon <[EMAIL PROTECTED]> wrote:

> Hi All;
> 
> I just want to make sure I have the right idea about boosting.
> 
> So if I boost a document (Document A) after I index it (let's say with a
> boost of 2.0), Lucene will now consider this document relatively more
> important than other documents in the index with a boost factor less than
> 2.0. This boost factor will also be applied to all the fields in Document
> A. Therefore, if I do a TermQuery on a field that all my documents share
> ("title"), in the returned Hits (assuming Document A was among the
> returned documents), Document A will score higher than other documents
> with a lower boost factor because the "title" field in A would have been
> boosted along with all its other fields. Correct?
> 
> Now if at indexing time I decided to boost a particular field, let's say
> "address" in Document A (this is a field which all documents have), the
> boost factor is only applied to the "address" field of Document A.
> Nothing else is boosted by this operation. This means if a TermQuery on
> the "address" field returns Document A along with a collection of other
> documents, Document A will score higher than the others because of
> boosting. Correct?
> 
> Thanks,
> Thanks,
> 
> Luke



Boosting Questions

2005-01-27 Thread Luke Shannon
Hi All;

I just want to make sure I have the right idea about boosting.

So if I boost a document (Document A) after I index it (let's say with a
boost of 2.0), Lucene will now consider this document relatively more
important than other documents in the index with a boost factor less than
2.0. This boost factor will also be applied to all the fields in Document A.
Therefore, if I do a TermQuery on a field that all my documents share
("title"), in the returned Hits (assuming Document A was among the returned
documents), Document A will score higher than other documents with a lower
boost factor because the "title" field in A would have been boosted along
with all its other fields. Correct?

Now if at indexing time I decided to boost a particular field, let's say
"address" in Document A (this is a field which all documents have), the
boost factor is only applied to the "address" field of Document A. Nothing
else is boosted by this operation. This means if a TermQuery on the
"address" field returns Document A along with a collection of other
documents, Document A will score higher than the others because of boosting.
Correct?

Thanks,

Luke






RE: Reloading an index

2005-01-27 Thread Cocula Remi
Make sure that the old searcher is not referenced anywhere else; once 
nothing refers to it, the garbage collector should reclaim it.
Just remember that the garbage collector runs when memory is needed, not 
immediately after a reference is set to null.
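
One way to keep a single reference to the live searcher is a small holder
like the sketch below. It is plain Java with a hypothetical SearcherHolder
class; S is whatever searcher type you use, assumed here to be Closeable. It
does not wait for searches that are still running on the old instance, so a
real version would need to handle that.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicReference;

class SearcherHolder<S extends Closeable> {
    // The only long-lived reference to the current searcher.
    private final AtomicReference<S> current = new AtomicReference<>();

    /** Returns the searcher that new queries should use (null before the first swap). */
    S get() {
        return current.get();
    }

    /** Installs the fresh searcher, then closes the previous one, if any. */
    void swap(S fresh) {
        S old = current.getAndSet(fresh);
        if (old != null) {
            try {
                old.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

Callers should ask the holder for the searcher on every request instead of
caching it in a field, otherwise the old instance stays reachable.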


-Original Message-
From: Greg Gershman [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 27, 2005 17:29
To: lucene-user@jakarta.apache.org
Subject: Reloading an index


I have an index that is frequently updated.  When
indexing is completed, an event triggers a new
Searcher to be opened.  When the new Searcher is
opened, incoming searches are redirected to the new
Searcher, the old Searcher is closed and nulled, but I
still see about twice the amount of memory in use well
after the original searcher has been closed.   Is
there something else I can do to get this memory
reclaimed?  Should I explicitly call garbage
collection?  Any ideas?

Thanks.

Greg Gershman 






Re: Index Layout Question

2005-01-27 Thread Ian Soboroff
"Jerry Jalenak" <[EMAIL PROTECTED]> writes:

> I am in the process of indexing about 1.5 million documents, and have
> started down the path of indexing these by month.  Each month has between
> 100,000 and 200,000 documents.  From a performance standpoint, is this the
> right approach?  This allows me to use MultiSearcher (or
> ParallelMultiSearcher), but I'm not sure if the performance gains are really
> there.  Would one monolithic index be better?

Depends on your search infrastructure.  Doug Cutting has sent out some
basic optimization guidelines on this list which should be in the
archives... simply, you need to think about how many CPUs and spindles
are involved.

1.5m documents isn't a challenge for Lucene to index or search on a
single machine with a monolithic index.  I indexed about 1.6m web
pages in 22 hours on a single machine with all data local, and search
with a single IndexSearcher was instantaneous.  We've also done some
testing with a larger collection (25m pages) and
ParallelMultiSearchers on several machines, and likewise on a fast
network haven't felt a slowdown, but we haven't actually benchmarked
it.

Ian






Reloading an index

2005-01-27 Thread Greg Gershman
I have an index that is frequently updated.  When
indexing is completed, an event triggers a new
Searcher to be opened.  When the new Searcher is
opened, incoming searches are redirected to the new
Searcher, the old Searcher is closed and nulled, but I
still see about twice the amount of memory in use well
after the original searcher has been closed.   Is
there something else I can do to get this memory
reclaimed?  Should I explicitly call garbage
collection?  Any ideas?

Thanks.

Greg Gershman 






Index Layout Question

2005-01-27 Thread Jerry Jalenak
I am in the process of indexing about 1.5 million documents, and have
started down the path of indexing these by month.  Each month has between
100,000 and 200,000 documents.  From a performance standpoint, is this the
right approach?  This allows me to use MultiSearcher (or
ParallelMultiSearcher), but I'm not sure if the performance gains are really
there.  Would one monolithic index be better?

Thanks.

Jerry Jalenak
Senior Programmer / Analyst, Web Publishing
LabOne, Inc.
10101 Renner Blvd.
Lenexa, KS  66219
(913) 577-1496

[EMAIL PROTECTED]





Re: Different Documents (with fields) in one index?

2005-01-27 Thread Aad Nales
Nope,
it is very possible. We have an index that holds the search info for 
documents, messages in discussion threads, filled in forms etc. etc. 
each having their own structure.

cheers,
Aad
Karl Koch wrote:
Hello all,
perhaps not such a sophisticated question: 

I would like to have a very diverse set of documents in one index. Depending
on the inside of text documents, I would like to put part of the text in
different fields. This means in the searches, when searching a particular
field, some of those documents won't be addressed at all.
Is it possible to have different kinds of Documents with different index
fields in ONE index? Or do I need one index for each set?
Karl
 




Re: Different Documents (with fields) in one index?

2005-01-27 Thread Otis Gospodnetic
Karl,

This is completely fine.  You can have documents with different fields
in the same index.

Otis
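
To illustrate why mixed document shapes are not a problem, here is a toy in-memory index in plain Java. ToyFieldIndex and its methods are hypothetical and are not Lucene's API; the sketch only assumes that postings are kept per field, so a document that lacks a field simply never shows up when that field is searched.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy index (not Lucene): every document may bring its own set of fields. */
final class ToyFieldIndex {
    // field name -> term -> ids of the documents that contain the term in that field
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Adds a document given as field -> text and returns its id. */
    int add(Map<String, String> doc) {
        int id = nextDocId++;
        for (Map.Entry<String, String> field : doc.entrySet()) {
            String[] terms = field.getValue().toLowerCase(Locale.ROOT).split("\\s+");
            for (String term : terms) {
                if (term.isEmpty()) {
                    continue;
                }
                postings.computeIfAbsent(field.getKey(), k -> new HashMap<>())
                        .computeIfAbsent(term, k -> new TreeSet<>())
                        .add(id);
            }
        }
        return id;
    }

    /** Ids of the documents whose given field contains the term; empty if none do. */
    Set<Integer> search(String field, String term) {
        Map<String, Set<Integer>> byTerm =
                postings.getOrDefault(field, Collections.emptyMap());
        return byTerm.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

Adding a message with "subject" and "body" fields and a form with an "applicant" field to the same ToyFieldIndex works without any schema; a search on "applicant" just never returns the message.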

--- Karl Koch <[EMAIL PROTECTED]> wrote:

> Hello all,
> 
> perhaps not such a sophisticated question: 
> 
> I would like to have a very diverse set of documents in one index.
> Depending
> on the inside of text documents, I would like to put part of the text
> in
> different fields. This means in the searches, when searching a
> particular
> field, some of those documents won't be addressed at all.
> 
> Is it possible to have different kinds of Documents with different
> index
> fields in ONE index? Or do I need one index for each set?
> 
> Karl
> 
> 





Different Documents (with fields) in one index?

2005-01-27 Thread Karl Koch
Hello all,

perhaps not such a sophisticated question: 

I would like to have a very diverse set of documents in one index. Depending
on the inside of text documents, I would like to put part of the text in
different fields. This means in the searches, when searching a particular
field, some of those documents won't be addressed at all.

Is it possible to have different kinds of Documents with different index
fields in ONE index? Or do I need one index for each set?

Karl




LuceneRAR nearing first release

2005-01-27 Thread Joseph Ottinger
https://lucenerar.dev.java.net

LuceneRAR is now working on two containers, verified: The J2EE 1.4 RI and
Orion. Websphere testing is underway, with JBoss to follow.

LuceneRAR is a resource adapter for Lucene, allowing J2EE components to
look up an entry in a JNDI tree, using that reference to add and search
for documents. It's much like RemoteSearcher would be, except using JNDI
semantics for communication instead of RMI, which is a little more elegant
in a J2EE environment (where JNDI communication is very common).

LuceneRAR was created to allow J2EE components to legitimately use the
filesystem indexes (for speed) while not violating J2EE's suggestion to
not rely on filesystem access. It also allows distributed access to the
index (as remote servers would simply establish a JNDI connection to the
LuceneRAR home.)

Please take a look at it, if you're interested; the feature set isn't
complete, but it's workable. There's a sample application that allows
creation, searches, and statistical data about the search included in the
distribution.

Any comments are welcomed.

---
Joseph B. Ottinger http://enigmastation.com
IT Consultant [EMAIL PROTECTED]





Re: text highlighting

2005-01-27 Thread mark harwood
>>sometimes the returned String is empty.
>>Is the code analyzer-dependent?

When highlighter.getBestFragments returns nothing, it is
because no match was found for the query terms in the
TokenStream supplied.
This is nearly always caused by Analyzer issues.
Compare the post-analysis tokens produced for the query
with the tokens produced in the TokenStream passed to
the highlighter. The highlighter simply looks for
matches between the two sources of terms and uses the token
offsets to select the best sections of the supplied
text.

Cheers
Mark
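
The matching described above can be sketched in plain Java. This is not the sandbox Highlighter; TermMatchSketch, analyze and highlight are hypothetical names, and the stand-in analyzer is assumed to lowercase its tokens. Under that assumption a query term that skips analysis (for example one placed directly into a TermQuery) can fail to match even though the word is in the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy illustration of matching a query term against analyzed tokens by offset. */
final class TermMatchSketch {

    /** A token and the character offsets it covers in the original text. */
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Stand-in analyzer (an assumption): split on non-alphanumerics, lowercase each token. */
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            tokens.add(new Token(text.substring(start, i).toLowerCase(Locale.ROOT), start, i));
        }
        return tokens;
    }

    /** Wraps each token equal to queryTerm in [ ]; returns "" when no token matches. */
    static String highlight(String text, String queryTerm) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        boolean matched = false;
        for (Token t : analyze(text)) {
            if (t.text.equals(queryTerm)) {
                // Offsets point back into the original text, so its casing is preserved.
                out.append(text, last, t.start).append('[')
                   .append(text, t.start, t.end).append(']');
                last = t.end;
                matched = true;
            }
        }
        return matched ? out.append(text.substring(last)).toString() : "";
    }
}
```

With the text "My Family photo", the raw term "Family" matches nothing here because the token is "family"; passing the query through the same analyze step first gives a match.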








Re: text highlighting

2005-01-27 Thread Youngho Cho
More test results:

If the text contains  ... Family ...
then the query string family works OK.
But if the query string is Family, the highlighter returns nothing.


Thanks.

Youngho

- Original Message - 
From: "Youngho Cho" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Cc: "Che Dong" <[EMAIL PROTECTED]>
Sent: Thursday, January 27, 2005 6:10 PM
Subject: Re: text highlighting


> Hello,
> 
> When I used the code with CJKAnalyzer to search English text 
> (the text is a mix of Korean and English),
> the returned String is sometimes empty.
> Other searches work well.
> 
> Is the code analyzer-dependent?
> 
> Thanks.
> 
> Youngho
> 
> ---  Test Code ( Just copy of the Book code ) -
> 
> private static final String HIGH_LIGHT_OPEN = " class=\"highlight\">";
> private static final String HIGH_LIGHT_CLOSE = "";
> 
> public static String highLight(String value, String queryString)
> throws IOException
> {
> if (StringUtils.isEmpty(value) || StringUtils.isEmpty(queryString))
> {
> return value;
> }
> 
> TermQuery query = new TermQuery(new Term("h", queryString));
> QueryScorer scorer = new QueryScorer(query);
> SimpleHTMLFormatter formatter = new 
> SimpleHTMLFormatter(HIGH_LIGHT_OPEN,
> HIGH_LIGHT_CLOSE);
> Highlighter highlighter = new Highlighter(formatter, scorer);
> 
> Fragmenter fragmenter = new SimpleFragmenter(50);
> 
> highlighter.setTextFragmenter(fragmenter);
> 
> TokenStream tokenStream = new CJKAnalyzer().tokenStream("h",
> new StringReader(value));
> 
> return highlighter.getBestFragments(tokenStream, value, 5, "...");
> }
> 
> - Original Message - 
> From: "Erik Hatcher" <[EMAIL PROTECTED]>
> To: "Lucene Users List" 
> Sent: Thursday, January 27, 2005 8:37 AM
> Subject: Re: text highlighting
> 
> 
> > Also, there are some examples in the Lucene in Action source code (grab  
> > it from http://www.lucenebook.com) (see HighlightIt.java).
> > 
> > Erik
> > 
> > On Jan 26, 2005, at 5:52 PM, markharw00d wrote:
> > 
> > > Michael Celona wrote:
> > >
> > >> Does any have a working example of the highlighter class found in the
> > >> sandbox?
> > >>
> > >>
> > > There are several in the accompanying Junit test:
> > > http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/ 
> > > contributions/highlighter/src/test/org/apache/lucene/search/highlight/
> > >
> > >
> > > Cheers
> > > Mark
> > >
> > >

Re: text highlighting

2005-01-27 Thread Youngho Cho
Hello,

When I used the code with CJKAnalyzer to search English text 
(the text is a mix of Korean and English),
the returned String is sometimes empty.
Other searches work well.

Is the code analyzer-dependent?

Thanks.

Youngho

---  Test Code ( Just copy of the Book code ) -

private static final String HIGH_LIGHT_OPEN = "";
private static final String HIGH_LIGHT_CLOSE = "";

public static String highLight(String value, String queryString)
        throws IOException
{
    if (StringUtils.isEmpty(value) || StringUtils.isEmpty(queryString))
    {
        return value;
    }

    TermQuery query = new TermQuery(new Term("h", queryString));
    QueryScorer scorer = new QueryScorer(query);
    SimpleHTMLFormatter formatter = new SimpleHTMLFormatter(HIGH_LIGHT_OPEN,
            HIGH_LIGHT_CLOSE);
    Highlighter highlighter = new Highlighter(formatter, scorer);

    Fragmenter fragmenter = new SimpleFragmenter(50);

    highlighter.setTextFragmenter(fragmenter);

    TokenStream tokenStream = new CJKAnalyzer().tokenStream("h",
            new StringReader(value));

    return highlighter.getBestFragments(tokenStream, value, 5, "...");
}

- Original Message - 
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" 
Sent: Thursday, January 27, 2005 8:37 AM
Subject: Re: text highlighting


> Also, there are some examples in the Lucene in Action source code (grab  
> it from http://www.lucenebook.com) (see HighlightIt.java).
> 
> Erik
> 
> On Jan 26, 2005, at 5:52 PM, markharw00d wrote:
> 
> > Michael Celona wrote:
> >
> >> Does any have a working example of the highlighter class found in the
> >> sandbox?
> >>
> >>
> > There are several in the accompanying Junit test:
> > http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/ 
> > contributions/highlighter/src/test/org/apache/lucene/search/highlight/
> >
> >
> > Cheers
> > Mark
> >
> >

Re: Searching with words that contain % , / and the like

2005-01-27 Thread Chris Lamprecht
Without looking at the source, my guess is that StandardAnalyzer (and
StandardTokenizer) is the culprit.  The StandardAnalyzer grammar (in
StandardTokenizer.jj) is probably defined so "x/y" parses into two
tokens, "x" and "y".  "s" is a default stopword (see
StopAnalyzer.ENGLISH_STOP_WORDS), so it gets filtered out, while "p"
does not.

To get what you want, you can use a WhitespaceAnalyzer, write your own
custom Analyzer or Tokenizer, or modify the StandardTokenizer.jj
grammar to suit your needs.  WhitespaceAnalyzer is much simpler than
StandardAnalyzer, so you may see some other things being tokenized
differently.

-Chris
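
The guess above can be sketched in plain Java. This is not StandardAnalyzer or WhitespaceAnalyzer; SlashTokenizingSketch and its two methods are hypothetical stand-ins, and the short stop list is illustrative only ("s" is included because the message above says it is a default stop word).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy comparison of two tokenizing strategies for input such as "test/s". */
final class SlashTokenizingSketch {
    // Illustrative stop list, not the real Lucene one.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "s", "t"));

    /** Guessed behaviour: split on non-alphanumerics, lowercase, drop stop words. */
    static List<String> splitAndStop(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!tok.isEmpty() && !STOP_WORDS.contains(tok)) {
                out.add(tok);
            }
        }
        return out;
    }

    /** Whitespace-only splitting: punctuation and case are left untouched. */
    static List<String> whitespaceOnly(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : text.trim().split("\\s+")) {
            if (!tok.isEmpty()) {
                out.add(tok);
            }
        }
        return out;
    }
}
```

Under these assumptions splitAndStop turns "test/s" into the single token test and "test/p" into test and p, which matches what Robin reported, while whitespaceOnly keeps "test/s" as one token. The same tokenizing has to be applied at index time and at query time for the terms to line up.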

On Thu, 27 Jan 2005 12:12:16 +0530, Robinson Raju
<[EMAIL PROTECTED]> wrote:
> Hi ,
> 
> Is there a way to search for words that contain "/" or "%" .
> if my query is "test/s" , it is just taken as "test"
> if my query is "test/p" , it is just taken as "test p"
> has anyone done this / faced such an issue ?
> 
> Regards
> Robin
> 



Re: Searching with words that contain % , / and the like

2005-01-27 Thread Robinson Raju
Hi Jason, 
Yes, the documentation does mention escaping, but that is only for the special
characters used in query syntax, right?
I have tried escaping too.
To answer your question, I am sure it is not the HTTP request that is eating it up. 

Query query = MultiFieldQueryParser.parse("test/s",
 "value", analyzer);

  The resulting query is "value:test".

  I am using StandardAnalyzer.


On Thu, 27 Jan 2005 17:53:39 +1100, Jason Polites
<[EMAIL PROTECTED]> wrote:
> Lucene doco mentions escaping, but doesn't include the "/" char...
> 
> --
> Lucene supports escaping special characters that are part of the query
> syntax. The current list special characters are
> 
> + - && || ! ( ) { } [ ] ^ " ~ * ? : \
> 
> To escape these character use the \ before the character. For example to
> search for (1+1):2 use the query:
> 
> \(1\+1\)\:2
> --
> 
> You could try escaping it anyway?
> 
> Are you sure it's not an HTTP request which is screwing with the parameter?
> 
> 
> - Original Message -
> From: "Robinson Raju" <[EMAIL PROTECTED]>
> To: "Lucene Users List" 
> Sent: Thursday, January 27, 2005 5:42 PM
> Subject: Searching with words that contain % , / and the like
> 
> > Hi ,
> >
> > Is there a way to search for words that contain "/" or "%" .
> > if my query is "test/s" , it is just taken as "test"
> > if my query is "test/p" , it is just taken as "test p"
> > has anyone done this / faced such an issue ?
> >
> > Regards
> > Robin
> >
> 
> 
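
For reference, the escaping rule quoted above can be sketched as a small helper. QueryEscapeSketch.escape is a hypothetical name, not a Lucene method. As a simplification it escapes every & and | rather than only the pairs && and ||. Since "/" is not in the quoted list, the helper leaves "test/s" unchanged, which suggests escaping alone would not help with this query.

```java
/** Toy helper that backslash-escapes the query-syntax characters listed above. */
final class QueryEscapeSketch {
    // Characters from the quoted list; '&' and '|' stand in for "&&" and "||".
    private static final String SPECIALS = "+-&|!(){}[]^\"~*?:\\";

    /** Returns raw with a backslash inserted before each special character. */
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() * 2);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```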


-- 
Regards,
Robin
9886394650
"The merit of an action lies in finishing it to the end"
