Re: Range Query Sombody HELP please

2004-05-27 Thread Ype Kingma
On Thursday 27 May 2004 07:00, Karthik N S wrote:
 Hi
 Lucene developers

 Is it possible to search and retrieve relevant information from the
 indexed documents within a specific range, similar to this SQL query:

   select * from BOOKSHELF where book1 between 100 and 200

 For example:

   search_word, Book between 100 AND 200

 [Note: Book is a unique field whose values are already indexed.]

The query parser can construct this query for you (assuming search_word
is in the query default field):

+search_word +(book:[100 TO 200])

See also: http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

One problem you might run into is that Lucene does not support numbers
directly; only strings are indexed. You can index these numbers with enough
leading zeros and add the same leading zeros in the query.
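
To illustrate why the padding matters, here is a minimal plain-Java sketch (no Lucene calls). The class name ZeroPad and the width of 6 digits are assumptions made for illustration only; the point is that terms are compared as strings, so unpadded numbers sort in the wrong order while fixed-width padded ones sort numerically.

```java
import java.util.Arrays;
import java.util.Locale;

/** Sketch: fixed-width zero padding so that string order matches numeric order. */
final class ZeroPad {
    // Hypothetical width; choose one that covers the largest number you index.
    static final int WIDTH = 6;

    /** Left-pads a non-negative number with zeros to WIDTH digits. */
    static String pad(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("negative numbers need another encoding");
        }
        String s = String.format(Locale.ROOT, "%0" + WIDTH + "d", n);
        if (s.length() > WIDTH) {
            throw new IllegalArgumentException("number wider than " + WIDTH + " digits");
        }
        return s;
    }

    public static void main(String[] args) {
        // Unpadded terms compare as strings, so "1000" sorts before "200".
        String[] raw = {"200", "1000", "100", "150"};
        Arrays.sort(raw);
        System.out.println(Arrays.toString(raw));

        // Padded terms sort in numeric order, so a string range covers the right values.
        String[] padded = {pad(200), pad(1000), pad(100), pad(150)};
        Arrays.sort(padded);
        System.out.println(Arrays.toString(padded));
    }
}
```

The same pad() would be applied to the value when the field is indexed, so that index terms and query bounds share one width.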

Erik Hatcher wrote an article on how to make the query:
http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html
You'll need to override the getRangeQuery() method.

Have fun,
Ype


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Range Query Sombody HELP please

2004-05-27 Thread Karthik N S
Hi
   Lucene developer, my main intention was to

 search for a word hit in a unique field between ranges, say
book 100 - book 200 (indexed numbers).
 It's something like creating a SUBSEARCH within the SEARCHINDEX.

  This is similar to the SQL:

 select * from BOOKSHELF
 or
 select * from BOOKSHELF where book1 between 100 and 200


with regards
Karthik




Memo: Re: RE: RE: Query parser and minus signs

2004-05-27 Thread alex . bourne




Thanks Erik :)

We are using 1.3 so it looks like an upgrade should be made asap.

Whilst hacking around I found an alternative solution. I went back to using
a Keyword field, but instead of using the minus symbol in the query I just
used -language:en* which has the desired effect.

Now I know about the upgrade to 1.4 I'll have a look at some alternative
solutions.

Thanks for everyone's suggestions on this problem.

Alex B.




Erik Hatcher [EMAIL PROTECTED] wrote on 26 May 2004 17:24:



On May 26, 2004, at 10:48 AM, [EMAIL PROTECTED] wrote:
 Query: hsbc -language:zh-HK
 Parsed query: (contents:hsbc -language:zh -contents:hk) (keywords:hsbc
 -language:zh -keywords:hk) (title:hsbc -language:zh -title:hk)
 (language:hsbc
 -language:zh -language:HK)
 Hits: 169
 Not quite what I was expecting from the parsed query - the zh and HK
 are now separated.

I think I can safely say that you are not running the latest version of
Lucene.  This has been corrected in the 1.4 versions.

I've tested this with Wal-Mart (without the quote) and QueryParser,
and it works as expected.


 Query: hsbc -language:zh\-HK
 Parsed query: (contents:hsbc -language:zh\-HK) (keywords:hsbc
 -language:zh\-HK) (title:hsbc -language:zh\-HK) (language:hsbc
 -language:zh\-HK)
 Hits: 206
 And I'm guessing here, but I don't think the slash is escaping, does
 it just become part of the query??

Now that is odd.

QueryParser is an awkward beast at times, and combining it with
MultiFieldQueryParser (which I'd recommend against, as you can see with
the odd queries it built for you) gets even more confusing.

Hopefully the latest Lucene 1.4 RC release will fix up your situation.

 Erik





Memo: Re: Asian languages

2004-05-27 Thread alex . bourne




Hi Christophe,

We're currently indexing Chinese pages with little difficulty. You can use
the standard analyzer to index the documents and it will tokenize the
content into individual characters. If you want to create a list of 'stop'
words you will need to create your own analyzer and supply it with a list
of Unicode characters to stop. We are indexing HTML pages using a spider to
traverse the site and have subclassed Document into HTML_Document. This
allows us to set the content encoding for the input stream reader (our
system default is ISO-8859-1, in common with most western machines), which
enables it to correctly process the Unicode characters. You may need to do
this too.

Hope this helps

Alex.




Christophe Lombart [EMAIL PROTECTED] wrote on 26 May 2004 19:16:


Which Asian languages are supported by Lucene?
What about Korean, Japanese, Thai, ...?
If they are not yet supported, what do I need to do?

Thanks,
Christophe




Memo: Re: Asian languages

2004-05-27 Thread alex . bourne




Sorry Christophe,

I mis-informed you. We did NOT subclass Document, we simply created an
HTMLDocument class with methods that return Lucene Documents with the
required fields added and that is where the content-encoding was set.

Alex.
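
The encoding step described above can be sketched in plain Java. This is only an illustration under assumptions: the class and method names (EncodingDemo, decode) are hypothetical and not taken from Alex's code, and UTF-8 stands in for whatever encoding the crawled pages actually declare. It shows that the charset must be passed explicitly to the InputStreamReader rather than left to the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Sketch: decode page bytes with an explicit charset, not the platform default. */
final class EncodingDemo {
    /** Reads all bytes through an InputStreamReader using the given charset. */
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String page = "\u4e2d\u6587"; // two Chinese characters
        byte[] bytes = page.getBytes(StandardCharsets.UTF_8);

        // Decoding with the encoding the page was written in round-trips the text.
        System.out.println(decode(bytes, StandardCharsets.UTF_8).equals(page));

        // Decoding the same bytes as ISO-8859-1 yields one char per byte instead.
        System.out.println(decode(bytes, StandardCharsets.ISO_8859_1).length());
    }
}
```

The decoded String (or a Reader built the same way) is what would then be handed to the field that holds the page content.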







Re: Range Query Sombody HELP please

2004-05-27 Thread Erik Hatcher
On May 27, 2004, at 3:37 AM, Karthik N S wrote:
Hi
   Lucene -Developer My main intention was
 Search for an word hit  in a Unique Field  between  ranges say
book100  - book 200  indexed numbers
 It's something like creating a SUBSEARCH  with in the SEARCHINDEX.
  This is similar to a SQL =
 select  *  from BOOKSHELF.
 or
 select  *  from BOOKSHELF where  book1  between 100 and  200.

Karthik - unfortunately I'm having a hard time understanding your questions.
Ype replied with a suggested solution: override getRangeQuery() in a custom
QueryParser subclass.  You also need to ensure you are indexing numbers in a
padded fashion:

http://wiki.apache.org/jakarta-lucene/SearchNumericalFields

Erik
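
As a rough sketch of the query side, the following plain-Java helper shows the kind of bound rewriting that an overridden getRangeQuery() could apply before building the range. It deliberately contains no Lucene calls, because the exact getRangeQuery() signature depends on the Lucene version; the names RangeBounds, padBound and rangeClause and the width of 6 are hypothetical.

```java
/** Sketch of the bound rewriting a getRangeQuery() override could perform. */
final class RangeBounds {
    /** Pads an all-digit bound with leading zeros; other bounds pass through unchanged. */
    static String padBound(String bound, int width) {
        boolean allDigits = !bound.isEmpty()
                && bound.chars().allMatch(c -> c >= '0' && c <= '9');
        if (!allDigits || bound.length() >= width) {
            return bound;
        }
        StringBuilder sb = new StringBuilder(width);
        for (int i = bound.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(bound).toString();
    }

    /** Builds a range clause in query-parser syntax with both bounds padded. */
    static String rangeClause(String field, String lower, String upper, int width) {
        return field + ":[" + padBound(lower, width) + " TO " + padBound(upper, width) + "]";
    }

    public static void main(String[] args) {
        // The user types 100 and 200; the index holds six-digit padded terms.
        System.out.println(rangeClause("book", "100", "200", 6));
    }
}
```

In a real QueryParser subclass, the two padded strings would be passed on to the superclass implementation instead of being printed, and the width must match the one used at indexing time.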


Re: classic scenario

2004-05-27 Thread Otis Gospodnetic
Hello,

Answers inlined.

--- Adrian Dumitru [EMAIL PROTECTED] wrote:

 I am (also) building a web crawler, a topic-specific one to be more
 precise, for a vortal. I recently learned about Lucene and I'd very much
 like to use it to handle keyword-specific searches on the info
 that I collect.
 I suspect this is a classic project, at least for Lucene, and probably
 something like this has been addressed already on this discussion list.
 I'm interested to hear any experience anyone might have with this
 subject.

See http://www.nutch.org/
It may make sense to join Nutch, contribute patches that help you, etc.
instead of building your own crawler from scratch.

 My crawler goes on the internet, extracts/parses/ranks and saves websites.
 Most of the information is also categorized and stored in the database,
 but I also save about 10 top pages from each site in the filesystem.
 The first question is: should I care about indexing these files at the
 time I extract them from the internet? Or should I index them later, when I
 make them available for search?

Lucene does not care about files and is not limited to indexing files. 
It sounds like you tried the Lucene demo that indexes files in the file
system.

However, indexing in batch instead of as you crawl may be a more
scalable and cleaner, more manageable approach.  Nutch uses that
approach for a reason. :)

 If yes, then can I still name my files the way I want? (i.e. are there any
 constraints on the filenames from Lucene's perspective?)

No constraints.

 Is it an OK idea to have the same file repository (or index) where the
 crawler writes (indexes files) and the search function searches?

Not a good idea.  Keep your Lucene index directory clean, and use it
only as an index directory.  Write your files elsewhere, I would
suggest.

 I guess performance issues are important here.
 Can I still organize the files that I save the way I want? (I planned to
 write all the files from a given website to different folders, and the
 folders will be named after the id from my database.)

That is up to you and your application.  I just suggest you keep that
outside the index directory, in order to keep things clean, well
organized, and such.

 I maintain a taxonomy (list of categories). Each website will fall into
 one or more of these categories, and each website will also have a rank.
 Does Lucene have something that I should be aware of related to this?

Lucene ranks search result items.  Look at Similarity and
DefaultSimilarity classes.  It sounds like you may benefit from having
a custom Similarity that is aware of your categories.

 I guess that's it for now. This is more like a pet project for me, a pet
 which keeps growing :) I wouldn't mind any help and opinions you can
 provide, source code samples, etc.

If this is really a pet project, perhaps joining Nutch will also be fun
for you.  Some recent Nutch contributors are also Lucene users.

Otis





Re: Memory usage

2004-05-27 Thread Otis Gospodnetic
Sorry if I'm stating the obvious.  Is this happening in some
stand-alone unit tests, or are you running things from some application
and in some environment, like Tomcat, Jetty or in some non-web app?

Your queries are pretty big (although I recall some people using even
bigger ones... but it all depends on the hardware they had), but are
you sure running out of memory is due to Lucene, or could it be a leak
in the app from which you are running queries?

Otis


--- James Dunn [EMAIL PROTECTED] wrote:
 Doug,

 We only search on analyzed text fields.  There are a couple of
 additional fields in the index like OBJECT_ID that are keywords but we
 don't search against those, we only use them once we get a result
 back to find the thing that document represents.

 Thanks,

 Jim

 --- Doug Cutting [EMAIL PROTECTED] wrote:
  It is cached by the IndexReader and lives until the index reader is
  garbage collected.  50-70 searchable fields is a *lot*.  How many are
  analyzed text, and how many are simply keywords?

  Doug

  James Dunn wrote:
   Doug,

   Thanks!

   I just asked a question regarding how to calculate the memory
   requirements for a search.  Does this memory only get used during
   the search operation itself, or is it referenced by the Hits object
   or anything else after the actual search completes?

   Thanks again,

   Jim

   --- Doug Cutting [EMAIL PROTECTED] wrote:
    James Dunn wrote:
     Also I search across about 50 fields but I don't use
     wildcard or range queries.

    Lucene uses one byte of RAM per document per searched field, to hold
    the normalization values.  So if you search a 10M document collection
    with 50 fields, then you'll end up using 500MB of RAM.

    If you're using unanalyzed fields, then an easy workaround to reduce
    the number of fields is to combine many in a single field.  So,
    instead of, e.g., using an "f1" field with value "abc", and an "f2"
    field with value "efg", use a single field named "f" with values
    "1_abc" and "2_efg".

    We could optimize this in Lucene.  If no values of an indexed field
    are analyzed, then we could store no norms for the field and hence
    read none into memory.  This wouldn't be too hard to implement...

    Doug
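
The arithmetic in Doug's explanation, and his field-combining workaround, can be written out as a small plain-Java sketch. The class and method names (NormsMemory, normsBytes, combinedValue) are made up for illustration, and the estimate only covers the one-byte-per-document-per-searched-field norms he describes, not any other memory Lucene uses.

```java
/** Sketch: norms memory estimate and the combined-field naming from the thread. */
final class NormsMemory {
    /** One byte per document per searched field, as described in the thread. */
    static long normsBytes(long documents, int searchedFields) {
        return documents * searchedFields;
    }

    /** Value for a single combined field: field "f1" with "abc" becomes "1_abc". */
    static String combinedValue(String fieldSuffix, String value) {
        return fieldSuffix + "_" + value;
    }

    public static void main(String[] args) {
        // 10M documents searched across 50 fields.
        long bytes = normsBytes(10_000_000L, 50);
        System.out.println(bytes / 1_000_000 + " MB");

        // Combining f1=abc and f2=efg into one field "f" leaves one field's norms.
        System.out.println(combinedValue("1", "abc") + " " + combinedValue("2", "efg"));
        System.out.println(normsBytes(10_000_000L, 1) / 1_000_000 + " MB");
    }
}
```

As Doug notes, the combining trick only applies to unanalyzed fields, so it would not help where every searched field is analyzed text.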



Re: Range Query Sombody HELP please

2004-05-27 Thread Otis Gospodnetic
Karthik, namaste!

I seem to be getting multiple copies of your email.
I received 4 copies of this email.

Could you please limit things to 1 message per subject?
I get hundreds of messages every day as is. :(

Thank you,
Otis




Re: Memory usage

2004-05-27 Thread James Dunn
Otis,

My app does run within Tomcat.  But when I started
getting these OutOfMemoryErrors I wrote a little unit
test to watch the memory usage without Tomcat in the
middle and I still see the memory usage.

Thanks,

Jim



Re: Range Query Sombody HELP please

2004-05-27 Thread Ype Kingma
On Thursday 27 May 2004 09:37, Karthik N S wrote:
 Hi
Lucene -Developer My main intention was

  Search for an word hit  in a Unique Field  between  ranges say
 book100  - book 200  indexed numbers
  It's something like creating a SUBSEARCH  with in the SEARCHINDEX.

You don't need to shout (uppercase), I've been teaching SQL.

Could you explain what you mean by subsearch?
I suppose you might want to have a look at the various filter classes
in the org.apache.lucene.search package.

Regards,
Ype





Re: Number query not working

2004-05-27 Thread Reece . 1247688
Thanks Erik!  That showed me the problem right away.

-Reece



--- Lucene Users List [EMAIL PROTECTED] wrote:

 On May 26, 2004, at 6:38 PM, [EMAIL PROTECTED] wrote:
  It looks like it's because I'm using the SimpleAnalyzer instead of the
  StandardAnalyzer.  What is the SimpleAnalyzer doing to this query to make
  it not work?

 http://wiki.apache.org/jakarta-lucene/AnalysisParalysis

 It is a good idea to analyze the analyzer.  Do a .toString output of
 the Query and you'll see clearly what happened.

 Erik

  Thanks,
  Reece

  --- Lucene Users List [EMAIL PROTECTED] wrote:
   Hi,

   I have a bunch of digits in a field.  When I do this search it returns
   nothing:

   myField:001085609805100

   It returns the correct document when I add a * to the end like this:

   myField:001085609805100*   -- added the *

   I'm not sure what is happening here.  I'm thinking that Lucene is doing
   some number conversion internally when it sees only digits.  When I add
   the * maybe it presumes it is still a string.

   How do I get a string of digits to work without adding a *?

   Thanks,
   Reece
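
A likely explanation for the exchange above, offered as an assumption rather than something stated in the thread: SimpleAnalyzer splits text on non-letter characters and lower-cases the rest, so a term made only of digits produces no tokens at all, while a wildcard term is presumably not passed through the analyzer and so still reaches the index. The plain-Java sketch below only imitates letter-based tokenizing; it is not Lucene's code, and the names LetterSplit and letterTokens are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: imitation of letter-only tokenizing, to show what happens to digits. */
final class LetterSplit {
    /** Splits text into lower-cased runs of letters and discards everything else. */
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // An all-digit term yields no tokens, so the parsed query has nothing to match.
        System.out.println(letterTokens("001085609805100"));
        // Mixed input keeps only the letter runs.
        System.out.println(letterTokens("Wal-Mart 42"));
    }
}
```

Printing the parsed Query with toString(), as Erik suggests, is the direct way to confirm what a given analyzer did to the term.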



Hits object

2004-05-27 Thread DMGoodstein
At one point I thought I'd read that a Hits object
doesn't actually contain Documents, but rather
references to them.  However, in that case I
wouldn't expect I could save a Hits object past the
closing of its originating Searcher (in this case a
MultiSearcher:  Hits hits =
myMultiSearcher.search()).  Yet later when I
access the same Hits object (having instantiated a
new MultiSearcher, myMultiSearcher2, but *not*
performing a new search) I can retrieve documents
from the Hits object without complaint.  Is this
just my good fortune that things haven't been
garbage-collected yet?  Or does the Hits object
contain the full document set?

--David 






Tool for analyzing analyzers

2004-05-27 Thread markharw00d
I've knocked together this tool which automatically discovers Analyzers on the 
classpath and provides a GUI to allow you to try out different Analyzers and see their 
effects:

http://www.inperspective.com/lucene/Viewer.zip
This needs JDK1.4 and you'll need to define  the classpath to include Lucene and any 
of your custom analyzers.

Paste in some example text, take your pick of analyzer and hit the Analyze button to 
see the results.

Cheers
Mark






Re: Hits object

2004-05-27 Thread Erik Hatcher
Hits caches up to 200 HitDocs, which may contain the underlying 
Document.  I suspect you accessed a Document that had already been 
accessed and thus found something in the cache, and it did not have to 
get back to the underlying searcher.

Erik


Re: Tool for analyzing analyzers

2004-05-27 Thread Erik Hatcher
Mark,
Nice idea!  (I've had this type of thing on my to-do list for the 
Lucene demo refactoring that I *promise* I'll eventually get around 
to).

I tried to get it to work, though, and was unsuccessful.  It did not 
show me any Analyzers in the drop down (I have the latest CVS version 
of Lucene in my classpath).  Maybe this could be added into Luke as a 
new tab?  You can sort of fake this with Luke now, by entering your 
text as a query and seeing what it parses to, and select an Analyzer.

Erik


Re: Hits object

2004-05-27 Thread DMGoodstein
So it sounds like I shouldn't rely on documents still being there in general.
--D





RE: Range Query Sombody HELP please

2004-05-27 Thread Karthik N S
Hey Ype

 Apologies for the misconduct.

When we do a search in SQL using '*', we all know that the result is the
total number of records in the table. When we want to limit the records,
we apply a range between two specific row records [which we call a
subsearch].

   Similarly, on an indexed record I would like to perform the same technique
as above.
  In fact I was looking at the URLs you sent me in the last mail on using
getRangeQuery() and was working on the same:

http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

and

http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html

but without results for the last 12 hours.

If you could spare a few minutes, please explain or provide a simple [full]
example of using and overriding the getRangeQuery() method.

with regards
Karthik
