Re: Analyzing Advise
Luke Shannon wrote:

But now that I'm looking at the API I'm not sure I can specify a different analyzer when creating a field.

Is PerFieldAnalyzerWrapper what you're looking for?

URL:http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/analysis/PerFieldAnalyzerWrapper.html

Steve

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Problem searching Field.Keyword field
Why is there no KeywordAnalyzer? That is, an analyzer which doesn't mess with its input in any way, but just returns it as-is? I realize that under most circumstances it would probably take more code to use it than to construct a TermQuery, but having it would regularize query handling and simplify new users' experience. It could also be helpful for use with PerFieldAnalyzerWrapper.

Steve

Erik Hatcher wrote:

Kelvin - I respectfully disagree - could you elaborate on why this is not an appropriate use of Field.Keyword? If the category is "How To", Field.Text would split this (depending on the Analyzer) into "how" and "to". If the user is selecting a category from a drop-down, though, you shouldn't be using QueryParser on it, but instead aggregating a TermQuery("category", "How To") into a BooleanQuery with the rest of it. The rest may be other API-created clauses and likely a piece from QueryParser.

Erik

On Feb 8, 2005, at 11:28 AM, Kelvin Tan wrote:

As I posted previously, Field.Keyword is appropriate in only certain situations. For your use case, I believe Field.Text is more suitable.

k

On Tue, 8 Feb 2005 10:02:19 -0600, Mike Miller wrote:

This may or may not be correct, but I am indexing it as a keyword because I provide a (required) radio button on the add screen for the user to determine which category the document should be assigned to. Then, in the search, I provide a dropdown in the advanced search so that users can search only for a specific category of documents (like HowTo, Troubleshooting, etc.).

-----Original Message-----
From: Kelvin Tan [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 08, 2005 9:32 AM
To: Lucene Users List
Subject: RE: Problem searching Field.Keyword field

Mike, is there a reason why you're indexing category as keyword, not text?

k

On Tue, 8 Feb 2005 08:26:13 -0600, Mike Miller wrote:

Thanks for the quick response. Sorry for my lack of understanding, but I am learning! Won't the query parser still handle this query? My limited understanding was that the search call provides the 'all' field as the default field for query terms when no field is specified. Using the current code, searches like author:Mike and title:Lucene work fine.

-----Original Message-----
From: Miles Barr [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 08, 2005 8:08 AM
To: Lucene Users List
Subject: Re: Problem searching Field.Keyword field

You're using the query parser with the standard analyser. You should construct a term query manually instead.

--
Miles Barr [EMAIL PROTECTED]
Runtime Collective Ltd.
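The mismatch discussed in this thread can be illustrated without Lucene at all. The following is a toy model under stated assumptions, not Lucene code: `KeywordFieldSketch`, `addKeyword`, `termLookup`, and `analyzedLookup` are hypothetical names, `analyze` is a stand-in for an analyzer that lowercases and splits on whitespace, and the "index" is just a map from term to document ids. It shows why a value indexed as one untokenized term is found by an exact term lookup but not by a lookup that runs the query text through an analyzer first.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: none of these names are Lucene classes or methods.
final class KeywordFieldSketch {
    // term -> ids of the documents whose "category" field holds that term
    private final Map<String, Set<Integer>> categoryIndex = new HashMap<String, Set<Integer>>();

    // Index the value untokenized, as one term (the behavior the thread
    // attributes to Field.Keyword).
    void addKeyword(int docId, String value) {
        categoryIndex.computeIfAbsent(value, k -> new TreeSet<Integer>()).add(docId);
    }

    // Hypothetical analyzer: lowercases, then splits on whitespace.
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // Exact-term lookup, analogous to constructing a term query by hand.
    Set<Integer> termLookup(String term) {
        return categoryIndex.getOrDefault(term, new TreeSet<Integer>());
    }

    // Analyzed lookup, analogous to sending the text through a query parser:
    // each analyzed token is looked up separately and the hits are merged.
    Set<Integer> analyzedLookup(String text) {
        Set<Integer> hits = new TreeSet<Integer>();
        for (String token : analyze(text)) {
            hits.addAll(termLookup(token));
        }
        return hits;
    }

    public static void main(String[] args) {
        KeywordFieldSketch index = new KeywordFieldSketch();
        index.addKeyword(1, "How To");
        System.out.println("exact term: " + index.termLookup("How To"));
        System.out.println("analyzed:   " + index.analyzedLookup("How To"));
    }
}
```

In this model the analyzed lookup searches for "how" and "to", neither of which was ever indexed, which is the same reason the thread recommends building the category clause by hand and combining it with the parsed part of the query.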
Re: Optimize not deleting all files
Hi Patricio,

Is it the case that the old index files are not removed from session to session, or only within the same session? The discussion below pertains to the latter case, that is, where the old index files are used in the same process as the files replacing them.

I was having a similar problem, and tracked the source down to IndexReaders not being closed in my application.

As far as I can tell, in order for IndexReaders to present a consistent view of an index while changes are being made to it, read-only copies of the index are kept around until all IndexReaders using them are closed. If any IndexReaders are open on the index, IndexWriters first make a copy, then operate on the copy. If you track down all of these open IndexReaders and close them before optimization, all of the old index files should be deleted. (Lucene gurus, please correct this if I have misrepresented the situation.)

In my application, I had a bad interaction between IndexReader caching, garbage collection, and incremental indexing: a new IndexReader was being opened on an index after each indexing increment, without closing the already-opened IndexReaders.

On Windows, operating-system-level file locking caused by IndexReaders left open was preventing index re-creation, because the IndexWriter wasn't allowed to delete the index files held open by the abandoned IndexReaders.

In short, if you need to write to an index more than once in a single session, be sure to keep careful track of your IndexReaders.

Hope it helps,
Steve

Patricio Keilty wrote:

Hi Otis, I tried version 1.4.3 without success; old index files still remain in the directory. I also tried not calling optimize(), and still get the same behaviour, so maybe our problem is not related to the optimize() call at all.

--p

Otis Gospodnetic wrote:

Get and try Lucene 1.4.3. One of the older versions had a bug that was not deleting old index files.

Otis

--- [EMAIL PROTECTED] wrote:

Hi, When I run an optimize in our production environment, old index files are left in the directory and are not deleted. My understanding is that an optimize will create new index files and all existing index files should be deleted. Is this correct? We are running Lucene 1.4.2 on Windows. Any help is appreciated. Thanks!
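One way to keep track of readers, as recommended in this thread, is to route every reopen through a single holder that closes the instance it replaces. The following is a generic sketch under stated assumptions rather than Lucene code: `java.io.Closeable` stands in for an index reader, and `ReaderHolder`, `swap`, and `closeCurrent` are hypothetical names invented for illustration.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Generic sketch; ReaderHolder is a hypothetical helper, not a Lucene class.
final class ReaderHolder<R extends Closeable> {
    private R current;

    // Install a freshly opened reader and close the one it replaces, so no
    // abandoned reader keeps old index files open.
    synchronized void swap(R fresh) {
        R old = current;
        current = fresh;
        if (old != null && old != fresh) {
            closeQuietlyChecked(old);
        }
    }

    synchronized R get() {
        return current;
    }

    // Close the current reader, for example before re-creating or
    // optimizing the index.
    synchronized void closeCurrent() {
        R old = current;
        current = null;
        if (old != null) {
            closeQuietlyChecked(old);
        }
    }

    // Rethrow close failures unchecked so callers cannot silently ignore them.
    private static void closeQuietlyChecked(Closeable c) {
        try {
            c.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This sketch closes the old reader immediately, which is only safe if no other thread is still searching with it; an application with concurrent searches would need some form of reference counting before closing.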
Re: Lucene docs
URL:http://wiki.apache.org/jakarta-lucene/IntroductionToLucene

Ian McDonnell wrote:

What is the best resource for beginners looking to understand Lucene's functionality, i.e. its use of fields, documents, the index reader and writer, etc.? Is there any web resource that goes into detail on the exact workings of it?
Re: worddoucments search
Hi Lisheng,

You missed a fork in this topic posted on August 24th. It answers all your questions and debunks the "textmining wraps POI" myth:

URL:http://www.mail-archive.com/[EMAIL PROTECTED]/msg09168.html

Steve

Zhang, Lisheng wrote:

Hi Otis, I looked at the textmining site. It seems to me that textmining is a wrapper on top of POI, so the basic features should be the same as POI's. Is this true? I have tested POI with Lucene, and in general it works fine, but I found that it sometimes cannot process MS Word DOC files created by an old version of Word. If I just re-save the old DOC file with the newer Word on XP, everything is fine.

Thanks very much for the help,
Lisheng

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, August 24, 2004 10:24 AM
To: Lucene Users List
Subject: Re: worddoucments search

As I just answered in a separate email to Ryan - we used the textmining.org library, too, as an example of something that is easier to use than POI. It's been a while since I wrote that chapter, so it slipped my mind when I replied. Yes, use textmining.org first; you'll be able to include it in your code in 2 minutes. Good stuff.

Otis
Introduction to Lucene [was Re: worddoucments search]
A collection of links to introductory-level Lucene articles (including one in simplified Chinese and one in Turkish) is available on the Lucene Wiki at:

URL:http://wiki.apache.org/jakarta-lucene/IntroductionToLucene

Steve

Otis Gospodnetic wrote:

That part you have to do yourself. It is easy: just create a new Document, create an appropriate Field, give it a name and the string value you got with the textmining.org library, then add the Field to your Document, and then add the Document to the index with IndexWriter. Look at one of the articles about Lucene to get started. I wrote one called something like Introduction to Text Indexing with Lucene. You probably want to read that one to get going.

Otis

--- Santosh [EMAIL PROTECTED] wrote:

I have gone through textmining.org, and I am able to extract text in string format. But how can I get it into Lucene Document format?

----- Original Message -----
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, August 24, 2004 11:54 PM
Subject: Re: worddoucments search

As I just answered in a separate email to Ryan - we used the textmining.org library, too, as an example of something that is easier to use than POI. It's been a while since I wrote that chapter, so it slipped my mind when I replied. Yes, use textmining.org first; you'll be able to include it in your code in 2 minutes. Good stuff.

Otis
Re: question on setting boost factor
Repaired URL (was extra space before Similarity.html):

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#coord(int,%20int)

Corresponding Tiny URL: URL:http://tinyurl.com/3bo8y

Erik Hatcher wrote:

On Jun 22, 2004, at 7:30 AM, Anson Lau wrote:

Hi guys, Let's say I want to search the term hello world over 3 fields with different boosts:

((hello:field1 world:field1)^0.001 (hello:field2 world:field2)^100 (hello:field3 world:field3)^2))

Note I've given field1 a really low boost, a heavy boost to field2 and a REALLY heavy boost to field3. What is happening to me is that a term that matches both field1 and field2 will have a higher score than a term that matches field3 only, even though field3's boost is WAY higher. Can I change this behaviour such that the match in field3 only will actually have a higher score because of the boost?

First step is to get familiar with the actual factors coming out in the IndexSearcher.explain() output (just System.out.println the Explanation object). The coord() factor - http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/ Similarity.html#coord(int,%20int) - is what you'll want to tweak to change how scores are affected when multiple terms match, by creating your own DefaultSimilarity subclass (and probably just returning 1.0). Read the javadocs for Similarity to see how to hook in your own implementation (see also section).

Erik
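The effect of the coord() factor described in this thread can be illustrated with a small arithmetic model. The following is a simplified sketch under stated assumptions, not Lucene's full scoring formula: `CoordSketch` and its methods are hypothetical names, the clause scores are made-up numbers, and the only thing modeled is a sum of matching clause scores multiplied by a coordination factor equal to the fraction of clauses that matched.

```java
// Simplified illustration only; real scores include other factors
// (term frequency, inverse document frequency, norms) not modeled here.
final class CoordSketch {
    // Fraction-style coordination factor: matched clauses / total clauses.
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Neutral factor, as suggested in the thread (always 1.0).
    static float flatCoord(int overlap, int maxOverlap) {
        return 1.0f;
    }

    // Toy score: sum of the matching clauses' scores times the coord factor.
    static float score(float[] matchedClauseScores, int totalClauses, boolean useDefaultCoord) {
        float sum = 0f;
        for (float s : matchedClauseScores) {
            sum += s;
        }
        int overlap = matchedClauseScores.length;
        float coord = useDefaultCoord
                ? defaultCoord(overlap, totalClauses)
                : flatCoord(overlap, totalClauses);
        return sum * coord;
    }

    public static void main(String[] args) {
        float[] twoWeakMatches = {0.25f, 0.5f}; // hypothetical clause scores
        float[] oneStrongMatch = {1.0f};        // hypothetical clause score
        System.out.println("fraction coord: " + score(twoWeakMatches, 3, true)
                + " vs " + score(oneStrongMatch, 3, true));
        System.out.println("flat coord:     " + score(twoWeakMatches, 3, false)
                + " vs " + score(oneStrongMatch, 3, false));
    }
}
```

With the fraction-style factor, the document matching two of three clauses is rewarded for the extra match and outranks the single strong match; with the factor fixed at 1.0, the ranking follows the summed clause scores alone, which is the behaviour the original question asks for.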
Re: escaping special characters while doing search doesn't seem to work
Hi Polina,

Try this (jGuru Lucene FAQ item):

URL:http://www.jguru.com/faq/view.jsp?EID=538308

Or, better yet, this (the Lucene Wiki AnalysisParalysis page):

URL:http://wiki.apache.org/jakarta-lucene/AnalysisParalysis

Steve

Polina Litvak wrote:

I was trying to search my index for a term of the form a*-b* (e.g. ABC-DEFG). While tracing the code I noticed that Lucene breaks this term into two terms, ABC and DEFG. To prevent this, I tried escaping the special character "-" with "\" to form the term ABC\-DEFG, and now Lucene search can't find this term in the index. Does anyone know of this already? Is this a bug, or am I doing something wrong?

Thanks,
Polina
Re: [Fwd: PROPOSAL: Lucene external content store for stored fields]
Kevin,

I think that this sort of thing should be built on top of the functionality provided by the binary fields proposal, or at least made to work with it:

URL:http://issues.apache.org/bugzilla/show_bug.cgi?id=29370

This would take care of the blob-vs.-text aspect of your proposal. Also:

Kevin Burton wrote:

Supporting full unicode is important. Full java.lang.String storage is used with String.getBytes() so we should be able to avoid unicode issues. If Java has a correct java.lang.String representation it's possible to easily add unicode support just by serializing the byte representation. (Note that the JDK says that the DEFAULT system char encoding is used so if this is ever changed it might break the index)

It's a bad idea to use the zero-parameter version of String.getBytes() (for example, what if you want to share an index between two platforms with different DEFAULT system char encodings?). Fortunately, there's a better alternative: for the surprisingly low price of String.getBytes(String charsetName), platform independence can be yours today.

Steve
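The point about String.getBytes(String charsetName) can be shown with a short round trip. This is a minimal sketch, assuming UTF-8 as the agreed encoding (the thread does not name one); `CharsetSketch`, `encode`, and `decode` are hypothetical names used only for illustration.

```java
import java.io.UnsupportedEncodingException;

// Minimal sketch; the choice of UTF-8 is an assumption, not from the thread.
final class CharsetSketch {
    private static final String CHARSET = "UTF-8";

    // Encode with an explicitly named charset so the resulting bytes do not
    // depend on the platform's default encoding.
    static byte[] encode(String text) {
        try {
            return text.getBytes(CHARSET);
        } catch (UnsupportedEncodingException e) {
            // Every Java platform implementation is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Decode with the same named charset that was used for encoding.
    static String decode(byte[] bytes) {
        try {
            return new String(bytes, CHARSET);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "na\u00efve caf\u00e9";
        byte[] stored = encode(original);
        System.out.println(stored.length + " bytes, round trip ok: "
                + decode(stored).equals(original));
    }
}
```

Because both directions name the charset, bytes written on one platform decode to the same string on another, whatever either platform's default encoding happens to be.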
Re: Where does the name lucene come from?
Til Schneider wrote:

Hi, Having worked for a few months now with this really great search engine, I was wondering where the name Lucene comes from. What does it mean? Is there any deeper sense?

Doug Cutting's response: URL:http://tinyurl.com/2hh5c (full original URL: URL:http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=961817 )

Otis, shouldn't this be an FAQ?

Steve