Re: Class_for_HighFrequencyTerms
Dear Erick, I looked for it and even added IndexReader.java and TermFreqVector.java from http://www.jarvana.com/jarvana/search?search_type=class&java_class=org.apache.lucene.index.IndexReader . But after adding them the system indicated a lot of errors in the source code of IndexReader.java (e.g.: DirectoryOwningReader cannot be resolved to a type, indexCommit cannot be resolved to a type, SegmentInfos cannot be resolved, TermEnum cannot be resolved to a type, etc.). I am using Lucene 2.9.1 and this particular website has listed this source code under the 2.9.1 version of Lucene. What is the reason for this kind of scenario? Do I have to add another JAR file to solve this? (I even added lucene-core-2.9.1-sources.jar, but nothing happened.) Please be kind enough to reply. Thanks Manjula On Tue, May 11, 2010 at 1:26 AM, Erick Erickson wrote: > Have you looked at TermFreqVector? > > Best > Erick > > On Mon, May 10, 2010 at 8:10 AM, manjula wijewickrema > wrote: > > > Hi, > > > > If I index a document (single document) in Lucene, then how can I get the > > term frequencies (even the first and second highest occurring terms) of > that > > document? Is there any class/method to do that? If anybody knows, please > help > > me. > > > > Thanks > > Manjula > > >
Re: best way to intersect two queries?
Dear lucene experts, Let me try to make this precise since there was no answer. I have a query that is, roughly, a & b & c, and I have a good search result. Now I want to know: a) for the first page, which matches are matches for a, b, or c b) for the remaining results (for the "tail"), are there matches of a, b, or c Thus far the only approach I know is using the highlighter over the fields; it's not exactly the same thing and it's slow. I know I could use termDocs or another search result for a, b, and c, probably to annotate my initial results list; that could work well for a). I still don't know what to do for b). thanks for hints. paul On 31 March 2010 at 23:00, Paul Libbrecht wrote: I've been wandering around but I see no solution yet: I would like to intersect two query results: going through the list of one query and indicating which ones actually match the other query or, even better, indicating that "past this, nothing matches that query anymore". What should be the strategy?
Re: Class_for_HighFrequencyTerms
Sounds like your path is messed up and you're not using maven correctly. Start with the jar version that contains the class you require and use a maven pom to correctly resolve dependencies. Adam Sent using BlackBerry® from Orange -Original Message- From: manjula wijewickrema Date: Tue, 11 May 2010 15:13:12 To: Subject: Re: Class_for_HighFrequencyTerms [quoted message trimmed; see above]
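For reference: TermFreqVector is an interface inside lucene-core-2.9.1.jar itself (package org.apache.lucene.index), so there is no need to copy .java sources into the project; the compile errors above come from building isolated source files without the rest of the Lucene source tree. A minimal sketch of reading per-document term frequencies with it, assuming the field was indexed with Field.TermVector.YES (the field name "contents" is only illustrative):

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.store.FSDirectory;

public class TopTerms {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(FSDirectory.open(new File(args[0])), true);
        // Returns null unless the field was indexed with term vectors enabled.
        TermFreqVector tfv = reader.getTermFreqVector(0, "contents");
        String[] terms = tfv.getTerms();          // parallel arrays:
        int[] freqs = tfv.getTermFrequencies();   // freqs[i] belongs to terms[i]
        int best = 0;
        for (int i = 1; i < terms.length; i++) {
            if (freqs[i] > freqs[best]) best = i;
        }
        System.out.println("most frequent term: " + terms[best] + " (" + freqs[best] + ")");
        reader.close();
    }
}

Since getTerms() and getTermFrequencies() are parallel arrays, finding the second-highest term is just another pass over freqs.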
Re: best way to intersect two queries?
See https://issues.apache.org/jira/browse/LUCENE-1999 - Original Message From: Paul Libbrecht To: java-user@lucene.apache.org Sent: Tue, 11 May, 2010 10:52:14 Subject: Re: best way to intersect two queries? [quoted message trimmed; see above]
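For reference, one way to do the termDocs-style annotation discussed in this thread, sketched against the Lucene 2.9 API: wrap each sub-query (a, b, c) in a QueryWrapperFilter and step its DocIdSetIterator alongside the main result list. The helper below is hypothetical and assumes the hit doc ids are sorted ascending:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryWrapperFilter;

public class MatchAnnotator {
    // For each doc id in hitDocIds (ascending), report whether 'sub' also matches it.
    public static boolean[] annotate(IndexReader reader, Query sub, int[] hitDocIds)
            throws Exception {
        Filter filter = new QueryWrapperFilter(sub);
        DocIdSet set = filter.getDocIdSet(reader);
        DocIdSetIterator it = set.iterator();
        boolean[] result = new boolean[hitDocIds.length];
        int doc = it.nextDoc();
        for (int i = 0; i < hitDocIds.length; i++) {
            while (doc < hitDocIds[i]) {
                doc = it.advance(hitDocIds[i]);   // skips forward, never back
            }
            result[i] = (doc == hitDocIds[i]);
        }
        return result;
    }
}

That covers a). For b), keep advancing the same iterator through the tail's doc ids; once it returns DocIdSetIterator.NO_MORE_DOCS, nothing past that point matches that sub-query anymore.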
External ValueSource and value mapping
Hi everyone, I am trying to implement something akin to Solr's ExternalFileSource backed by a Map object with plain lucene (as we are working on top of an existing solution), and while it is easy to write a ValueSource that does it, I have a problem with the mapping phase. Basically I have tried two things: 1. keep a map of unique ids for the documents, such as joe=10, john=20, and at runtime retrieve the unique-key field and use that to find the value in the map 2. keep a map/array of _document ids_ as Solr's ExternalFileSource seems to do, doc1=10, doc2=20, and at runtime use the document id in floatValue as the lookup. The problem I found is two-fold: the former solution seems to be pretty slow, probably because of the need to fetch a Field for every document involved in scoring, while the latter seems to be impossible: as far as I can tell, when ValueSource.getValues is called, different index segments may be passed, meaning that the document id becomes a non-unique key. This also means I can neither precalculate this docId/score mapping, nor cache it using a hybrid solution. Looking at solr sources, this seems to be solved using SolrIndexReader objects that have a #base attribute that can be used to offset the document id, but as I said, we are using plain old lucene's IndexReader objects and this seems impossible to replicate using only them. Is my assessment of the issue correct, or am I missing something? If it is, does someone have a solution for this, or has seen this problem in the past and cares to share a workaround? Thanks in advance. -- blog en: http://www.riffraff.info blog it: http://riffraff.blogsome.com
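One workaround with plain Lucene 2.9, sketched below: walk the top-level reader once per (re)open and record each leaf reader's doc-id offset (the equivalent of Solr's #base), then inside ValueSource.getValues() add that offset to turn segment-local ids back into top-level document numbers you can key your array by. This is an illustrative sketch, not a drop-in class:

import java.util.IdentityHashMap;
import java.util.Map;
import org.apache.lucene.index.IndexReader;

public class DocBases {
    // Maps every leaf (segment) reader under 'top' to its doc-id offset, so that
    // (offset + segment-local docId) is a top-level document number.
    public static Map<IndexReader, Integer> compute(IndexReader top) {
        Map<IndexReader, Integer> bases = new IdentityHashMap<IndexReader, Integer>();
        collect(top, 0, bases);
        return bases;
    }

    private static int collect(IndexReader r, int base, Map<IndexReader, Integer> bases) {
        IndexReader[] subs = r.getSequentialSubReaders();
        if (subs == null) {               // leaf reader: record its offset
            bases.put(r, base);
            return base + r.maxDoc();
        }
        for (IndexReader sub : subs) {
            base = collect(sub, base, bases);
        }
        return base;
    }
}

The map has to be rebuilt whenever the top-level reader is reopened, since reopening can produce new leaf readers. With top-level ids available, the external values can live in one array indexed by document number, avoiding the per-document stored-field fetch of solution 1.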
spatial searches
Hi all, I hope someone can enlighten me. I am trying to figure out how spatial searches are to be implemented with Lucene. From walking through mailing lists and various web pages, and looking at the JavaDoc and source code, I understand how the tiers work and how the search is limited by a special term query containing the ID(s) of the relevant grid cells. However, it still puzzles me how, where and when the final distance filtering takes place. I see three possibilities: the "Filter" class, the "ValueSourceQuery" or the use of a subclass of "Collector". With my limited understanding of the inner workings of Lucene, it seems to me that the first two more or less operate on the whole document set, i.e. prior to the moment where the term query for the tiers comes into effect, rendering it useless. The "Collector" approach seems much more appropriate, but in addition to deciding whether the document meets the distance condition or not, I would like to have different scores depending on the distance (lower score for larger distances). Originally I thought that the solution would be some kind of subclass of "Query", but I haven't seen any hints pointing in this direction and I don't know whether I am able to implement that on my own. I fear that I completely misunderstand something. Thanks in advance for any hints. Regards, Klaus
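For what it's worth, the Collector route can do distance-dependent scoring as well as filtering, and it only ever sees documents that already passed the grid-cell term query, so nothing operates on the whole document set. A rough sketch against the Lucene 2.9 Collector API; the field names "lat"/"lon" and the flat-earth distance() are placeholders, not the spatial contrib's actual fields:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Scorer;

public class DistanceCollector extends Collector {
    private final double queryLat, queryLon, maxDistance;
    private Scorer scorer;
    private double[] lats, lons;
    private int docBase;

    public DistanceCollector(double lat, double lon, double maxDistance) {
        this.queryLat = lat;
        this.queryLon = lon;
        this.maxDistance = maxDistance;
    }

    public void setScorer(Scorer scorer) {
        this.scorer = scorer;
    }

    public void setNextReader(IndexReader reader, int docBase) throws IOException {
        this.docBase = docBase;                               // segment offset
        lats = FieldCache.DEFAULT.getDoubles(reader, "lat");  // per-segment cache
        lons = FieldCache.DEFAULT.getDoubles(reader, "lon");
    }

    public void collect(int doc) throws IOException {
        double d = distance(queryLat, queryLon, lats[doc], lons[doc]);
        if (d > maxDistance) {
            return;                                           // final distance filtering
        }
        float score = (float) (scorer.score() / (1.0 + d));   // decay with distance
        // ... push (docBase + doc, score) into a priority queue of top hits
    }

    public boolean acceptsDocsOutOfOrder() {
        return true;
    }

    private static double distance(double lat1, double lon1, double lat2, double lon2) {
        double dLat = lat1 - lat2, dLon = lon1 - lon2;
        return Math.sqrt(dLat * dLat + dLon * dLon);          // placeholder, not great-circle
    }
}

Since collect() is only invoked for documents that already matched the wrapped (grid-cell) query, the distance test really is a final step over the candidate set, not the whole index. If a Query subclass is preferred, contrib's CustomScoreQuery can fold the same 1/(1+d) factor into the score instead.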
FieldCache and 2.9
Hi, I have been using the FieldCache in lucene version 2.9 compared to that in 2.4. The load time is massively decreased, however I am not seeing any benefit in getting a field cache after re-open of an index reader when I have only added a few extra documents. A small test class is included below (based on one from Lucid Imagination) that creates 5Mil docs, gets a field cache, creates another few docs and gets the field cache again. I thought the second get would be very fast, as only 1 segment should have changed; however it takes more time for the reopen and cache get than the original did. Am I doing something wrong here or have I misunderstood the new segment changes? Thanks Carl

import java.io.File;

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ContrivedFCTest {

    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File(args[0]));
        IndexWriter writer = new IndexWriter(dir, new SimpleAnalyzer(), true,
                IndexWriter.MaxFieldLength.LIMITED);
        for (int i = 0; i < 5000000; i++) {
            if (i % 100000 == 0) {
                System.out.println(i);
            }
            Document doc = new Document();
            doc.add(new Field("field", "String" + i, Field.Store.NO,
                    Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
        }
        writer.close();

        IndexReader reader = IndexReader.open(dir, true);
        long start = System.currentTimeMillis();
        FieldCache.DEFAULT.getStrings(reader, "field");
        long end = System.currentTimeMillis();
        System.out.println("load time for initial field cache: "
                + (end - start) / 1000.0f + "s");

        writer = new IndexWriter(dir, new SimpleAnalyzer(), false,
                IndexWriter.MaxFieldLength.LIMITED);
        for (int i = 5000001; i < 5000005; i++) {
            if (i % 100000 == 0) {
                System.out.println(i);
            }
            Document doc = new Document();
            doc.add(new Field("field", "String" + i, Field.Store.NO,
                    Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
        }
        writer.close();

        IndexReader reader2 = reader.reopen(true);
        System.out.println("reader size = " + reader2.numDocs());
        long start2 = System.currentTimeMillis();
        FieldCache.DEFAULT.getStrings(reader2, "field");
        long end2 = System.currentTimeMillis();
        System.out.println("load time for re-opened field cache: "
                + (end2 - start2) / 1000.0f + "s");
    }
}
Re: FieldCache and 2.9
You are requesting the FieldCache entry from the top-level reader and hence a whole new FieldCache entry must be created. Lucene 2.9 sorting requests FieldCache entries at the segment level and hence reuses entries for those segments that haven't changed. -Yonik Apache Lucene Eurocon 2010 18-21 May 2010 | Prague On Tue, May 11, 2010 at 9:27 AM, Carl Austin wrote: > Hi, > > I have been using the FieldCache in lucene version 2.9 compared to that > in 2.4. The load time is massively decreased, however I am not seeing > any benefit in getting a field cache after re-open of an index reader > when I have only added a few extra documents. > [remainder of quoted message and test code trimmed]
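Concretely, that means asking each sub-reader for its entry instead of the top-level reader; after a reopen, only segments that actually changed pay the load cost. A sketch of what the test above could do instead, using the same "field" name:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;

public class PerSegmentCacheGet {
    // Unchanged segments find their existing FieldCache entries and reuse them.
    static void loadPerSegment(IndexReader topReader) throws Exception {
        IndexReader[] subs = topReader.getSequentialSubReaders();
        if (subs == null) {                          // reader has no sub-readers
            subs = new IndexReader[] { topReader };
        }
        for (IndexReader sub : subs) {
            FieldCache.DEFAULT.getStrings(sub, "field");
        }
    }
}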
RE: FieldCache and 2.9
Ah, ok, thanks for that. I had hoped that the field cache would do this for me, by going through the subreaders itself. Is this likely to be done in a future release? I may have to implement some wrapper that does this anyway, and if so, can submit it as contrib if that would be useful? Thanks Carl -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: 11 May 2010 14:41 To: java-user@lucene.apache.org Subject: Re: FieldCache and 2.9 You are requesting the FieldCache entry from the top-level reader and hence a whole new FieldCache entry must be created. Lucene 2.9 sorting requests FieldCache entries at the segment level and hence reuses entries for those segments that haven't changed. -Yonik [remainder of quoted thread trimmed]
is there some dangerous bug in lucene?
I have a problem. I found the stored fields in a document are not consistent. Here is a small case from my program:

Field A = new Filed(Store.Yes, FieldAValue);
FieldBValue.add(FieldAValue); // FieldBValue is a container that contains other stored field values; FieldBValue is like a complete document record
Field B = new Filed(Store.Yes, FieldBValue);
Document doc = new Document;
doc.add(A); doc.add(B);
indexWriter.updateDocument(new Term(..), doc);

After a long time, today somebody found a bug. I observe that the value of field A is the old value, but the value of field B is the new and correct value. At first I thought maybe it was a bug in indexWriter.getReader(), but after I restarted the program the bug still existed. Finally I had to reconstruct all the data to fix it.

PS: I use FieldCache to store the value of field A, not field B. I use indexWriter.getReader() to get realtime search.

I hope somebody can help me explain it.
Re: is there some dangerous bug in lucene?
> is there some dangerous bug in lucene? Highly unlikely. Much more likely that there is a bug in your code, perhaps somewhere in the confusing (to me, reading your uncompilable code snippet) cross linking of values between fields A and B. Or you've got duplicate docs in the index. Or something completely different. If you really do think it is a problem in lucene itself, or in your usage of lucene, I suggest that you break it down to the smallest possible self-contained test case or program that demonstrates the problem and post it here. And tell us what version of lucene you are using. Before doing that it would be worth using Luke to examine the index to double check that it holds what you think it does. -- Ian. On Tue, May 11, 2010 at 3:20 PM, luocanrao wrote: [quoted message trimmed; see above]
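Along the lines Ian suggests, a minimal self-contained program might look like this (a sketch against Lucene 2.9's near-real-time getReader(); the field names are illustrative, not the original poster's): update a document, then read the stored values back through a fresh reader.

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;

public class UpdateCheck {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter w = new IndexWriter(dir, new SimpleAnalyzer(),
                IndexWriter.MaxFieldLength.LIMITED);

        Document d = new Document();
        d.add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
        d.add(new Field("A", "old", Field.Store.YES, Field.Index.NOT_ANALYZED));
        w.addDocument(d);

        Document d2 = new Document();
        d2.add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
        d2.add(new Field("A", "new", Field.Store.YES, Field.Index.NOT_ANALYZED));
        w.updateDocument(new Term("id", "1"), d2);   // delete-then-add

        IndexReader r = w.getReader();               // near-real-time reader
        for (int i = 0; i < r.maxDoc(); i++) {
            if (r.isDeleted(i)) continue;            // skip the deleted old doc
            System.out.println(i + " -> " + r.document(i).get("A"));
        }
        r.close();
        w.close();
    }
}

If both an old and a new copy show up with isDeleted() false, the update term didn't uniquely match the original document.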
Re: is there some dangerous bug in lucene?
Is it possible that you're looking at the deleted document? When you update a document you're actually deleting the old one and adding a new one. If not, I second Ian's comment that a self-contained test case will be very useful. HTH Erick On Tue, May 11, 2010 at 11:25 AM, Ian Lea wrote: > > is there some dangerous bug in lucene? > > Highly unlikely. Much more likely that there is a bug in your code, > perhaps somewhere in the confusing (to me, reading your uncompilable > code snippet) cross linking of values between fields A and B. [remainder of quoted thread trimmed]
Location of HTMLStripCharFilter
Hi Everyone, and thanks in advance for the help. I downloaded the latest 4.0 dev release of the lucene/solr trunk. Everything seems to be fine except I can't for the life of me find the HTMLStripCharFilter class. I've been poking around for a while and I figure I'm missing something incredibly obvious, but I'm at the point where I have to lay myself at your mercy and ask for help. Thanks, Spence
Re: Location of HTMLStripCharFilter
Sorry everyone, found it in modules. Please disregard. Thanks, Spence On Tue, May 11, 2010 at 10:42 AM, Spencer Tickner wrote: [quoted message trimmed; see above]
Re: is there some dangerous bug in lucene?
If you are using the field cache for field A, and updating field A, isn't it normal that field A is not updated? The field cache is keyed by the index reader; it wouldn't be efficient to reload the field cache for each updateDocument(). -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.dbsight.com Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding! On 5/11/2010 7:20 AM, luocanrao wrote: [quoted message trimmed; see above]
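In other words, a FieldCache entry loaded through a reader keeps serving the values that reader saw when it was opened, no matter how often updateDocument() runs afterwards; a reopened reader is needed to pick up changes. A small sketch (the field name "A" follows the original mail):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;

public class FreshCacheValues {
    // Values cached against oldReader stay frozen at what oldReader saw.
    static String[] refresh(IndexReader oldReader) throws Exception {
        String[] stale = FieldCache.DEFAULT.getStrings(oldReader, "A");

        IndexReader newReader = oldReader.reopen();  // or indexWriter.getReader() again
        String[] fresh = FieldCache.DEFAULT.getStrings(newReader, "A");
        return fresh;   // reflects documents updated since oldReader was opened
    }
}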
Re: best way to intersect two queries?
Very interesting, finding the field name is enough for me. What's nice about wrapping with these flag queries is that, indeed, I don't want to know the details of each matched query, just some of them. Two terminology questions: - is the "multiplier" mentioned in the mail there the same as boost? - I intended to use prefix and fuzzy queries; is that at odds with this approach? paul On 11 May 2010 at 12:02, mark harwood wrote: See https://issues.apache.org/jira/browse/LUCENE-1999 [remainder of quoted thread trimmed]
Re: Lucene QueryParser and Analyzer
FYI: I opened a jira issue for this bug here: https://issues.apache.org/jira/browse/LUCENE-2458 On Thu, Apr 29, 2010 at 7:01 PM, Wei Ho wrote: > I think I've figured out what the problem is. Given the inputs, > > Input1: C1C2,C3C4,C5C6,C7,C8C9C10 > Input2: C1C2 C3C4 C5C6 C7 C8C9C10 > > Input1 gets parsed as > Query1: (text: "C1C2 C3C4 C5C6 C7 C8C9C10") > whereas Input2 gets parsed as > Query2: (text: "C1C2") (text: "C3C4") (text: "C5C6") (text: "C7") (text: "C8C9C10") > > That is, Lucene constructs the query and then passes the query text through the analyzer. Is there any way to > force QueryParser to pass the input string through the analyzer before > creating the query? That is, force Lucene > to create Query2 for both Input1 and Input2. > > Thanks, > Wei > > Original Message > Subject: Re: Lucene QueryParser and Analyzer > From: Sudarsan, Sithu D. > To: java-user@lucene.apache.org > Date: 4/29/2010 4:54 PM >> >> ---sample code- >> Analyzer analyzer = new LingPipeAnalyzer(); Searcher searcher = new IndexSearcher(directory); QueryParser qParser = new MultiFieldQueryParser(Version.LUCENE_30, SEARCH_FIELDS, analyzer); Query query = qParser.parse(queryLine[1]); ScoreDoc[] results = searcher.search(query, TOP_N).scoreDocs; >> >> qParser will use the analyzer LingPipeAnalyzer() before forming the >> query. >> >> Sincerely, >> Sithu D Sudarsan >> >> -Original Message- >> From: Wei Ho [mailto:we...@princeton.edu] >> Sent: Thursday, April 29, 2010 4:44 PM >> To: java-user@lucene.apache.org >> Subject: Re: Lucene QueryParser and Analyzer >> >> Sorry, I guess "discarding the punctuation" was a bit misleading. >> I meant that given the two input strings, >> >> Input1: C1C2,C3C4,C5C6,C7,C8C9C10 >> Input2: C1C2 C3C4 C5C6 C7 C8C9C10 >> >> The analyzer I implemented tokenizes both Input1 and Input2 as "C1C2", >> "C3C4", "C5C6", "C7", "C8C9C10" - that is, it doesn't include the >> punctuation in the tokenization. I'm assuming that QueryParser is simply >> passing the entire input string to the analyzer and taking the tokens, >> in which case Input1 and Input2 should be considered identical. Does >> QueryParser do any sort of pre-processing or filtering beforehand? If >> so, how can I turn it off? >> >> Aside from stopping tokens at punctuation, my analyzer is also doing >> Chinese word segmentation, so I'd like to be sure that QueryParser is >> using the analyzer the way I expect it to. >> >> Thanks, >> Wei >> >> Original Message >> Subject: Re: Lucene QueryParser and Analyzer >> From: Sudarsan, Sithu D. >> To: java-user@lucene.apache.org >> Date: 4/29/2010 4:08 PM >>> >>> If so, >>> >>> Input1: c1c2c3c4c5c6c7 >>> Input2: c1c2 c3c4 ... >>> >>> I guess, they are different! Add a whitespace after commas and see if >>> that works... >>> >>> Sincerely, >>> Sithu D Sudarsan >>> >>> -Original Message- >>> From: Wei Ho [mailto:we...@princeton.edu] >>> Sent: Thursday, April 29, 2010 4:04 PM >>> To: java-user@lucene.apache.org >>> Subject: Re: Lucene QueryParser and Analyzer >>> >>> No, there is no whitespace after the comma in Input1 >>> >>> Input1: C1C2,C3C4,C5C6,C7,C8C9C10 >>> Input2: C1C2 C3C4 C5C6 C7 C8C9C10 >>> >>> Input1 is basically one big long word with commas and Chinese characters >>> one after the other. Input2 is where I manually separated the string >>> into the component terms by replacing the comma with whitespace. My >>> confusion stems from the fact that I thought it should not matter since >>> the analyzer should be discarding the punctuation anyway? So the >>> tokenization process should be the same for both Input1 and Input2? If >>> that is not the case, what do I need to change? >>> >>> Thanks, >>> Wei Ho >>> >>> Original Message >>> Subject: Re: Lucene QueryParser and Analyzer >>> From: Sudarsan, Sithu D. >>> To: java-user@lucene.apache.org >>> Date: 4/29/2010 3:54 PM >>>> Hi, Is there a whitespace after the comma? Sincerely, Sithu D Sudarsan -Original Message- From: Wei Ho [mailto:we...@princeton.edu] Sent: Thursday, April 29, 2010 3:51 PM To: java-user@lucene.apache.org Subject: Lucene QueryParser and Analyzer Hello, I'm using Lucene to index and search through a collection of Chinese documents. However, I'm noticing an odd behavior in query parsing/searching. Given the two queries below: (Ci refers to Chinese character i) Input1: C1C2,C3C4,C5C6,C7,C8C9C10 Input2: C1C2 C3C4 C5C6 C7 C8C9C10 Input1 returns absolutely nothing, while Input2 (replacing the commas with spaces) works as expected. I'm a bit confused why this would be happening - it seems that QueryParser uses the Analyzer passed to it to
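Until LUCENE-2458 is resolved, one workaround is to skip QueryParser's whitespace pre-tokenization entirely and build the query straight from the analyzer's token stream, so that Input1 and Input2 produce the same query. A sketch against the Lucene 2.9/3.0 attribute API (the field name "text" follows the parsed queries above):

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class AnalyzerQueryBuilder {
    // Runs the raw input through the analyzer and ORs one TermQuery per token,
    // so punctuation handling is entirely up to the analyzer.
    public static Query build(Analyzer analyzer, String field, String input) throws Exception {
        TokenStream ts = analyzer.tokenStream(field, new StringReader(input));
        TermAttribute termAtt = ts.addAttribute(TermAttribute.class);
        BooleanQuery q = new BooleanQuery();
        ts.reset();
        while (ts.incrementToken()) {
            q.add(new TermQuery(new Term(field, termAtt.term())), BooleanClause.Occur.SHOULD);
        }
        ts.close();
        return q;
    }
}

QueryParser.parse() would still be the right tool for inputs containing real query syntax; this helper is only for free-text fields where the analyzer should be the sole tokenizer.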