Re: Highlighting and InvalidTokenOffsetsException in Lucene 4.0
Scott Smith (mainstreamdata.com) writes:

> I'm migrating code from Lucene 3.5 to 4.0. I have the following code, which
> is supposed to highlight text. I get the exception
> InvalidTokenOffsetsException. I have no idea what that means. I am using a
> custom analyzer which seems to work for searching/indexing, so I assume it
> will work here (even though it took a couple of "minor" changes to get it
> to compile in 4.0). This code used to work in 3.5.
>
> Anyone have any ideas?
>
> Scott
>
> Code fragment:
>
>     try
>     {
>         ctf = new CachingTokenFilter(myCustomAnalyzer
>             .tokenStream(MyFieldName, new StringReader(myText)));
>     }
>     catch (IOException e1)
>     {
>         s_oLog.error("Search:markCommon: Exception creating CachingTokenFilter: "
>             + e1.getMessage());
>         return null;
>     }
>     String markedString;
>     SimpleHTMLFormatter formatter;
>     try
>     {
>         formatter = new SimpleHTMLFormatter(_zBeginHighlight, _zEndHighlight);
>         Scorer score = new QueryScorer(q);
>         ht = new Highlighter(formatter, score);
>         ht.setTextFragmenter(new NullFragmenter());
>         markedString = ht.getBestFragment(ctf, myText);
>     }
>     catch (IOException e)
>     {
>         s_oLog.error("Search:markCommon: Unable to highlight string: "
>             + e.getMessage());
>         return null;
>     }
>     catch (InvalidTokenOffsetsException e2)
>     {
>         s_oLog.error("Search:markCommon: Unable to highlight string2: "
>             + e2.getMessage());
>         return null;
>     }

Hi Scott, did you resolve this? I'm new to Lucene, so I don't know whether this will be of real help.

CachingTokenFilter.reset() does not call the reset() method of your custom TokenStream. Did you try:

    TokenStream ts = myCustomAnalyzer
        .tokenStream(MyFieldName, new StringReader(myText));
    ts.reset();
    ctf = new CachingTokenFilter(ts);

There is probably a better way to use the CachingTokenFilter.

N.

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
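The reset() point in the reply can be illustrated with a small plain-Java model. This is only a sketch: SketchStream and SketchCachingFilter are hypothetical stand-ins, not Lucene's TokenStream or CachingTokenFilter, and the model fails with an IllegalStateException rather than InvalidTokenOffsetsException. It assumes a wrapped stream that must be reset before use and a caching wrapper whose reset() only rewinds its own cache. Whether this is what actually caused the exception in the original code is not confirmed in the thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Plain-Java model of the reset() contract; these are NOT Lucene's classes. */
final class ResetContractSketch {

    /** Hypothetical stand-in for a custom stream that needs reset() first. */
    static final class SketchStream {
        private final String[] words;
        private int pos = -1; // -1 means reset() has not been called yet

        SketchStream(String text) {
            this.words = text.split(" ");
        }

        void reset() {
            pos = 0;
        }

        /** Returns the next token, or null when the stream is exhausted. */
        String next() {
            if (pos < 0) {
                throw new IllegalStateException("reset() was not called");
            }
            return pos < words.length ? words[pos++] : null;
        }
    }

    /** Hypothetical caching wrapper whose reset() only rewinds its own cache. */
    static final class SketchCachingFilter {
        private final SketchStream input;
        private List<String> cache;
        private Iterator<String> it;

        SketchCachingFilter(SketchStream input) {
            this.input = input;
        }

        /** Rewinds the cached tokens; deliberately does not touch the input. */
        void reset() {
            if (cache != null) {
                it = cache.iterator();
            }
        }

        String next() {
            if (cache == null) { // first pass: drain the input into the cache
                cache = new ArrayList<>();
                for (String t = input.next(); t != null; t = input.next()) {
                    cache.add(t);
                }
                it = cache.iterator();
            }
            return it.hasNext() ? it.next() : null;
        }
    }

    /** Collects every token, resetting the wrapped stream before wrapping it. */
    static List<String> tokensWithReset(String text) {
        SketchStream ts = new SketchStream(text);
        ts.reset(); // the caller resets the wrapped stream itself
        SketchCachingFilter ctf = new SketchCachingFilter(ts);
        List<String> out = new ArrayList<>();
        for (String t = ctf.next(); t != null; t = ctf.next()) {
            out.add(t);
        }
        return out;
    }

    /** Same flow without the explicit reset; fails in this model. */
    static boolean failsWithoutReset(String text) {
        SketchCachingFilter ctf = new SketchCachingFilter(new SketchStream(text));
        ctf.reset(); // does not reach the wrapped stream
        try {
            ctf.next();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```

In this model the wrapper never forwards reset(), so the only way to satisfy the wrapped stream's contract is to reset it before handing it to the wrapper, which mirrors the three-line suggestion in the reply.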
GroupingSearch return 0 when setAllGroups(true)
Is this a bug?

When using the grouping function, I set "groupingSearch.setAllGroups(true);", but it returns groupCount = 0.

I looked at the source and found this:

    if (allGroupHeads) {
        collectors.add(allGroupsCollector);
    }
    if (allGroupHeads) {
        collectors.add(allGroupHeadsCollector);
    }

It seems that the first check should use the allGroups flag, not allGroupHeads.

I am a newbie to Lucene, so I want to ask whether this is right.

---
Confidentiality Notice: The information contained in this e-mail and any accompanying attachment(s) is intended only for the use of the intended recipient and may be confidential and/or privileged of Neusoft Corporation, its subsidiaries and/or its affiliates. If any reader of this communication is not the intended recipient, unauthorized use, forwarding, printing, storing, disclosure or copying is strictly prohibited, and may be unlawful.If you have received this communication in error,please immediately notify the sender by return e-mail, and delete the original message and all copies from your system. Thank you.
---
Re: GroupingSearch return 0 when setAllGroups(true)
That sure looks like a bug!

Could you open a Jira issue ( https://issues.apache.org/jira/browse/LUCENE ) and post a patch / test case?

Thanks!

Mike McCandless
http://blog.mikemccandless.com

On Wed, Nov 28, 2012 at 7:53 AM, d...@neusoft.com wrote:
> Is this a bug?
>
> When using the grouping function, I set "groupingSearch.setAllGroups(true);",
> but it returns groupCount = 0.
>
> I looked at the source and found this:
>
>     if (allGroupHeads) {
>         collectors.add(allGroupsCollector);
>     }
>     if (allGroupHeads) {
>         collectors.add(allGroupHeadsCollector);
>     }
>
> It seems that the first check should use the allGroups flag, not
> allGroupHeads.
>
> I am a newbie to Lucene, so I want to ask whether this is right.
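The reported check and the proposed correction can be compared side by side in a plain-Java model. This is a sketch only: it is not the GroupingSearch source, the collectors are represented by their names as strings, and the method names buggy and fixed are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of the flag check discussed above; not Lucene's source. */
final class GroupingFlagsSketch {

    /** Registers collectors the way the quoted snippet does. */
    static List<String> buggy(boolean allGroups, boolean allGroupHeads) {
        List<String> collectors = new ArrayList<>();
        if (allGroupHeads) { // wrong flag, as quoted in the report
            collectors.add("allGroupsCollector");
        }
        if (allGroupHeads) {
            collectors.add("allGroupHeadsCollector");
        }
        return collectors;
    }

    /** Registers collectors with the correction proposed in the report. */
    static List<String> fixed(boolean allGroups, boolean allGroupHeads) {
        List<String> collectors = new ArrayList<>();
        if (allGroups) { // each collector is guarded by its own flag
            collectors.add("allGroupsCollector");
        }
        if (allGroupHeads) {
            collectors.add("allGroupHeadsCollector");
        }
        return collectors;
    }
}
```

With allGroups set and allGroupHeads unset, the first variant registers nothing, which would be consistent with the group count of 0 described in the report; the second registers only the all-groups collector.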
Re: Using Lucene 2.3 indices with Lucene 4.0
Be aware that StandardAnalyzer changed slightly. This is particularly important if you use it to analyze email addresses and certain text-numeral combinations. My understanding is that the newer version of StandardAnalyzer is more consistent with what it should be doing, but if you relied on its old functionality, that could bite you.

There are two solutions that I am aware of:

(1) Replace StandardAnalyzer with ClassicAnalyzer, which I believe is the 'old' StandardAnalyzer before it was fixed.
(2) Use StandardAnalyzer with Version_23 rather than Version_40.

Cheers,
Clive

From: Ramprakash Ramamoorthy
To: java-user@lucene.apache.org
Sent: Tuesday, November 20, 2012 10:31 AM
Subject: Re: Using Lucene 2.3 indices with Lucene 4.0

On Tue, Nov 20, 2012 at 3:54 PM, Danil ŢORIN wrote:

> However behavior of some analyzers changed.
>
> So even after upgrade the old index is readable with 4.0, it doesn't mean
> everything still works as before.

Thank you Torin, I am using the standard analyzer only and both the systems use Unicode 4.0 and I don't smell any problems here.

> On Tue, Nov 20, 2012 at 12:20 PM, Ian Lea wrote:
>
> > You can upgrade the indexes with org.apache.lucene.index.IndexUpgrader.
> > You'll need to do it in steps, from 2.x to 3.x to 4.x, but should work
> > fine as far as I know.
> >
> > --
> > Ian.

Thank you Ian, this is giving me some head starts.

> > On Tue, Nov 20, 2012 at 10:16 AM, Ramprakash Ramamoorthy
> > <youngestachie...@gmail.com> wrote:
> >
> > > I understand lucene 2.x indexes are not compatible with the latest
> > > version of lucene 4.0. However we have all our indexes indexed with
> > > lucene 2.3.
> > >
> > > Now that we are planning to migrate to Lucene 4.0, is there any work
> > > around/hack I can do, so that I can still read the 2.3 indices? Or is
> > > forgoing the older indices the only option?
> > >
> > > P.S : Am afraid, Re-indexing is not feasible.
> > >
> > > --
> > > With Thanks and Regards,
> > > Ramprakash Ramamoorthy,
> > > Chennai,
> > > India.

--
With Thanks and Regards,
Ramprakash Ramamoorthy,
Engineer Trainee,
Zoho Corporation.
+91 9626975420
Re: sort by field and score
I revised the code to:

    SortField sortField[] = {new SortField("id", new CustomComparatorSource(bitSet)), SortField.FIELD_SCORE};
    Sort sort = new Sort(sortField);
    TopFieldCollector topFieldCollector = TopFieldCollector.create(sort, 1000, true, true, true, true);
    indexSearcher.search(query, topFieldCollector);
    TopDocs topDocs = topFieldCollector.topDocs();

but I got the same result as with the previous code. Do I need to customize the TopFieldCollector class?

Thank you, Ian

2012/11/27 Ian Lea

> What are you getting for the scores? If it's NaN I think you'll need
> to use a TopFieldCollector. See for example
> http://www.gossamer-threads.com/lists/lucene/java-user/86309
>
> --
> Ian.
>
> On Tue, Nov 27, 2012 at 3:51 AM, Andy Yu wrote:
> > Hi All,
> >
> > Now I want to sort by a field and the relevance.
> > For example:
> >
> >     SortField sortField[] = {new SortField("id", new
> >         CustomComparatorSource(bitSet)), SortField.FIELD_SCORE};
> >     Sort sort = new Sort(sortField);
> >     TopDocs topDocs = indexSearcher.search(query, 10, sort);
> >
> >     if (0 < topDocs.totalHits) {
> >         for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
> >             System.out.println(indexSearcher.doc(scoreDoc.doc).get("id"));
> >             System.out.println("score is " + scoreDoc.score);
> >             System.out.println(indexSearcher.doc(scoreDoc.doc).get("name"));
> >         }
> >     }
> >
> > I found that the search results are sorted only by [new SortField("id", new
> > CustomComparatorSource(bitSet))];
> > [SortField.FIELD_SCORE] does not work at all.
> >
> > PS: my Lucene version is 3.6.
> >
> > Does anybody know the reason or how to solve it?
> >
> > Thanks,
> > Andy
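When a Sort has several SortFields, each later field only breaks ties left by the earlier ones. The plain-Java sketch below models that ordering with a Comparator; it does not use Lucene. The Hit class is hypothetical, and the primary key is an assumption for illustration (hits whose id is set in a BitSet come first), since the thread does not show what CustomComparatorSource does. If a primary comparator rarely or never reports a tie, the score key has no visible effect, which may or may not be what is happening in the code above.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model of "sort by a custom key, then by score"; not Lucene code. */
final class SortThenScoreSketch {

    /** Hypothetical hit: a document id plus its relevance score. */
    static final class Hit {
        final int id;
        final float score;

        Hit(int id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Orders hits whose id is set in 'preferred' first, then by score descending. */
    static List<Integer> order(List<Hit> hits, BitSet preferred) {
        // Primary key (assumed): 0 for preferred ids, 1 for everything else.
        Comparator<Hit> primary =
            Comparator.comparingInt((Hit h) -> preferred.get(h.id) ? 0 : 1);
        // Secondary key: higher score first; only consulted when the primary key ties.
        Comparator<Hit> byScoreDesc = (a, b) -> Float.compare(b.score, a.score);

        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(primary.thenComparing(byScoreDesc));

        List<Integer> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }
}
```

Because the primary key here has only two values, ties are common and the score ordering is visible within each group. A primary key that is unique per document would leave nothing for the score to decide.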
Re: Does anyone have tips on managing cached filters?
On Wed, Nov 28, 2012 at 6:28 PM, Robert Muir wrote:
> My point is really that lucene (especially clear in 4.0) assumes
> indexreaders are immutable points in time. I don't think it makes sense for
> us to provide any e.g. filtercaching or similar otherwise, because this is
> a key simplification to the design. If you depart from this, by scoring or
> filtering from mutable stuff outside the inverted index, things are likely
> going to get complicated.

Whereas it would be lovely to live in a land of rainbows and unicorns where all the data you ever want to use is in the text index and all filters can be written as a query, that simply isn't the case for us and I very much doubt we're not the only ones in this situation. Sure, things are complicated. Anything except the most trivial forum search application is complicated.

Well, the situation as it stands now is that when a filter is invalidated, it happens across all stores which are currently open. That means that results are at least correct, but after invalidating a filter, a little more work than necessary is required to populate the cache again.

For certain filters (like word lists) this is necessary anyway, since adding a word might invalidate any store. For others like tags, I was hoping there would be some way to selectively invalidate only certain readers. But it seems like that isn't the case, so I will probably have to add a third level of caching to cache these sorts of filter per-store instead of globally.

TX
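The per-store caching idea in the last paragraph could look roughly like the plain-Java sketch below. It is not a Lucene API and not the poster's implementation: PerReaderFilterCache, its type parameters and its method names are all hypothetical. It assumes that a reader (or store) object can serve as a map key and that each filter is identified by a name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Plain-Java sketch of a filter cache keyed per reader; not a Lucene API. */
final class PerReaderFilterCache<R, F> {
    // Outer map: reader -> (filter name -> cached filter result).
    // WeakHashMap lets an entry go away once its reader is no longer referenced,
    // provided the cached values do not hold a strong reference to the reader.
    private final Map<R, Map<String, F>> cache = new WeakHashMap<>();

    /** Returns the cached result for (reader, name), computing it on a miss. */
    synchronized F get(R reader, String name, Function<R, F> compute) {
        Map<String, F> perReader = cache.computeIfAbsent(reader, r -> new HashMap<>());
        return perReader.computeIfAbsent(name, n -> compute.apply(reader));
    }

    /** Drops one filter for one reader only (e.g. tags changed in one store). */
    synchronized void invalidate(R reader, String name) {
        Map<String, F> perReader = cache.get(reader);
        if (perReader != null) {
            perReader.remove(name);
        }
    }

    /** Drops one filter for every reader (e.g. a word list changed). */
    synchronized void invalidateEverywhere(String name) {
        for (Map<String, F> perReader : cache.values()) {
            perReader.remove(name);
        }
    }
}
```

The two invalidation methods correspond to the two cases described above: word-list style filters would call invalidateEverywhere, while tag-style filters could call invalidate for just the affected store, so the other stores keep their cached results.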
Re: Does anyone have tips on managing cached filters?
On Thu, Nov 29, 2012 at 4:57 PM, Trejkaz wrote:
> doubt we're not

Rats. Accidentally double-negatived that. I doubt we are the only ones.

TX