Re: Highlighting and InvalidTokenOffsetsException in Lucene 4.0

2012-11-28 Thread nbuso
Scott Smith  mainstreamdata.com> writes:

> 
> I'm migrating code from Lucene 3.5 to 4.0.  I have the following code which is
> supposed to highlight text.  I get the exception InvalidTokenOffsetsException.
> I have no idea what that means.  I am using a custom analyzer which seems to
> work for searching/indexing, so I assume it will work here (even though it took
> a couple of "minor" changes to get it to compile in 4.0).  This code used to
> work in 3.5.
> 
> Anyone have any ideas?
> 
> Scott
> 
> Code fragment:
> 
> try
> {
> ctf = new CachingTokenFilter(myCustomAnalyzer
> .tokenStream(MyFieldName, new StringReader(myText)));
> }
> catch (IOException e1)
> {
> s_oLog.error("Search:markCommon: Exception creating CachingTokenFilter: " +
> e1.getMessage());
> return null;
> 
> }
> String markedString;
> SimpleHTMLFormatter formatter;
> try
> {
> formatter = new SimpleHTMLFormatter(_zBeginHighlight,
> _zEndHighlight);
> Scorer score = new QueryScorer(q);
> ht = new Highlighter(formatter, score);
> ht.setTextFragmenter(new NullFragmenter());
> markedString = ht.getBestFragment(ctf, myText);
> }
> catch (IOException e)
> {
> s_oLog.error("Search:markCommon: Unable to highlight string: "
> + e.getMessage());
> return null;
> }
> catch(InvalidTokenOffsetsException e2)
> {
> s_oLog.error("Search:markCommon: Unable to highlight string2: "
> + e2.getMessage());
> return null;
> }
> 
> 

Hi Scott,

Did you resolve this? I'm new to Lucene, so I don't know whether this will be
of real help.

In Lucene 4.0, CachingTokenFilter.reset() does not call the reset() method of
the TokenStream it wraps (here, the stream from your custom analyzer). Have you tried:


TokenStream ts = myCustomAnalyzer
  .tokenStream(MyFieldName, new StringReader(myText));
ts.reset(); // CachingTokenFilter.reset() won't reset the wrapped stream, so do it here
ctf = new CachingTokenFilter(ts);
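To illustrate why resetting the wrapped stream matters, here is a small stdlib-only model of the behaviour described above. StrictSource and CachingWrapper are hypothetical stand-ins, not Lucene classes: the wrapper fills its cache lazily on first use, and its own reset() only rewinds the cache and never touches the input, as described for CachingTokenFilter in 4.0. The source throws if it is consumed before reset() purely to make the mistake visible; a real 4.0 tokenizer may misbehave less obviously.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ResetContractSketch {
    // Hypothetical stand-in for a tokenizer: refuses to emit tokens until reset() is called.
    static class StrictSource {
        private final String[] terms;
        private int pos = -1; // -1 means reset() has not been called yet

        StrictSource(String... terms) { this.terms = terms; }

        void reset() { pos = 0; }

        // Returns the next term, or null when exhausted.
        String next() {
            if (pos < 0) throw new IllegalStateException("reset() was not called");
            return pos < terms.length ? terms[pos++] : null;
        }
    }

    // Hypothetical stand-in for 4.0's CachingTokenFilter: fills its cache lazily on
    // first use; its own reset() only rewinds the cache and never resets the input.
    static class CachingWrapper {
        private final StrictSource input;
        private List<String> cache;
        private Iterator<String> it;

        CachingWrapper(StrictSource input) { this.input = input; }

        void reset() {
            if (cache != null) it = cache.iterator();
        }

        String next() {
            if (cache == null) {
                List<String> filled = new ArrayList<>();
                for (String t = input.next(); t != null; t = input.next()) filled.add(t);
                cache = filled;
                it = cache.iterator();
            }
            return it.hasNext() ? it.next() : null;
        }
    }

    // Consumes the wrapper roughly the way a consumer is expected to: reset, then read.
    static List<String> drain(CachingWrapper w) {
        List<String> out = new ArrayList<>();
        w.reset();
        for (String t = w.next(); t != null; t = w.next()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        StrictSource src = new StrictSource("quick", "brown", "fox");
        src.reset(); // the suggested fix: reset the input before wrapping it
        CachingWrapper ctf = new CachingWrapper(src);
        System.out.println(drain(ctf)); // [quick, brown, fox]
        System.out.println(drain(ctf)); // replayed from the cache: [quick, brown, fox]
    }
}
```

Removing the src.reset() call makes the first drain() fail, which is the situation the suggested fix avoids.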


There is probably a better way to use CachingTokenFilter.
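As for what the exception itself means: as far as I can tell from the 4.0 Highlighter source, InvalidTokenOffsetsException is thrown when a token's start or end offset lies beyond the length of the text passed to getBestFragment(), i.e. the token stream and the string disagree. One way to check a custom analyzer is to collect its tokens' offsets and compare them with the text yourself. The sketch below models only that comparison; the Token record is hypothetical and stands in for the term and offset attributes.

```java
import java.util.List;

public class OffsetCheckSketch {
    // Hypothetical token: term text plus start/end character offsets into the original text.
    record Token(String term, int start, int end) {}

    // Models the Highlighter's sanity check: no offset may point past the end of the text.
    // Returns the first offending token, or null if every token fits.
    static Token firstBadToken(List<Token> tokens, String text) {
        for (Token t : tokens) {
            if (t.end() > text.length() || t.start() > text.length()) {
                return t;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "quick fox";
        List<Token> good = List.of(new Token("quick", 0, 5), new Token("fox", 6, 9));
        // Hypothetical bad offsets, as if counted from the start of some earlier input.
        List<Token> bad = List.of(new Token("quick", 0, 5), new Token("fox", 16, 19));
        System.out.println(firstBadToken(good, text)); // null
        System.out.println(firstBadToken(bad, text));  // Token[term=fox, start=16, end=19]
    }
}
```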


N.




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



GroupingSearch return 0 when setAllGroups(true)

2012-11-28 Thread d...@neusoft.com
Is this a bug?

When using the grouping function, I set "groupingSearch.setAllGroups(true);",
but it returns groupCount = 0.

Looking at the source, I found this:

if (allGroupHeads) {
collectors.add(allGroupsCollector);
}
if (allGroupHeads) {
collectors.add(allGroupHeadsCollector);
}

It seems the first check should use the allGroups flag, not allGroupHeads.

I am new to Lucene, so I'd like to ask: is this right?
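The reporter's reading can be checked with a toy model of just the two flag tests. Strings stand in for the real collector objects and the method names are hypothetical; this is not GroupingSearch's actual code, only the if-conditions quoted above plus the suggested correction.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupingFlagsSketch {
    // The conditions as quoted in the report: both checks test allGroupHeads.
    // Strings stand in for the real collector objects (hypothetical model).
    static List<String> buggyCollectors(boolean allGroups, boolean allGroupHeads) {
        List<String> collectors = new ArrayList<>();
        if (allGroupHeads) {
            collectors.add("allGroupsCollector");
        }
        if (allGroupHeads) {
            collectors.add("allGroupHeadsCollector");
        }
        return collectors;
    }

    // The suggested fix: the first check tests allGroups instead.
    static List<String> fixedCollectors(boolean allGroups, boolean allGroupHeads) {
        List<String> collectors = new ArrayList<>();
        if (allGroups) {
            collectors.add("allGroupsCollector");
        }
        if (allGroupHeads) {
            collectors.add("allGroupHeadsCollector");
        }
        return collectors;
    }

    public static void main(String[] args) {
        // setAllGroups(true) only, i.e. allGroups = true, allGroupHeads = false
        System.out.println(buggyCollectors(true, false)); // []
        System.out.println(fixedCollectors(true, false)); // [allGroupsCollector]
    }
}
```

With setAllGroups(true) alone, the buggy version never registers an all-groups collector, which would be consistent with the groupCount of 0 reported here.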
---
Confidentiality Notice: The information contained in this e-mail and any
accompanying attachment(s) is intended only for the use of the intended
recipient and may be confidential and/or privileged of Neusoft Corporation,
its subsidiaries and/or its affiliates. If any reader of this communication
is not the intended recipient, unauthorized use, forwarding, printing,
storing, disclosure or copying is strictly prohibited, and may be unlawful.
If you have received this communication in error, please immediately notify
the sender by return e-mail, and delete the original message and all copies
from your system. Thank you.
---


Re: GroupingSearch return 0 when setAllGroups(true)

2012-11-28 Thread Michael McCandless
That sure looks like a bug!  Could you open a Jira issue (
https://issues.apache.org/jira/browse/LUCENE ) and post a patch / test
case?  Thanks!

Mike McCandless

http://blog.mikemccandless.com

On Wed, Nov 28, 2012 at 7:53 AM, d...@neusoft.com  wrote:
> Is this a bug?
>
> When using the grouping function, I set "groupingSearch.setAllGroups(true);",
> but it returns groupCount = 0.
>
> Looking at the source, I found this:
>
> if (allGroupHeads) {
> collectors.add(allGroupsCollector);
> }
> if (allGroupHeads) {
> collectors.add(allGroupHeadsCollector);
> }
>
> It seems the first check should use the allGroups flag, not allGroupHeads.
>
> I am new to Lucene, so I'd like to ask: is this right?



Re: Using Lucene 2.3 indices with Lucene 4.0

2012-11-28 Thread kiwi clive
Be aware that StandardAnalyzer changed slightly. This is particularly important 
if you use it to analyze email addresses and certain text-numeral combinations. 
My understanding is that the newer version of StandardAnalyzer is more
consistent with what it should be doing, but if you relied on its old
behaviour, that could bite you.

There are two solutions that I am aware of:
(1) Replace StandardAnalyzer with ClassicAnalyzer, which I believe is the 'old'
StandardAnalyzer before it was fixed.
(2) Keep StandardAnalyzer but construct it with a pre-3.1 Version constant (for
example Version.LUCENE_30) rather than Version.LUCENE_40.

Cheers,
Clive




 From: Ramprakash Ramamoorthy 
To: java-user@lucene.apache.org 
Sent: Tuesday, November 20, 2012 10:31 AM
Subject: Re: Using Lucene 2.3 indices with Lucene 4.0
 
On Tue, Nov 20, 2012 at 3:54 PM, Danil ŢORIN  wrote:

> However behavior of some analyzers changed.
>
> So even after upgrade the old index is readable with 4.0, it doesn't mean
> everything still works as before.
>

Thank you, Torin. I am using only StandardAnalyzer, both systems use
Unicode 4.0, and I don't foresee any problems here.

>
> On Tue, Nov 20, 2012 at 12:20 PM, Ian Lea  wrote:
>
> > You can upgrade the indexes with org.apache.lucene.index.IndexUpgrader.
> >  You'll need to do it in steps, from 2.x to 3.x to 4.x, but should work
> > fine as far as I know.
> >
> >
> > --
> > Ian.
> >
>
Thank you, Ian; this gives me a head start.
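Ian's step-by-step route (2.x to 3.x, then 3.x to 4.x) means running IndexUpgrader once per major version, each time with that version's jar on the classpath. The helper below only builds those command lines; the jar file names and the index path are placeholders (hypothetical). IndexUpgrader keeps only the last commit and by default refuses to run on an index with more than one commit; -delete-prior-commits overrides that. Back up the index before trying this.

```java
import java.util.ArrayList;
import java.util.List;

public class UpgradePlan {
    // Builds one IndexUpgrader command line per major-version hop, because a Lucene
    // major version can read indexes from the previous major version but not older ones.
    // Jar names are placeholders (hypothetical): substitute the real 3.x and 4.x jars.
    static List<String> commands(int fromMajor, int toMajor, String indexDir) {
        List<String> cmds = new ArrayList<>();
        for (int v = fromMajor + 1; v <= toMajor; v++) {
            cmds.add("java -cp lucene-core-" + v + ".x.jar"
                    + " org.apache.lucene.index.IndexUpgrader"
                    + " -delete-prior-commits " // allow dropping all but the last commit
                    + indexDir);
        }
        return cmds;
    }

    public static void main(String[] args) {
        // 2.3 index -> 3.x format -> 4.x format
        for (String cmd : commands(2, 4, "/path/to/index")) {
            System.out.println(cmd);
        }
    }
}
```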

> >
> >
> > On Tue, Nov 20, 2012 at 10:16 AM, Ramprakash Ramamoorthy <
> > youngestachie...@gmail.com> wrote:
> >
> > > I understand Lucene 2.x indexes are not compatible with the latest
> > > version of Lucene, 4.0. However, all our indexes were built with Lucene 2.3.
> > >
> > > Now that we are planning to migrate to Lucene 4.0, is there any work
> > > around/hack I can do, so that I can still read the 2.3 indices? Or is
> > > forgoing the older indices the only option?
> > >
> > > P.S.: I'm afraid re-indexing is not feasible.
> > >
> > > --
> > > With Thanks and Regards,
> > > Ramprakash Ramamoorthy,
> > > Chennai,
> > > India.
> > >
> >
>



-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
Engineer Trainee,
Zoho Corporation.
+91 9626975420

Re: sort by field and score

2012-11-28 Thread Andy Yu
I revised the code to:

SortField[] sortField = {new SortField("id", new CustomComparatorSource(bitSet)),
    SortField.FIELD_SCORE};

Sort sort = new Sort(sortField);

// create(sort, numHits, fillFields, trackDocScores, trackMaxScore, docsScoredInOrder)
TopFieldCollector topFieldCollector =
TopFieldCollector.create(sort, 1000, true, true, true, true);
indexSearcher.search(query, topFieldCollector);
TopDocs topDocs = topFieldCollector.topDocs();

but I get the same result as with the previous code. Do I need to customize
the TopFieldCollector class?

Thank you, Ian
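A point worth keeping in mind when reading these results: with a Sort of [custom field, SortField.FIELD_SCORE], score is only a tie-breaker among hits whose first key compares equal, and if scores are not tracked (NaN, as Ian mentions) those ties compare equal on score too. The stdlib-only sketch below models that ordering with a hypothetical 0/1 flag standing in for whatever the custom comparator derives from bitSet; it is not Lucene's sorting code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FieldThenScoreSketch {
    // Hypothetical hit: doc id, a 0/1 flag standing in for the custom comparator's key, and a score.
    record Hit(int doc, int flag, float score) {}

    // Primary key: flag ascending; secondary key: score descending (higher scores first).
    static final Comparator<Hit> FIELD_THEN_SCORE =
            Comparator.comparingInt(Hit::flag)
                      .thenComparing((a, b) -> Float.compare(b.score(), a.score()));

    // Returns doc ids in sorted order; List.sort is stable, so full ties keep input order.
    static List<Integer> order(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(FIELD_THEN_SCORE);
        List<Integer> docs = new ArrayList<>();
        for (Hit h : sorted) docs.add(h.doc());
        return docs;
    }

    public static void main(String[] args) {
        List<Hit> scored = List.of(new Hit(1, 1, 0.2f), new Hit(2, 0, 0.9f),
                                   new Hit(3, 1, 0.8f), new Hit(4, 0, 0.1f));
        System.out.println(order(scored)); // [2, 4, 3, 1]: score reorders hits within each flag

        // Untracked scores (NaN) compare equal to each other, so only the flag matters.
        List<Hit> unscored = List.of(new Hit(1, 1, Float.NaN), new Hit(2, 0, Float.NaN),
                                     new Hit(3, 1, Float.NaN), new Hit(4, 0, Float.NaN));
        System.out.println(order(unscored)); // [2, 4, 1, 3]
    }
}
```

If the custom comparator gives almost every document a distinct value, the score key will rarely have any effect, which could look like FIELD_SCORE "not working".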


2012/11/27 Ian Lea 

> What are you getting for the scores?  If it's NaN I think you'll need
> to use a TopFieldCollector.  See for example
> http://www.gossamer-threads.com/lists/lucene/java-user/86309
>
>
> --
> Ian.
>
>
> On Tue, Nov 27, 2012 at 3:51 AM, Andy Yu  wrote:
> > Hi All,
> >
> >
> > Now I want to sort by a field and by relevance.
> > For example
> >
> > SortField sortField[] = {new SortField("id", new
> > CustomComparatorSource(bitSet)),SortField.FIELD_SCORE};
> > Sort sort = new Sort(sortField);
> > TopDocs topDocs = indexSearcher.search(query, 10,sort);
> >
> > if (0 < topDocs.totalHits) {
> > for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
> >
> > System.out.println(indexSearcher.doc(scoreDoc.doc).get("id"));
> > System.out.println("score is " + scoreDoc.score);
> >
> >  System.out.println(indexSearcher.doc(scoreDoc.doc).get("name"));
> > }
> > }
> >
> > I found that the search results are sorted only by [new SortField("id", new
> > CustomComparatorSource(bitSet))];
> > [SortField.FIELD_SCORE] does not seem to have any effect.
> >
> >
> > PS: my Lucene version is 3.6
> >
> > Does anybody know the reason, or how to solve it?
> >
> >
> > Thanks ,
> > Andy
>


Re: Does anyone have tips on managing cached filters?

2012-11-28 Thread Trejkaz
On Wed, Nov 28, 2012 at 6:28 PM, Robert Muir  wrote:
> My point is really that lucene (especially clear in 4.0) assumes
> indexreaders are immutable points in time. I don't think it makes sense for
> us to provide any e.g. filtercaching or similar otherwise, because this is
> a key simplification to the design. If you depart from this, by scoring or
> filtering from mutable stuff outside the inverted index, things are likely
> going to get complicated.

Whereas it would be lovely to live in a land of rainbows and unicorns
where all the data you ever want to use is in the text index and all
filters can be written as a query, that simply isn't the case for us
and I very much doubt we're not the only ones in this situation.

Sure, things are complicated. Anything except the most trivial forum
search application is complicated.

Well, the situation as it stands now is that when a filter is
invalidated, it happens across all stores which are currently open.
That means that results are at least correct, but after invalidating a
filter, a little more work than necessary is required to populate the
cache again. For certain filters (like word lists) this is necessary
anyway, since adding a word might invalidate any store. For others
like tags, I was hoping there would be some way to selectively
invalidate only certain readers. But it seems like that isn't the
case, so I will probably have to add a third level of caching to cache
these sorts of filters per-store instead of globally.
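A minimal sketch of per-store caching with both kinds of invalidation, assuming each store can be identified by a string key and a filter's result for one store is a BitSet of matching docs. PerStoreFilterCache and its methods are hypothetical names, not a Lucene API: invalidateAll() corresponds to the word-list case (any store may be affected), invalidate() to the tag case (only one store changed).

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class PerStoreFilterCache {
    // filter name -> (store id -> cached doc set); hypothetical layout, not a Lucene API
    private final Map<String, Map<String, BitSet>> cache = new HashMap<>();
    private int computations = 0; // counts cache misses, for illustration

    // Returns the cached bits for (filter, store), computing them on a miss.
    BitSet get(String filter, String store, Function<String, BitSet> compute) {
        return cache.computeIfAbsent(filter, f -> new HashMap<>())
                    .computeIfAbsent(store, s -> { computations++; return compute.apply(s); });
    }

    // Global invalidation: e.g. a word list changed, which may affect every store.
    void invalidateAll(String filter) {
        cache.remove(filter);
    }

    // Selective invalidation: e.g. tags changed in one store only.
    void invalidate(String filter, String store) {
        Map<String, BitSet> perStore = cache.get(filter);
        if (perStore != null) perStore.remove(store);
    }

    int computations() { return computations; }

    public static void main(String[] args) {
        PerStoreFilterCache c = new PerStoreFilterCache();
        Function<String, BitSet> tags = store -> new BitSet(); // placeholder computation
        c.get("tags", "storeA", tags);
        c.get("tags", "storeB", tags);
        c.invalidate("tags", "storeA");   // only storeA is recomputed below
        c.get("tags", "storeA", tags);
        c.get("tags", "storeB", tags);
        System.out.println(c.computations()); // 3
        c.invalidateAll("tags");          // both stores are recomputed below
        c.get("tags", "storeA", tags);
        c.get("tags", "storeB", tags);
        System.out.println(c.computations()); // 5
    }
}
```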

TX




Re: Does anyone have tips on managing cached filters?

2012-11-28 Thread Trejkaz
On Thu, Nov 29, 2012 at 4:57 PM, Trejkaz  wrote:
> doubt we're not

Rats. Accidentally double-negatived that. I doubt we are the only ones. *

TX
