Re: Class_for_HighFrequencyTerms

2010-05-11 Thread manjula wijewickrema
Dear Erick,

I looked for it, and even added IndexReader.java and TermFreqVector.java
from
http://www.jarvana.com/jarvana/search?search_type=class&java_class=org.apache.lucene.index.IndexReader
.
But after adding them, the system indicated a lot of errors in the source
code of IndexReader.java (e.g. DirectoryOwningReader cannot be resolved to a
type, IndexCommit cannot be resolved to a type, SegmentInfos cannot be
resolved, TermEnum cannot be resolved to a type, etc.). I am using Lucene
2.9.1, and this particular website lists this source code under the 2.9.1
version of Lucene. What is the reason for this kind of scenario? Do I have
to add another JAR file? (In order to solve this I even added
lucene-core-2.9.1-sources.jar, but nothing happened.) Please be kind enough
to reply.

Thanks
Manjula

On Tue, May 11, 2010 at 1:26 AM, Erick Erickson wrote:

> Have you looked at TermFreqVector?
>
> Best
> Erick
>
> On Mon, May 10, 2010 at 8:10 AM, manjula wijewickrema
> wrote:
>
> > Hi,
> >
> > If I index a document (single document) in Lucene, then how can I get the
> > term frequencies (even the first and second highest occurring terms) of
> > that document? Is there any class/method to do that? If anybody knows,
> > please help me.
> >
> > Thanks
> > Manjula
> >
>
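
A minimal sketch of the TermFreqVector route, for reference: it assumes Lucene 2.9.1 with only lucene-core on the classpath, and that the field was indexed with term vectors enabled (Field.TermVector.YES). The index path and the field name "contents" are placeholders.

import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.store.FSDirectory;

public class TopTerms {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(FSDirectory.open(new File(args[0])), true);
        // per-document term vector; doc 0 is the single indexed document
        TermFreqVector tfv = reader.getTermFreqVector(0, "contents");
        if (tfv == null) {
            System.out.println("field was not indexed with term vectors");
            reader.close();
            return;
        }
        String[] terms = tfv.getTerms();
        int[] freqs = tfv.getTermFrequencies();
        // track the highest and second-highest occurring terms
        int best = 0, second = -1;
        for (int i = 1; i < terms.length; i++) {
            if (freqs[i] > freqs[best]) {
                second = best;
                best = i;
            } else if (second < 0 || freqs[i] > freqs[second]) {
                second = i;
            }
        }
        System.out.println("1st: " + terms[best] + " (" + freqs[best] + ")");
        if (second >= 0) {
            System.out.println("2nd: " + terms[second] + " (" + freqs[second] + ")");
        }
        reader.close();
    }
}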


Re: best way to intersect two queries?

2010-05-11 Thread Paul Libbrecht

Dear lucene experts,

Let me try to make this precise, since there was no answer.

I have a query that is, roughly,
  a & b & c
and I have a good search result.
Now I want to know:

a) for the first page, which matches are matches for a, b, or c
b) for the remaining results (the "tail"), are there matches of a, b, or c

Thus far the only approach I know is to use the highlighter to go through
the fields; it's not exactly the same thing, and it's slow.
I know I could use termDocs, or run another search for a, b, and c,
probably to annotate my initial results list; that could work well for a).


I still don't know what to do for b).

thanks for hints.

paul

On 31 March 2010, at 23:00, Paul Libbrecht wrote:
I've been wandering around but I see no solution yet: I would like
to intersect two query results: going through the list of one query
and indicating which ones actually match the other query or, even
better, indicating that "past this point, nothing matches that query
anymore".


What should be the strategy?




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
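
One sketch of the per-clause annotation with the stock 2.9 API: build a bit set per clause, then membership answers a) for each hit, and set cardinality bounds b) for the tail. The top-level reader and the three clause queries are assumed to be at hand; all names here are placeholders.

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.util.OpenBitSet;

public class ClauseFlags {
    // materialize the set of doc ids matching one clause of the full query
    public static OpenBitSet bitsFor(Query clause, IndexReader reader) throws IOException {
        OpenBitSet bits = new OpenBitSet(reader.maxDoc());
        DocIdSetIterator it = new QueryWrapperFilter(clause).getDocIdSet(reader).iterator();
        int doc;
        while ((doc = it.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
            bits.set(doc);
        }
        return bits;
    }
}

// For a): bitsA.get(scoreDoc.doc) says whether clause a matched that hit.
// For b): bitsA.cardinality(), minus the matches already seen on the first
// page, says how many matches of a remain in the tail.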



Re: Class_for_HighFrequencyTerms

2010-05-11 Thread adam . saltiel
Sounds like your path is messed up and you're not using Maven correctly. Start
with the JAR version that contains the class you require and use a Maven POM to
resolve the dependencies correctly.
Adam
Sent using BlackBerry® from Orange

-Original Message-
From: manjula wijewickrema 
Date: Tue, 11 May 2010 15:13:12 
To: 
Subject: Re: Class_for_HighFrequencyTerms

Dear Erick,

I looked for it, and even added IndexReader.java and TermFreqVector.java
from
http://www.jarvana.com/jarvana/search?search_type=class&java_class=org.apache.lucene.index.IndexReader
.
But after adding them, the system indicated a lot of errors in the source
code of IndexReader.java (e.g. DirectoryOwningReader cannot be resolved to a
type, IndexCommit cannot be resolved to a type, SegmentInfos cannot be
resolved, TermEnum cannot be resolved to a type, etc.). I am using Lucene
2.9.1, and this particular website lists this source code under the 2.9.1
version of Lucene. What is the reason for this kind of scenario? Do I have
to add another JAR file? (In order to solve this I even added
lucene-core-2.9.1-sources.jar, but nothing happened.) Please be kind enough
to reply.

Thanks
Manjula

On Tue, May 11, 2010 at 1:26 AM, Erick Erickson wrote:

> Have you looked at TermFreqVector?
>
> Best
> Erick
>
> On Mon, May 10, 2010 at 8:10 AM, manjula wijewickrema
> wrote:
>
> > Hi,
> >
> > If I index a document (single document) in Lucene, then how can I get the
> > term frequencies (even the first and second highest occurring terms) of
> > that document? Is there any class/method to do that? If anybody knows,
> > please help me.
> >
> > Thanks
> > Manjula
> >
>



Re: best way to intersect two queries?

2010-05-11 Thread mark harwood
See https://issues.apache.org/jira/browse/LUCENE-1999



- Original Message 
From: Paul Libbrecht 
To: java-user@lucene.apache.org
Sent: Tue, 11 May, 2010 10:52:14
Subject: Re: best way to intersect two queries?

Dear lucene experts,

Let me try to make this precise, since there was no answer.

I have a query that is, roughly,
  a & b & c
and I have a good search result.
Now I want to know:

a) for the first page, which matches are matches for a, b, or c
b) for the remaining results (the "tail"), are there matches of a, b, or c

Thus far the only approach I know is to use the highlighter to go through the
fields; it's not exactly the same thing, and it's slow.
I know I could use termDocs, or run another search for a, b, and c, probably
to annotate my initial results list; that could work well for a).

I still don't know what to do for b).

thanks for hints.

paul

On 31 March 2010, at 23:00, Paul Libbrecht wrote:
> I've been wandering around but I see no solution yet: I would like to 
> intersect two query results: going through the list of one query and 
> indicating which ones actually match the other query or, even better, 
> indicating that "past this point, nothing matches that query anymore".
> 
> What should be the strategy?



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



External ValueSource and value mapping

2010-05-11 Thread gabriele renzi
Hi everyone,

I am trying to implement something akin to Solr's ExternalFileSource,
backed by a Map object, with plain Lucene (as we are working on top of
an existing solution). While it is easy to write a ValueSource that
does this, I have a problem with the mapping phase.

Basically I have tried two things:
1. keep a map of unique ids for the documents, such as
 joe=10
 john=20
and at runtime retrieve the unique-key field and use that to find the
value in the map
2. keep a map/array of _document ids_, as Solr's ExternalFileSource seems to do
 doc1=10
 doc2=20
and at runtime use the document id in floatValue as the lookup.

The problem I found is two-fold: the former solution seems to be
pretty slow, probably because of the need to fetch a Field for every
document involved in scoring, while the latter seems to be impossible:
as far as I can tell, when ValueSource.getValues is called, different
index segments may be passed, meaning that the document id becomes a
non-unique key.
This also means I can neither precalculate this docId/score mapping
nor cache the values using a hybrid solution.

Looking at the Solr sources, this seems to be solved using
SolrIndexReader objects that have a #base attribute that can be used
to offset the document id; but, as I said, we are using plain old
Lucene IndexReader objects, and this seems impossible to replicate
using only them.

Is my assessment of the issue correct, or am I missing something?
If it is, does someone have a solution for this, or has anyone seen
this problem in the past and care to share a workaround?

Thanks in advance.

-- 
blog en: http://www.riffraff.info
blog it: http://riffraff.blogsome.com

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
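
One workaround sketch for the segment/offset problem, along the lines of the SolrIndexReader#base mechanism mentioned above; it assumes the top-level reader is available when the ValueSource is built (the class and variable names are illustrative):

import java.util.IdentityHashMap;
import java.util.Map;

import org.apache.lucene.index.IndexReader;

public class DocBases {
    // map each segment reader to its first doc id within the top-level
    // reader, so a per-segment doc id can be turned into a global key
    public static Map<IndexReader, Integer> build(IndexReader topReader) {
        Map<IndexReader, Integer> bases = new IdentityHashMap<IndexReader, Integer>();
        IndexReader[] subs = topReader.getSequentialSubReaders();
        if (subs == null) {                // single-segment reader: no subs
            bases.put(topReader, Integer.valueOf(0));
            return bases;
        }
        int base = 0;
        for (IndexReader sub : subs) {
            bases.put(sub, Integer.valueOf(base));
            base += sub.maxDoc();
        }
        return bases;
    }
}

// Inside ValueSource.getValues(reader), bases.get(reader) + doc is then a
// stable key into the precomputed docId/score array or map.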



spatial searches

2010-05-11 Thread Klaus Malorny


Hi all,

I hope someone can enlighten me. I am trying to figure out how spatial searches
are to be implemented with Lucene. From walking through mailing lists and
various web pages, and looking at the JavaDoc and source code, I understand how
the tiers work and how the search is limited by a special term query containing
the ID(s) of the relevant grid cells.

However, it still puzzles me how, where and when the final distance filtering
takes place. I see three possibilities: the "Filter" class, the
"ValueSourceQuery", or the use of a subclass of "Collector". With my limited
understanding of the inner workings of Lucene, it seems to me that the first
two more or less operate on the whole document set, i.e. prior to the moment
where the term query for the tiers comes into effect, rendering it useless. The
"Collector" approach seems much more appropriate, but in addition to deciding
whether a document meets the distance condition or not, I would like to have
different scores depending on the distance (lower scores for larger distances).
Originally I thought that the solution would be some kind of subclass of
"Query", but I haven't seen any hints pointing in this direction, and I don't
know whether I would be able to implement that on my own. I fear that I am
completely misunderstanding something. Thanks in advance for any hints.


Regards,

Klaus

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
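
For what it's worth, a sketch of the Collector route with distance-dependent scoring. It assumes the coordinates were indexed in fields named "lat" and "lng" (placeholders) as plain numeric strings parsable by FieldCache.getDoubles, and it uses a crude planar distance; swap in a great-circle formula for real geo data. Because collect() is only called for documents that pass the query, the tier term query still does its pruning first.

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Scorer;

public class DistanceScoringCollector extends Collector {
    private final double lat, lng, maxDist;
    private Scorer scorer;
    private double[] lats, lngs;
    private int docBase;

    public int bestDoc = -1;
    public float bestScore = Float.NEGATIVE_INFINITY;

    public DistanceScoringCollector(double lat, double lng, double maxDist) {
        this.lat = lat;
        this.lng = lng;
        this.maxDist = maxDist;
    }

    public void setScorer(Scorer scorer) {
        this.scorer = scorer;
    }

    public void setNextReader(IndexReader reader, int docBase) throws IOException {
        this.docBase = docBase;
        this.lats = FieldCache.DEFAULT.getDoubles(reader, "lat");
        this.lngs = FieldCache.DEFAULT.getDoubles(reader, "lng");
    }

    public boolean acceptsDocsOutOfOrder() {
        return true;
    }

    public void collect(int doc) throws IOException {
        double dx = lats[doc] - lat, dy = lngs[doc] - lng;
        double dist = Math.sqrt(dx * dx + dy * dy);   // planar approximation
        if (dist <= maxDist) {
            // damp the textual score as the distance grows
            float score = scorer.score() * (float) (1.0 - dist / (2 * maxDist));
            if (score > bestScore) {
                bestScore = score;
                bestDoc = docBase + doc;
            }
        }
    }
}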



FieldCache and 2.9

2010-05-11 Thread Carl Austin
Hi,

I have been comparing the FieldCache in Lucene 2.9 to that in 2.4. The
load time is massively decreased; however, I am not seeing any benefit
when getting a field cache after re-opening an index reader to which I
have only added a few extra documents.
A small test class is included below (based on one from Lucid
Imagination) that creates 5 million docs, gets a field cache, adds
another few docs, and gets the field cache again. I thought the second
get would be very, very fast, as only one segment should have changed;
however, the reopen and cache get take more time than the original did.

Am I doing something wrong here or have I misunderstood the new segment
changes?

Thanks

Carl


import java.io.File;

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ContrivedFCTest {

    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File(args[0]));
        IndexWriter writer = new IndexWriter(dir, new SimpleAnalyzer(), true,
                IndexWriter.MaxFieldLength.LIMITED);
        // 5 million docs, printing progress every 100,000
        for (int i = 0; i < 5000000; i++) {
            if (i % 100000 == 0) {
                System.out.println(i);
            }
            Document doc = new Document();
            doc.add(new Field("field", "String" + i, Field.Store.NO,
                    Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
        }
        writer.close();

        IndexReader reader = IndexReader.open(dir, true);
        long start = System.currentTimeMillis();
        FieldCache.DEFAULT.getStrings(reader, "field");
        long end = System.currentTimeMillis();
        System.out.println("load time for initial field cache: "
                + (end - start) / 1000.0f + "s");

        // append a handful of extra docs to the existing index
        writer = new IndexWriter(dir, new SimpleAnalyzer(), false,
                IndexWriter.MaxFieldLength.LIMITED);
        for (int i = 5000001; i < 5000005; i++) {
            Document doc = new Document();
            doc.add(new Field("field", "String" + i, Field.Store.NO,
                    Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
        }
        writer.close();

        IndexReader reader2 = reader.reopen(true);
        System.out.println("reader size = " + reader2.numDocs());
        long start2 = System.currentTimeMillis();
        FieldCache.DEFAULT.getStrings(reader2, "field");
        long end2 = System.currentTimeMillis();
        System.out.println("load time for re-opened field cache: "
                + (end2 - start2) / 1000.0f + "s");
    }
}

This message should be regarded as confidential. If you have received this 
email in error please notify the sender and destroy it immediately.
Statements of intent shall only become binding when confirmed in hard copy by 
an authorised signatory.  The contents of this email may relate to dealings 
with other companies within the Detica Limited group of companies.

Detica Limited is registered in England under No: 1337451.

Registered offices: Surrey Research Park, Guildford, Surrey, GU2 7YP, England.



Re: FieldCache and 2.9

2010-05-11 Thread Yonik Seeley
You are requesting the FieldCache entry from the top-level reader and
hence a whole new FieldCache entry must be created.
Lucene 2.9 sorting requests FieldCache entries at the segment level
and hence reuses entries for those segments that haven't changed.

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague



On Tue, May 11, 2010 at 9:27 AM, Carl Austin  wrote:
> Hi,
>
> I have been comparing the FieldCache in Lucene 2.9 to that in 2.4. The
> load time is massively decreased; however, I am not seeing any benefit
> when getting a field cache after re-opening an index reader to which I
> have only added a few extra documents.
> A small test class is included below (based on one from Lucid
> Imagination) that creates 5 million docs, gets a field cache, adds
> another few docs, and gets the field cache again. I thought the second
> get would be very, very fast, as only one segment should have changed;
> however, the reopen and cache get take more time than the original did.
>
> Am I doing something wrong here or have I misunderstood the new segment
> changes?
>
> Thanks
>
> Carl
>
>
> import java.io.File;
>
> import org.apache.lucene.analysis.SimpleAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.search.FieldCache;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.FSDirectory;
>
> public class ContrivedFCTest {
>
>     public static void main(String[] args) throws Exception {
>         Directory dir = FSDirectory.open(new File(args[0]));
>         IndexWriter writer = new IndexWriter(dir, new SimpleAnalyzer(), true,
>                 IndexWriter.MaxFieldLength.LIMITED);
>         // 5 million docs, printing progress every 100,000
>         for (int i = 0; i < 5000000; i++) {
>             if (i % 100000 == 0) {
>                 System.out.println(i);
>             }
>             Document doc = new Document();
>             doc.add(new Field("field", "String" + i, Field.Store.NO,
>                     Field.Index.NOT_ANALYZED));
>             writer.addDocument(doc);
>         }
>         writer.close();
>
>         IndexReader reader = IndexReader.open(dir, true);
>         long start = System.currentTimeMillis();
>         FieldCache.DEFAULT.getStrings(reader, "field");
>         long end = System.currentTimeMillis();
>         System.out.println("load time for initial field cache: "
>                 + (end - start) / 1000.0f + "s");
>
>         // append a handful of extra docs to the existing index
>         writer = new IndexWriter(dir, new SimpleAnalyzer(), false,
>                 IndexWriter.MaxFieldLength.LIMITED);
>         for (int i = 5000001; i < 5000005; i++) {
>             Document doc = new Document();
>             doc.add(new Field("field", "String" + i, Field.Store.NO,
>                     Field.Index.NOT_ANALYZED));
>             writer.addDocument(doc);
>         }
>         writer.close();
>
>         IndexReader reader2 = reader.reopen(true);
>         System.out.println("reader size = " + reader2.numDocs());
>         long start2 = System.currentTimeMillis();
>         FieldCache.DEFAULT.getStrings(reader2, "field");
>         long end2 = System.currentTimeMillis();
>         System.out.println("load time for re-opened field cache: "
>                 + (end2 - start2) / 1000.0f + "s");
>     }
> }
>
> This message should be regarded as confidential. If you have received this 
> email in error please notify the sender and destroy it immediately.
> Statements of intent shall only become binding when confirmed in hard copy by 
> an authorised signatory.  The contents of this email may relate to dealings 
> with other companies within the Detica Limited group of companies.
>
> Detica Limited is registered in England under No: 1337451.
>
> Registered offices: Surrey Research Park, Guildford, Surrey, GU2 7YP, England.
>
>

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
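
A sketch of the segment-level access Yonik describes, applied to a reopened reader like reader2 in the test above; unchanged segments come straight from the cache, so only the small new segment actually gets loaded:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;

public class PerSegmentWarm {
    // request the cache entries per segment, as 2.9's sorting does internally
    public static void warm(IndexReader topReader, String field) throws Exception {
        IndexReader[] subs = topReader.getSequentialSubReaders();
        if (subs == null) {                 // atomic, single-segment reader
            FieldCache.DEFAULT.getStrings(topReader, field);
            return;
        }
        for (IndexReader sub : subs) {
            FieldCache.DEFAULT.getStrings(sub, field);
        }
    }
}

// Timing warm(reader, "field") and then warm(reader2, "field") in the test
// should show the second call returning almost instantly.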



RE: FieldCache and 2.9

2010-05-11 Thread Carl Austin
Ah, OK, thanks for that.
I had hoped that the field cache would do this for me, by going through the
sub-readers itself. Is this likely to be done in a future release?
I may have to implement some wrapper that does this anyway, and if so, I can
submit it as a contrib module if that would be useful.

Thanks

Carl

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: 11 May 2010 14:41
To: java-user@lucene.apache.org
Subject: Re: FieldCache and 2.9

You are requesting the FieldCache entry from the top-level reader and hence a 
whole new FieldCache entry must be created.
Lucene 2.9 sorting requests FieldCache entries at the segment level and hence 
reuses entries for those segments that haven't changed.

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague



On Tue, May 11, 2010 at 9:27 AM, Carl Austin  wrote:
> Hi,
>
> I have been comparing the FieldCache in Lucene 2.9 to that in 2.4. The
> load time is massively decreased; however, I am not seeing any benefit
> when getting a field cache after re-opening an index reader to which I
> have only added a few extra documents.
> A small test class is included below (based on one from Lucid
> Imagination) that creates 5 million docs, gets a field cache, adds
> another few docs, and gets the field cache again. I thought the second
> get would be very, very fast, as only one segment should have changed;
> however, the reopen and cache get take more time than the original did.
>
> Am I doing something wrong here or have I misunderstood the new 
> segment changes?
>
> Thanks
>
> Carl
>
>
> import java.io.File;
>
> import org.apache.lucene.analysis.SimpleAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.search.FieldCache;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.FSDirectory;
>
> public class ContrivedFCTest {
>
>     public static void main(String[] args) throws Exception {
>         Directory dir = FSDirectory.open(new File(args[0]));
>         IndexWriter writer = new IndexWriter(dir, new SimpleAnalyzer(), true,
>                 IndexWriter.MaxFieldLength.LIMITED);
>         // 5 million docs, printing progress every 100,000
>         for (int i = 0; i < 5000000; i++) {
>             if (i % 100000 == 0) {
>                 System.out.println(i);
>             }
>             Document doc = new Document();
>             doc.add(new Field("field", "String" + i, Field.Store.NO,
>                     Field.Index.NOT_ANALYZED));
>             writer.addDocument(doc);
>         }
>         writer.close();
>
>         IndexReader reader = IndexReader.open(dir, true);
>         long start = System.currentTimeMillis();
>         FieldCache.DEFAULT.getStrings(reader, "field");
>         long end = System.currentTimeMillis();
>         System.out.println("load time for initial field cache: "
>                 + (end - start) / 1000.0f + "s");
>
>         // append a handful of extra docs to the existing index
>         writer = new IndexWriter(dir, new SimpleAnalyzer(), false,
>                 IndexWriter.MaxFieldLength.LIMITED);
>         for (int i = 5000001; i < 5000005; i++) {
>             Document doc = new Document();
>             doc.add(new Field("field", "String" + i, Field.Store.NO,
>                     Field.Index.NOT_ANALYZED));
>             writer.addDocument(doc);
>         }
>         writer.close();
>
>         IndexReader reader2 = reader.reopen(true);
>         System.out.println("reader size = " + reader2.numDocs());
>         long start2 = System.currentTimeMillis();
>         FieldCache.DEFAULT.getStrings(reader2, "field");
>         long end2 = System.currentTimeMillis();
>         System.out.println("load time for re-opened field cache: "
>                 + (end2 - start2) / 1000.0f + "s");
>     }
> }
>
> This message should be regarded as confidential. If you have received this 
> email in error please notify the sender and destroy it immediately.
> Statements of intent shall only become binding when confirmed in hard copy by 
> an authorised signatory.  The contents of this email may relate to dealings 
> with other companies within the Detica Limited group of companies.
>
> Detica Limited is registered in England under No: 1337451.
>
> Registered offices: Surrey Research Park, Guildford, Surrey, GU2 7YP, England.
>
>

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
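
A sketch of the kind of wrapper Carl proposes, assuming what is wanted is a top-level array assembled from the per-segment cache entries (the class and method names are illustrative):

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;

public class SegmentFieldCache {
    // assemble a top-level String[] from per-segment entries, so a reopen
    // only pays the load cost for segments that actually changed
    public static String[] getStrings(IndexReader topReader, String field) throws IOException {
        IndexReader[] subs = topReader.getSequentialSubReaders();
        if (subs == null) {
            return FieldCache.DEFAULT.getStrings(topReader, field);
        }
        String[] all = new String[topReader.maxDoc()];
        int base = 0;
        for (IndexReader sub : subs) {
            String[] vals = FieldCache.DEFAULT.getStrings(sub, field);
            System.arraycopy(vals, 0, all, base, vals.length);
            base += sub.maxDoc();
        }
        return all;
    }
}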

is there some dangerous bug in lucene?

2010-05-11 Thread luocanrao
I have a problem: I found that the stored fields in a document are not
consistent.

Here is a small case from my program:

Field A = new Field(Store.YES, FieldAValue);

FieldBValue.add(FieldAValue); // FieldBValue is a container that
holds the other stored field values; FieldBValue is like a complete
document record

Field B = new Field(Store.YES, FieldBValue);

Document doc = new Document();

doc.add(A); doc.add(B);

indexWriter.updateDocument(new Term(..), doc);

After a long time, today somebody found a bug.

I observe that the value of field A is the old value, but the value of
field B is the new and correct value.

At first I thought maybe it was a bug in indexWriter.getReader(),

but after I restarted the program, the bug still existed.

Finally I had to reconstruct all the data to fix it.

PS: I use the FieldCache for the value of field A, not field B.

I use indexWriter.getReader() to get realtime search.

I hope somebody can help me explain it.



Re: is there some dangerous bug in lucene?

2010-05-11 Thread Ian Lea
> is there some dangerous bug in lucene?

Highly unlikely.  Much more likely that there is a bug in your code,
perhaps somewhere in the confusing (to me, reading your uncompilable
code snippet) cross-linking of values between fields A and B.  Or
you've got duplicate docs in the index.  Or something completely
different.

If you really do think it is a problem in Lucene itself, or in your
usage of Lucene, I suggest that you break it down to the smallest
possible self-contained test case or program that demonstrates the
problem and post it here.  And tell us what version of Lucene you are
using.  Before doing that, it would be worth using Luke to examine the
index to double-check that it holds what you think it does.


--
Ian.


On Tue, May 11, 2010 at 3:20 PM, luocanrao  wrote:
> I have a problem: I found that the stored fields in a document are not
> consistent.
>
> Here is a small case from my program:
>
> Field A = new Field(Store.YES, FieldAValue);
>
> FieldBValue.add(FieldAValue);            // FieldBValue is a container that
> holds the other stored field values; FieldBValue is like a complete
> document record
>
> Field B = new Field(Store.YES, FieldBValue);
>
> Document doc = new Document();
>
> doc.add(A); doc.add(B);
>
> indexWriter.updateDocument(new Term(..), doc);
>
> After a long time, today somebody found a bug.
>
> I observe that the value of field A is the old value, but the value of
> field B is the new and correct value.
>
> At first I thought maybe it was a bug in indexWriter.getReader(),
>
> but after I restarted the program, the bug still existed.
>
> Finally I had to reconstruct all the data to fix it.
>
> PS: I use the FieldCache for the value of field A, not field B.
>
> I use indexWriter.getReader() to get realtime search.
>
> I hope somebody can help me explain it.
>
>

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: is there some dangerous bug in lucene?

2010-05-11 Thread Erick Erickson
Is it possible that you're looking at the deleted document? When you update
a document you're actually deleting the old one and adding a new one.

If not, I second Ian's comment that a self-contained test case would be very
useful.

HTH
Erick

On Tue, May 11, 2010 at 11:25 AM, Ian Lea  wrote:

> > is there some dangerous bug in lucene?
>
> Highly unlikely.  Much more likely that there is a bug in your code,
> perhaps somewhere in the confusing (to me, reading your uncompilable
> code snippet) cross-linking of values between fields A and B.  Or
> you've got duplicate docs in the index.  Or something completely
> different.
>
> If you really do think it is a problem in Lucene itself, or in your
> usage of Lucene, I suggest that you break it down to the smallest
> possible self-contained test case or program that demonstrates the
> problem and post it here.  And tell us what version of Lucene you are
> using.  Before doing that, it would be worth using Luke to examine the
> index to double-check that it holds what you think it does.
>
>
> --
> Ian.
>
>
> On Tue, May 11, 2010 at 3:20 PM, luocanrao 
> wrote:
> > I have a problem: I found that the stored fields in a document are not
> > consistent.
> >
> > Here is a small case from my program:
> >
> > Field A = new Field(Store.YES, FieldAValue);
> >
> > FieldBValue.add(FieldAValue); // FieldBValue is a container that
> > holds the other stored field values; FieldBValue is like a complete
> > document record
> >
> > Field B = new Field(Store.YES, FieldBValue);
> >
> > Document doc = new Document();
> >
> > doc.add(A); doc.add(B);
> >
> > indexWriter.updateDocument(new Term(..), doc);
> >
> > After a long time, today somebody found a bug.
> >
> > I observe that the value of field A is the old value, but the value of
> > field B is the new and correct value.
> >
> > At first I thought maybe it was a bug in indexWriter.getReader(),
> >
> > but after I restarted the program, the bug still existed.
> >
> > Finally I had to reconstruct all the data to fix it.
> >
> > PS: I use the FieldCache for the value of field A, not field B.
> >
> > I use indexWriter.getReader() to get realtime search.
> >
> > I hope somebody can help me explain it.
> >
> >
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
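
One quick way to test the delete-then-add theory, sketched on the assumption that updateDocument is keyed on a unique field called "id" (a placeholder). TermDocs skips deleted documents, so a count above one for a supposedly unique key means the update term did not match the old document and duplicates survived:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

public class DupCheck {
    // count the live (non-deleted) documents carrying the given key
    public static int liveDocsForKey(IndexReader reader, String key) throws Exception {
        TermDocs td = reader.termDocs(new Term("id", key));
        int count = 0;
        while (td.next()) {
            count++;
        }
        td.close();
        return count;
    }
}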


Location of HTMLStripCharFilter

2010-05-11 Thread Spencer Tickner
Hi everyone, and thanks in advance for the help. I downloaded the
latest 4.0 dev release of the lucene/solr trunk. Everything seems to
be fine except I can't for the life of me find the HTMLStripCharFilter
class. I've been poking around for a while and I figure I'm missing
something incredibly obvious, but I'm at the point where I have to lay
myself at your mercy and ask for help.

Thanks,

Spence

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Location of HTMLStripCharFilter

2010-05-11 Thread Spencer Tickner
Sorry everyone, found it in modules. Please disregard.

Thanks,

Spence

On Tue, May 11, 2010 at 10:42 AM, Spencer Tickner
 wrote:
> Hi everyone, and thanks in advance for the help. I downloaded the
> latest 4.0 dev release of the lucene/solr trunk. Everything seems to
> be fine except I can't for the life of me find the HTMLStripCharFilter
> class. I've been poking around for a while and I figure I'm missing
> something incredibly obvious, but I'm at the point where I have to lay
> myself at your mercy and ask for help.
>
> Thanks,
>
> Spence
>

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: is there some dangerous bug in lucene?

2010-05-11 Thread Chris Lu
If you are using the FieldCache for field A, and updating field A, isn't it
normal that field A appears not to be updated?

The FieldCache is keyed by index reader; it wouldn't be efficient to reload
the field cache for each updateDocument().


--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) 
got 2.6 Million Euro funding!


On 5/11/2010 7:20 AM, luocanrao wrote:

I have a problem: I found that the stored fields in a document are not
consistent.

Here is a small case from my program:

Field A = new Field(Store.YES, FieldAValue);

FieldBValue.add(FieldAValue); // FieldBValue is a container that
holds the other stored field values; FieldBValue is like a complete
document record

Field B = new Field(Store.YES, FieldBValue);

Document doc = new Document();

doc.add(A); doc.add(B);

indexWriter.updateDocument(new Term(..), doc);

After a long time, today somebody found a bug.

I observe that the value of field A is the old value, but the value of
field B is the new and correct value.

At first I thought maybe it was a bug in indexWriter.getReader(),

but after I restarted the program, the bug still existed.

Finally I had to reconstruct all the data to fix it.

PS: I use the FieldCache for the value of field A, not field B.

I use indexWriter.getReader() to get realtime search.

I hope somebody can help me explain it.



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
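
A short sketch of this point, assuming the near-real-time setup from the original mail (the field name "A" is taken from it): the cached values only change when they are fetched through a reader opened after the update.

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.FieldCache;

public class CacheRefreshSketch {
    public static void show(IndexWriter writer) throws Exception {
        IndexReader r1 = writer.getReader();
        // the entry is created for, and keyed by, r1; later updates won't touch it
        String[] before = FieldCache.DEFAULT.getStrings(r1, "A");

        // ... indexWriter.updateDocument(...) happens here ...

        IndexReader r2 = writer.getReader();    // fresh reader after the update
        // a separate entry, loaded from the current index state
        String[] after = FieldCache.DEFAULT.getStrings(r2, "A");

        System.out.println(before.length + " values before, " + after.length + " after");
        r1.close();
        r2.close();
    }
}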



Re: best way to intersect two queries?

2010-05-11 Thread Paul Libbrecht

Very interesting; finding the field name is enough for me.
What's neat is the wrapping in these flag queries because, indeed, I don't
want to know the details of each matched query, just some of them.

Two terminology questions:

- is the "multiplier" in the mail mentioned there the same as boost?

- I intended to use prefix and fuzzy queries. Is that incompatible with
this approach?


paul


On 11 May 2010, at 12:02, mark harwood wrote:


See https://issues.apache.org/jira/browse/LUCENE-1999



- Original Message 
From: Paul Libbrecht 
To: java-user@lucene.apache.org
Sent: Tue, 11 May, 2010 10:52:14
Subject: Re: best way to intersect two queries?

Dear lucene experts,

Let me try to make this precise, since there was no answer.

I have a query that is, roughly,
 a & b & c
and I have a good search result.
Now I want to know:

a) for the first page, which matches are matches for a, b, or c
b) for the remaining results (the "tail"), are there matches of a, b, or c

Thus far the only approach I know is to use the highlighter to go through
the fields; it's not exactly the same thing, and it's slow.
I know I could use termDocs, or run another search for a, b, and c,
probably to annotate my initial results list; that could work well for a).


I still don't know what to do for b).

thanks for hints.

paul

On 31 March 2010, at 23:00, Paul Libbrecht wrote:
I've been wandering around but I see no solution yet: I would like
to intersect two query results: going through the list of one query
and indicating which ones actually match the other query or, even
better, indicating that "past this point, nothing matches that query
anymore".


What should be the strategy?




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Lucene QueryParser and Analyzer

2010-05-11 Thread Robert Muir
FYI: I opened a jira issue for this bug here:
https://issues.apache.org/jira/browse/LUCENE-2458

On Thu, Apr 29, 2010 at 7:01 PM, Wei Ho  wrote:
> I think I've figured out what the problem is. Given the inputs,
>
> Input1: C1C2,C3C4,C5C6,C7,C8C9C10
> Input2: C1C2  C3C4  C5C6  C7  C8C9C10
>
> Input1 gets parsed as
> Query1: (text: "C1C2  C3C4  C5C6  C7  C8C9C10")
> whereas Input2 gets parsed as
> Query2: (text: "C1C2") (text: "C3C4") (text: "C5C6") (text: "C7") (text:
> "C8C9C10")
>
> That is, Lucene constructs the query and then passes the query text through
> the analyzer. Is there any way to
> force QueryParser to pass the input string through the analyzer before
> creating the query? That is, force Lucene
> to create Query2 for both Input1 and Input2.
>
> Thanks,
> Wei
>
>
>  Original Message  
> Subject: Re: Lucene QueryParser and Analyzer
> From: Sudarsan, Sithu D. 
> To: java-user@lucene.apache.org
> Date: 4/29/2010 4:54 PM
>>
>> ---sample code-
>>
>> Analyzer analyzer = new LingPipeAnalyzer();
>> Searcher searcher = new IndexSearcher(directory);
>> QueryParser qParser = new MultiFieldQueryParser(Version.LUCENE_30,
>>     SEARCH_FIELDS, analyzer);
>> Query query = qParser.parse(queryLine[1]);
>> ScoreDoc[] results = searcher.search(query, TOP_N).scoreDocs;
>>
>>
>> qParser will use the analyzer LingPipeAnalyzer() before forming the
>> query.
>>
>>
>> Sincerely,
>> Sithu D Sudarsan
>>
>>
>> -Original Message-
>> From: Wei Ho [mailto:we...@princeton.edu]
>> Sent: Thursday, April 29, 2010 4:44 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: Lucene QueryParser and Analyzer
>>
>> Sorry, I guess "discarding the punctuation" was a bit misleading.
>> I meant that given the two input strings,
>>
>> Input1: C1C2,C3C4,C5C6,C7,C8C9C10
>> Input2: C1C2  C3C4  C5C6  C7  C8C9C10
>>
>> The analyzer I implemented tokenizes both Input1 and Input2 as "C1C2",
>> "C3C4", "C5C6", "C7", "C8C9C10" - that is, it doesn't include the
>> punctuation in the tokenization. I'm assuming that QueryParser is simply
>> passing the entire input string to the analyzer and taking the tokens,
>> in which case Input1 and Input2 should be considered identical. Does
>> QueryParser do any sort of pre-processing or filtering beforehand? If
>> so, how can I turn it off?
>>
>> Aside from stopping tokens at punctuations, my analyzer is also doing
>> Chinese word segmentation, so I'd like to be sure that QueryParser is
>> using the analyzer the way I expect it to.
>>
>> Thanks,
>> Wei
>>
>>
>>
>>  Original Message  
>> Subject: Re: Lucene QueryParser and Analyzer
>> From: Sudarsan, Sithu D.
>> To: java-user@lucene.apache.org
>> Date: 4/29/2010 4:08 PM
>>
>>>
>>> If so,
>>>
>>> Input1:  c1c2c3c4c5c6c7
>>> Input2: c1c2 c3c4 ...
>>>
>>> I guess they are different! Add a whitespace after the commas and see if
>>> that works...
>>>
>>> Sincerely,
>>> Sithu D Sudarsan
>>>
>>>
>>> -Original Message-
>>> From: Wei Ho [mailto:we...@princeton.edu]
>>> Sent: Thursday, April 29, 2010 4:04 PM
>>> To: java-user@lucene.apache.org
>>> Subject: Re: Lucene QueryParser and Analyzer
>>>
>>> No, there is no whitespace after the comma in Input1
>>>
>>> Input1: C1C2,C3C4,C5C6,C7,C8C9C10
>>> Input2: C1C2  C3C4  C5C6  C7  C8C9C10
>>>
>>> Input1 is basically one big long word with commas and Chinese characters
>>> one after the other. Input2 is where I manually separated the string
>>> into the component terms by replacing the commas with whitespace. My
>>> confusion stems from the fact that I thought it should not matter, since
>>> the analyzer should be discarding the punctuation anyway? So the
>>> tokenization process should be the same for both Input1 and Input2? If
>>> that is not the case, what do I need to change?
>>>
>>> Thanks,
>>> Wei Ho
>>>
>>>  Original Message  
>>> Subject: Re: Lucene QueryParser and Analyzer
>>> From: Sudarsan, Sithu D.
>>> To: java-user@lucene.apache.org
>>> Date: 4/29/2010 3:54 PM
>>>
>>>
>>>> Hi,
>>>>
>>>> Is there a whitespace after the comma?
>>>>
>>>> Sincerely,
>>>> Sithu D Sudarsan
>>>>
>>>>
>>>> -Original Message-
>>>> From: Wei Ho [mailto:we...@princeton.edu]
>>>> Sent: Thursday, April 29, 2010 3:51 PM
>>>> To: java-user@lucene.apache.org
>>>> Subject: Lucene QueryParser and Analyzer
>>>>
>>>> Hello,
>>>>
>>>> I'm using Lucene to index and search through a collection of Chinese
>>>> documents. However, I'm noticing an odd behavior in query
>>>> parsing/searching.
>>>>
>>>> Given the two queries below:
>>>>
>>>> (Ci refers to Chinese character i)
>>>> Input1: C1C2,C3C4,C5C6,C7,C8C9C10
>>>> Input2: C1C2  C3C4  C5C6  C7  C8C9C10
>>>>
>>>> Input1 returns absolutely nothing, while Input2 (replacing the commas
>>>> with spaces) works as expected. I'm a bit confused why this would be
>>>> happening - it seems that QueryParser uses the Analyzer passed to it
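
Until LUCENE-2458 is resolved, a possible workaround sketch for Wei's case: analyze the raw input and OR the resulting tokens explicitly, instead of letting QueryParser group the analyzer's output into a phrase. The field name and the analyzer variable are placeholders; TermAttribute is the 2.9/3.0-era attribute API.

import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class AnalyzedOrQuery {
    // builds (text:t1) (text:t2) ... from whatever tokens the analyzer emits,
    // so "C1C2,C3C4" and "C1C2 C3C4" end up as the same query
    public static Query build(Analyzer analyzer, String field, String input) throws Exception {
        BooleanQuery q = new BooleanQuery();
        TokenStream ts = analyzer.tokenStream(field, new StringReader(input));
        TermAttribute termAtt = ts.addAttribute(TermAttribute.class);
        while (ts.incrementToken()) {
            q.add(new TermQuery(new Term(field, termAtt.term())), BooleanClause.Occur.SHOULD);
        }
        ts.close();
        return q;
    }
}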