date:20051019

Re: about numeric range searching with large value sets patches

2005-10-19 Thread Antoine Brun


Hello,

I just made some JUnit tests using IntegerRangeQuery, and I get
some strange results.
I attached the JUnit test I used. It fails on the last assert: I am
expecting only one result (the date between 1981 and 1983), but I get 2.

The test output is:

term = date:1980
term = date:1982
term = date:1984
q = date=[1981 TO 1983]
Document>
Document>

Do you have any idea?

Antoine


Randy Puttick wrote:


My fault, I forgot to attach it.  I've added it now.  Let me know how
this works for you.

Randy Puttick

-----Original Message-----
From: Antoine Brun [mailto:[EMAIL PROTECTED] 
Sent: Friday, August 19, 2005 8:11 AM

To: java-dev@lucene.apache.org
Subject: about numeric range searching with large value sets patches

Hello,

I am trying to integrate the patches for numeric range searching, and
the org.apache.lucene.util.Sort class that was posted as an attachment
imports org.apache.lucene.util.IntStack, which I can't find.

Can anyone add this class?
Maybe as an attachment to 
http://issues.apache.org/bugzilla/show_bug.cgi?id=36135


Thank you


  Antoine Brun

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




 



/*
 * Created on 15 Apr. 2005
 */
package opsys.lucene.test.search;

import java.io.IOException;

import junit.framework.TestCase;
import opsys.lucene.server.search.IntegerRangeQuery;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.RAMDirectory;

public class IntegerRangeQueryTestCase extends TestCase {

    private RAMDirectory directory;
    private IndexSearcher searcher;

    public void setUp() throws Exception {
        directory = new RAMDirectory();

        // Index three documents whose "date" field holds 1980, 1982 and 1984.
        IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), true);
        Document doc1 = new Document();
        doc1.add(new Field("date", "1980", Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc1);

        Document doc2 = new Document();
        doc2.add(new Field("date", "1982", Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc2);

        Document doc3 = new Document();
        doc3.add(new Field("date", "1984", Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc3);
        writer.close();

        searcher = new IndexSearcher(directory);
    }

    public void tearDown() throws Exception {
        searcher.close();
    }

    /**
     * Checks that an inclusive IntegerRangeQuery over [1981, 1983] matches
     * only the document dated 1982.
     * @throws IOException
     */
    public void testIntegerRangeQuery() throws IOException {

        // Sanity check: the index should contain exactly three terms.
        IndexReader ir = IndexReader.open(directory);
        TermEnum te = ir.terms();
        int i = 0;
        while (te.next()) {
            i++;
            System.out.println("term = " + te.term());
        }
        te.close();
        ir.close();
        assertEquals(3, i);

        // Inclusive range [1981, 1983]: only the 1982 document should match.
        IntegerRangeQuery q = new IntegerRangeQuery("date", new Integer(1981), new Integer(1983), true);
        System.out.println("q = " + q);
        Hits hits = searcher.search(q);
        for (int j = 0; j < hits.length(); j++) {
            System.out.println(hits.doc(j));
        }
        assertEquals(1, hits.length());
    }
}

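For reference, a minimal Lucene-free sketch of what the final assert expects: an inclusive integer range [1981, 1983] over the three indexed values should keep only 1982. It also contrasts this with a plain lexicographic String comparison (how a term-based range compares unpadded numbers), using a hypothetical extra term "19820" to show where the two diverge. The class and method names are made up for illustration, and this does not diagnose the IntegerRangeQuery patch itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: inclusive range filtering done two ways, for comparison. */
class ExpectedRangeCheck {

    /** Numeric comparison: keeps every value v with lower <= v <= upper. */
    static List<Integer> numericRange(int[] values, int lower, int upper) {
        List<Integer> matches = new ArrayList<>();
        for (int v : values) {
            if (v >= lower && v <= upper) {
                matches.add(v);
            }
        }
        return matches;
    }

    /** Lexical comparison, as a String-term range would compare unpadded numbers. */
    static List<String> lexicalRange(String[] terms, String lower, String upper) {
        List<String> matches = new ArrayList<>();
        for (String t : terms) {
            if (t.compareTo(lower) >= 0 && t.compareTo(upper) <= 0) {
                matches.add(t);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // The three values indexed in setUp(), plus a hypothetical five-digit term.
        System.out.println(numericRange(new int[] {1980, 1982, 1984, 19820}, 1981, 1983));
        // prints [1982]
        System.out.println(lexicalRange(new String[] {"1980", "1982", "1984", "19820"}, "1981", "1983"));
        // prints [1982, 19820]
    }
}
```

For the three values in the test, both comparisons select only 1982; the extra "19820" term shows why unpadded numeric strings are unsafe for lexical ranges in general.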
[jira] Created: (LUCENE-457) FieldCacheImpl take advantage of term info already being sorted

2005-10-19 Thread Sam Hough (JIRA)

FieldCacheImpl take advantage of term info already being sorted
---

 Key: LUCENE-457
 URL: http://issues.apache.org/jira/browse/LUCENE-457
 Project: Lucene - Java
Type: Improvement
  Components: Search  
Versions: 1.4
Reporter: Sam Hough
Priority: Minor


FieldCacheImpl.getStrings could take advantage of term info already being sorted
lexically. Would it be possible to have an "index order" mode which returns an
array of ints rather than strings, storing a scalar value that increments by one
for each new term?

Presumably there would be a big memory advantage in not holding onto the term
value Strings, and a lesser one in int comparison being slightly quicker than
String.compareTo.
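A minimal sketch of the proposed "index order" mode, written against plain arrays rather than Lucene's TermEnum/TermDocs. The hypothetical input `postingsInTermOrder[k]` lists the documents containing the k-th term, taken in the field's already-sorted term order. Each document gets one int that increments with each new term, and sorting compares those ints instead of Strings. The names are illustrative, not FieldCacheImpl's API.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: per-document term ordinals instead of per-document Strings. */
class IndexOrderSketch {

    /**
     * @param maxDoc              number of documents
     * @param postingsInTermOrder postingsInTermOrder[k] = docs containing the k-th term,
     *                            terms taken in the lexical order the index already stores
     * @return ords[doc] = 1-based rank of the doc's term; 0 means the doc has no term
     */
    static int[] buildOrdinals(int maxDoc, int[][] postingsInTermOrder) {
        int[] ords = new int[maxDoc]; // Java zero-fills, so docs without a term keep 0
        int ord = 0;
        for (int[] docs : postingsInTermOrder) {
            ord++; // a scalar value that increments by one for each new term
            for (int doc : docs) {
                ords[doc] = ord;
            }
        }
        return ords;
    }

    /** Returns doc ids sorted by ordinal; Arrays.sort on objects is stable, so ties keep doc-id order. */
    static Integer[] sortByOrdinal(int[] ords) {
        Integer[] docs = new Integer[ords.length];
        for (int d = 0; d < docs.length; d++) {
            docs[d] = d;
        }
        // int comparison replaces String.compareTo
        Arrays.sort(docs, Comparator.comparingInt(d -> ords[d]));
        return docs;
    }
}
```

With postings {2}, {0, 3}, {1} (say, for "apple", "mango", "zebra"), `buildOrdinals(4, ...)` gives [2, 3, 1, 2] and `sortByOrdinal` gives [2, 0, 3, 1]. The term Strings themselves are never held per document.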

Sorry if I have missed something obvious. I don't know the code very well.

Regards

Sam


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



[jira] Commented: (LUCENE-456) Duplicate hits and missing hits in sorted search

2005-10-19 Thread Martin Seitz (JIRA)

[ 
http://issues.apache.org/jira/browse/LUCENE-456?page=comments#action_12332465 ] 

Martin Seitz commented on LUCENE-456:
-

Yep! That did it. I applied the three patches (I had to modify them slightly 
for release 1.4.3) and it works - the test cases as well as my application.

> Duplicate hits and missing hits in sorted search
> 
>
>  Key: LUCENE-456
>  URL: http://issues.apache.org/jira/browse/LUCENE-456
>  Project: Lucene - Java
> Type: Bug
>   Components: Search
> Versions: 1.4
>  Environment: JDK 1.4.2_06, probably OS independent, tested on Solaris 8 and 
> Win2000
> Reporter: Martin Seitz
> Priority: Minor
>  Attachments: FieldDocSortedHitQueue_dups.txt, 
> TestCustomSearcherSort_1_4_3.java, TestCustomSearcherSort_HEAD.java
>
> If using a searcher that subclasses IndexSearcher, I get different result 
> sets (besides the ordering, of course). The problem only occurs if the 
> searcher is wrapped by a (Parallel)MultiSearcher and the index is not too 
> small. The number of hits returned by an unsorted and a sorted search is 
> identical, but the hits reference different documents. A closer look at 
> the result sets revealed that the sorted search returns duplicate hits.
> I created test cases for Lucene 1.4.3 as well as for the head release. The 
> problem showed up for both, the number of duplicates being bigger for the 
> head release. The test cases are written for package 
> org.apache.lucene.search. Messages describing the problem are written 
> to the console. In order to see all those hints, the asserts are commented 
> out, so don't be confused if JUnit reports no errors. (Sorry, being a 
> novice user of the bug tracker, I don't see any means to attach the test cases 
> on this screen. Let's see.)


[jira] Commented: (LUCENE-457) FieldCacheImpl take advantage of term info already being sorted

2005-10-19 Thread Yonik Seeley (JIRA)

[ 
http://issues.apache.org/jira/browse/LUCENE-457?page=comments#action_12332477 ] 

Yonik Seeley commented on LUCENE-457:
-

I've thought the same thing... StringIndex without storing the strings.
The actual string values are sometimes needed, though... a MultiSearcher needs 
them in order to sort documents from multiple indices.
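A toy, Lucene-free illustration of this point (the term lists are made up): ordinals are only meaningful inside the index that assigned them, so two indices can hand out ordinals that disagree with the real String order.

```java
import java.util.Arrays;

/** Hypothetical demo: per-index ordinals are not comparable across indices. */
class CrossIndexOrdinals {

    /** 1-based ordinal of term in one index's lexically sorted term list. */
    static int localOrd(String[] sortedTerms, String term) {
        int pos = Arrays.binarySearch(sortedTerms, term);
        if (pos < 0) {
            throw new IllegalArgumentException("term not in index: " + term);
        }
        return pos + 1;
    }

    public static void main(String[] args) {
        String[] indexA = {"apple", "pear"}; // sorted terms of index A
        String[] indexB = {"zebra"};         // sorted terms of index B

        int ordPear = localOrd(indexA, "pear");   // 2 within index A
        int ordZebra = localOrd(indexB, "zebra"); // 1 within index B

        // Comparing local ordinals across indices claims "zebra" sorts first...
        System.out.println("by ordinal: " + (ordZebra < ordPear ? "zebra first" : "pear first"));
        // ...while comparing the actual strings gives the correct order.
        System.out.println("by string:  " + ("zebra".compareTo("pear") < 0 ? "zebra first" : "pear first"));
    }
}
```

Running it prints "zebra first" for the ordinal comparison and "pear first" for the string comparison, which is why the merging step needs the strings (or some shared ordinal space).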

> FieldCacheImpl take advantage of term info already being sorted
> ---
>
>  Key: LUCENE-457
>  URL: http://issues.apache.org/jira/browse/LUCENE-457
>  Project: Lucene - Java
> Type: Improvement
>   Components: Search
> Versions: 1.4
> Reporter: Sam Hough
> Priority: Minor

>
> FieldCacheImpl.getStrings could take advantage of term info already being sorted
> lexically. Would it be possible to have an "index order" mode which returns an
> array of ints rather than strings, storing a scalar value that increments by one
> for each new term?
> Presumably there would be a big memory advantage in not holding onto the term
> value Strings, and a lesser one in int comparison being slightly quicker than
> String.compareTo.
> Sorry if I have missed something obvious. I don't know the code very well.
> Regards
> Sam


[jira] Commented: (LUCENE-456) Duplicate hits and missing hits in sorted search

2005-10-19 Thread Luc Vanlerberghe (JIRA)

[ 
http://issues.apache.org/jira/browse/LUCENE-456?page=comments#action_12332478 ] 

Luc Vanlerberghe commented on LUCENE-456:
-

I applied the three patches as well and I also don't see any anomalies any more.

Perhaps the fact that there is a 'hidden' ultimate sorting key (based on the 
internal document number) for equal documents should be mentioned somewhere in 
the documentation. It's the most logical way to make the sort stable 
without sacrificing speed.

If I understand correctly, that already existed for standalone IndexSearchers, 
but was 'forgotten' for (Parallel-)MultiSearchers.
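The 'hidden' final sort key described above can be sketched generically: compare on the sort field first, then on a document number that is unique across all sub-indices (each sub-index's local number offset by where that sub-index starts). This illustrates the idea under those assumptions; it is not the code in the LUCENE-456 patches, and the class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: tie-breaking on a global document number when merging sub-indices. */
class TieBreakSketch {

    /** A hit: its sort key and its document number across all sub-indices. */
    static final class Hit {
        final String sortKey;
        final int globalDoc; // sub-index start + local doc number

        Hit(String sortKey, int globalDoc) {
            this.sortKey = sortKey;
            this.globalDoc = globalDoc;
        }

        @Override
        public String toString() {
            return sortKey + "#" + globalDoc;
        }
    }

    /** Primary key: the sort field. Hidden final key: the global document number. */
    static final Comparator<Hit> ORDER =
            Comparator.comparing((Hit h) -> h.sortKey).thenComparingInt(h -> h.globalDoc);

    /** keysPerIndex[i][local] = sort key of document 'local' in sub-index i. */
    static List<Hit> merge(String[][] keysPerIndex) {
        List<Hit> all = new ArrayList<>();
        int start = 0;
        for (String[] keys : keysPerIndex) {
            for (int local = 0; local < keys.length; local++) {
                all.add(new Hit(keys[local], start + local));
            }
            start += keys.length; // the next sub-index's documents follow this one's
        }
        all.sort(ORDER);
        return all;
    }
}
```

For example, `merge(new String[][] {{"b", "a"}, {"a", "b"}})` yields [a#1, a#2, b#0, b#3], so hits with equal keys come out in the same order on every call.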


> Duplicate hits and missing hits in sorted search
> 
>
>  Key: LUCENE-456
>  URL: http://issues.apache.org/jira/browse/LUCENE-456
>  Project: Lucene - Java
> Type: Bug
>   Components: Search
> Versions: 1.4
>  Environment: JDK 1.4.2_06, probably OS independent, tested on Solaris 8 and 
> Win2000
> Reporter: Martin Seitz
> Priority: Minor
>  Attachments: FieldDocSortedHitQueue_dups.txt, 
> TestCustomSearcherSort_1_4_3.java, TestCustomSearcherSort_HEAD.java
>
> If using a searcher that subclasses IndexSearcher, I get different result 
> sets (besides the ordering, of course). The problem only occurs if the 
> searcher is wrapped by a (Parallel)MultiSearcher and the index is not too 
> small. The number of hits returned by an unsorted and a sorted search is 
> identical, but the hits reference different documents. A closer look at 
> the result sets revealed that the sorted search returns duplicate hits.
> I created test cases for Lucene 1.4.3 as well as for the head release. The 
> problem showed up for both, the number of duplicates being bigger for the 
> head release. The test cases are written for package 
> org.apache.lucene.search. Messages describing the problem are written 
> to the console. In order to see all those hints, the asserts are commented 
> out, so don't be confused if JUnit reports no errors. (Sorry, being a 
> novice user of the bug tracker, I don't see any means to attach the test cases 
> on this screen. Let's see.)


[jira] Commented: (LUCENE-455) FieldsReader does not regard offset and position flags

2005-10-19 Thread Bernhard Messer (JIRA)

[ 
http://issues.apache.org/jira/browse/LUCENE-455?page=comments#action_12332492 ] 

Bernhard Messer commented on LUCENE-455:


Frank,

thanks for the patch. I've reviewed it and committed it.

Bernhard

> FieldsReader does not regard offset and position flags
> --
>
>  Key: LUCENE-455
>  URL: http://issues.apache.org/jira/browse/LUCENE-455
>  Project: Lucene - Java
> Type: Bug
>   Components: Index
> Versions: 1.9
> Reporter: Frank Steinmann
> Priority: Minor
>  Attachments: FieldsReader.java
>
> When creating a Field, the FieldsReader looks at the storeTermVector flag of 
> the FieldInfo. If it is true, Field.TermVector.YES is used as the parameter. But it 
> should also check whether storeOffsetWithTermVector and 
> storePositionWithTermVector are set, and use Field.TermVector.WITH_OFFSETS, 
> ...WITH_POSITIONS, or ...WITH_POSITIONS_OFFSETS as appropriate.

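The flag-to-mode selection the issue asks for can be sketched as follows. To keep the example self-contained, `TermVectorMode` is a local stand-in whose constants mirror the names of Lucene's Field.TermVector values, and the three booleans stand for the FieldInfo flags named in the issue. This is an illustration, not the committed FieldsReader.java.

```java
/** Local stand-in mirroring the names of Lucene's Field.TermVector constants. */
enum TermVectorMode { NO, YES, WITH_POSITIONS, WITH_OFFSETS, WITH_POSITIONS_OFFSETS }

/** Hypothetical helper: picks the term-vector mode from the per-field flags. */
class TermVectorSelection {

    /** Uses all three flags instead of collapsing every stored term vector to YES. */
    static TermVectorMode select(boolean storeTermVector,
                                 boolean storePositionWithTermVector,
                                 boolean storeOffsetWithTermVector) {
        if (!storeTermVector) {
            return TermVectorMode.NO;
        }
        if (storePositionWithTermVector && storeOffsetWithTermVector) {
            return TermVectorMode.WITH_POSITIONS_OFFSETS;
        }
        if (storePositionWithTermVector) {
            return TermVectorMode.WITH_POSITIONS;
        }
        if (storeOffsetWithTermVector) {
            return TermVectorMode.WITH_OFFSETS;
        }
        return TermVectorMode.YES;
    }
}
```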

Re: about numeric range searching with large value sets patches

2005-10-19 Thread Chris Hostetter


I've never really looked at the IntegerRangeQuery submission, but if you
think you've found a bug, you should attach your test to the JIRA issue
that the original patch's bug report has been migrated to, so that it's clear to
anyone looking at applying it that it may have problems...

http://issues.apache.org/jira/secure/ViewIssue.jspa?key=LUCENE-421



: Date: Wed, 19 Oct 2005 10:17:53 +0200
: From: Antoine Brun <[EMAIL PROTECTED]>
: Reply-To: java-dev@lucene.apache.org
: To: java-dev@lucene.apache.org
: Subject: Re: about numeric range searching with large value sets patches
:
: Hello,
:
: I just made some JUnit tests using IntegerRangeQuery, and I get
: some strange results.
: I attached the JUnit test I used. It fails on the last assert: I am
: expecting only one result (the date between 1981 and 1983), but I get 2.
: The test output is:
:
: term = date:1980
: term = date:1982
: term = date:1984
: q = date=[1981 TO 1983]
: Document>
: Document>
:
: Do you have any idea?
:
: Antoine
:
:
: Randy Puttick wrote:
:
: >My fault, I forgot to attach it.  I've added it now.  Let me know how
: >this works for you.
: >
: >Randy Puttick
: >
: >-Original Message-
: >From: Antoine Brun [mailto:[EMAIL PROTECTED]
: >Sent: Friday, August 19, 2005 8:11 AM
: >To: java-dev@lucene.apache.org
: >Subject: about numeric range searching with large value sets patches
: >
: >Hello,
: >
: >I am trying to integrate the patches for the numeric range searching and
: >
: >the org.apache.lucene.util.Sort class that was posted as an attachment
: >imports a org.apache.lucene.util.IntStack which I can't find.
: >Can anyone add this class?
: >Maybe as an attachment to
: >http://issues.apache.org/bugzilla/show_bug.cgi?id=36135
: >
: >Thank you
: >
: >
: >   Antoine Brun
: >
: >
: >
: >
: >
:
:



-Hoss



[jira] Commented: (LUCENE-457) FieldCacheImpl take advantage of term info already being sorted

2005-10-19 Thread Sam Hough (JIRA)

[ 
http://issues.apache.org/jira/browse/LUCENE-457?page=comments#action_12332532 ] 

Sam Hough commented on LUCENE-457:
--

Doh. I guess we could do a merge sort, but that does make it much less attractive.

Presumably, making the sorting algorithm better than O(N) in memory would
mean changes to the storage format, so it would be a 2.0 thing :(

Thanks, Yonik.

I guess this should be closed; I, for one, can't think of a way of making it work.
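One way to read the merge-sort idea, sketched on plain arrays with hypothetical names: merge every index's sorted term list once, and record for each index a map from its local ordinal to a global ordinal. Global ordinals can then be compared across indices, at the cost of the merge pass and one int per term per index for the maps. This is an assumption about how such a scheme might look, not anything in Lucene 1.4.

```java
/** Hypothetical sketch: global ordinals built by merging per-index sorted term lists. */
class GlobalOrdinals {

    /**
     * @param sortedTermsPerIndex one lexically sorted, duplicate-free term list per index
     * @return maps[i][localOrd] = global ordinal of that term, comparable across indices
     */
    static int[][] localToGlobal(String[][] sortedTermsPerIndex) {
        int n = sortedTermsPerIndex.length;
        int[][] maps = new int[n][];
        int[] pos = new int[n]; // next unread position in each index's term list
        for (int i = 0; i < n; i++) {
            maps[i] = new int[sortedTermsPerIndex[i].length];
        }
        int globalOrd = 0;
        while (true) {
            // Simple linear-scan k-way merge: find the smallest unread term.
            String smallest = null;
            for (int i = 0; i < n; i++) {
                if (pos[i] < sortedTermsPerIndex[i].length) {
                    String t = sortedTermsPerIndex[i][pos[i]];
                    if (smallest == null || t.compareTo(smallest) < 0) {
                        smallest = t;
                    }
                }
            }
            if (smallest == null) {
                return maps; // every list is exhausted
            }
            // Every index holding this term maps it to the same global ordinal.
            for (int i = 0; i < n; i++) {
                if (pos[i] < sortedTermsPerIndex[i].length
                        && sortedTermsPerIndex[i][pos[i]].equals(smallest)) {
                    maps[i][pos[i]] = globalOrd;
                    pos[i]++;
                }
            }
            globalOrd++;
        }
    }
}
```

For index A = {"apple", "pear"} and index B = {"pear", "zebra"}, the maps are A: [0, 1] and B: [1, 2], so "pear" gets the same global ordinal in both. With many indices, a priority queue would replace the linear scan.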

> FieldCacheImpl take advantage of term info already being sorted
> ---
>
>  Key: LUCENE-457
>  URL: http://issues.apache.org/jira/browse/LUCENE-457
>  Project: Lucene - Java
> Type: Improvement
>   Components: Search
> Versions: 1.4
> Reporter: Sam Hough
> Priority: Minor

>
> FieldCacheImpl.getStrings could take advantage of term info already being sorted
> lexically. Would it be possible to have an "index order" mode which returns an
> array of ints rather than strings, storing a scalar value that increments by one
> for each new term?
> Presumably there would be a big memory advantage in not holding onto the term
> value Strings, and a lesser one in int comparison being slightly quicker than
> String.compareTo.
> Sorry if I have missed something obvious. I don't know the code very well.
> Regards
> Sam

