Need help for ordering results by specific order

2007-07-18 Thread savageboy

Hi,
I am new to Lucene.
I built a search engine project with Lucene 2.0. But just as the project was
nearly finished, my boss asked me to order the results as shown below.

The query looks like '+content:alden bob carray'

content                                      date         order
alden bob carray ...                         2005/12/23   1
alden... alden ... bob... bob... carray...   2005/12/01   2
alden... alden ... bob... carray             2005/11/28   3
alden... carray                              2005/12/24   4
alden... bob                                 2005/12/24   5

The meaning of the ordering above is: no matter how many times each term occurs
in the content field, a document falls into one of four situations -- 3 terms
matched, 2 matched, 1 matched, or 0 matched. Within the "3 matched" group I need
to sort the results by date descending, and the same within the "2 matched"
group, and so on.

But I don't know HOW to get these results in Lucene...
Should I override the scoring method? (tf(t in d), the term frequency in the
field, and idf(t), the inverse document frequency)
Could you give me some references about it?

I am really stuck, and I need your help!!


-- 
View this message in context: 
http://www.nabble.com/Need-help-for-ordering-results-by-specific-order-tf4101844.html#a11664583
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[EMAIL PROTECTED]: Project lucene-java (in module lucene-java) failed

2007-07-18 Thread Jason van Zyl
To whom it may engage...

This is an automated request, but not an unsolicited one. For 
more information please visit http://gump.apache.org/nagged.html, 
and/or contact the folk at [EMAIL PROTECTED]

Project lucene-java has an issue affecting its community integration.
This issue affects 3 projects,
 and has been outstanding for 3 runs.
The current state of this project is 'Failed', with reason 'Build Failed'.
For reference only, the following projects are affected by this:
- eyebrowse :  Web-based mail archive browsing
- jakarta-lucene :  Java Based Search Engine
- lucene-java :  Java Based Search Engine


Full details are available at:
http://vmgump.apache.org/gump/public/lucene-java/lucene-java/index.html

That said, some information snippets are provided here.

The following annotations (debug/informational/warning/error messages) were 
provided:
 -DEBUG- Sole output [lucene-core-18072007.jar] identifier set to project name
 -DEBUG- Dependency on javacc exists, no need to add for property javacc.home.
 -INFO- Failed with reason build failed
 -INFO- Failed to extract fallback artifacts from Gump Repository



The following work was performed:
http://vmgump.apache.org/gump/public/lucene-java/lucene-java/gump_work/build_lucene-java_lucene-java.html
Work Name: build_lucene-java_lucene-java (Type: Build)
Work ended in a state of : Failed
Elapsed: 33 secs
Command Line: /usr/lib/jvm/java-1.5.0-sun/bin/java -Djava.awt.headless=true 
-Xbootclasspath/p:/srv/gump/public/workspace/xml-commons/java/external/build/xml-apis.jar:/srv/gump/public/workspace/xml-xerces2/build/xercesImpl.jar
 org.apache.tools.ant.Main -Dgump.merge=/srv/gump/public/gump/work/merge.xml 
-Dbuild.sysclasspath=only -Dversion=18072007 
-Djavacc.home=/srv/gump/packages/javacc-3.1 package 
[Working Directory: /srv/gump/public/workspace/lucene-java]
CLASSPATH: 

Re: binary at the front of CHANGES.txt

2007-07-18 Thread DM Smith


On Jul 17, 2007, at 8:40 PM, Yonik Seeley wrote:


On 7/17/07, DM Smith [EMAIL PROTECTED] wrote:

According to the UTF-8 spec \uFEFF is not a BOM. In UTF-8 the byte
order is always the same.


But there is a BOM for UTF-8 (even though there is no endian
component, it does serve as a marker indicating the text file is
unicode text encoded in UTF-8).

http://unicode.org/faq/utf_bom.html#29


This is all rather academic at this point as you have fixed the problem.

I stand corrected: \uFEFF (the code point) is the BOM for all UTF encodings,
with its byte representation differing by encoding. But UTF-8 byte order
is always the same, regardless of the presence of the BOM.


According to the Unicode 5.0 Standard book, Chapter 13, Section 13.6,  
the byte sequence of the BOM for UTF-8 is EF BB BF (3 bytes) and for  
UTF-16 it is FE FF or FF FE (2 bytes). It appears that the byte  
sequence is unique for each unicode representation.


See http://www.unicode.org/unicode/uni2book/ch13.pdf#BOM
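
For what it's worth, a quick way to see which of those byte sequences a file
actually starts with (a throwaway sketch; the file name is only an example):

import java.io.FileInputStream;
import java.io.IOException;

public class BomCheck {
    public static void main(String[] args) throws IOException {
        FileInputStream in = new FileInputStream("CHANGES.txt");   // any file you want to inspect
        try {
            int b1 = in.read(), b2 = in.read(), b3 = in.read();
            if (b1 == 0xEF && b2 == 0xBB && b3 == 0xBF) {
                System.out.println("UTF-8 BOM (EF BB BF)");
            } else if ((b1 == 0xFE && b2 == 0xFF) || (b1 == 0xFF && b2 == 0xFE)) {
                System.out.println("UTF-16 BOM (FE FF or FF FE)");
            } else {
                System.out.printf("No BOM; first bytes: %02X %02X %02X%n", b1, b2, b3);
            }
        } finally {
            in.close();
        }
    }
}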

I frequently see FE FF at the beginning of UTF-8 files. I have only seen MS
editors add this, and it is wrong for UTF-8 files. I was assuming that this was
the junk at the beginning of the file.


But, the junk at the beginning of the file was C2 BF. Not at all sure  
what this would be.







-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Resolved: (LUCENE-960) SpanQueryFilter addition

2007-07-18 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved LUCENE-960.


   Resolution: Fixed
Lucene Fields: [Patch Available]  (was: [Patch Available, New])

I committed this on revision 557105.  Leaving it open for a few more days.  
This constitutes all new classes, so no back-compatibility issues, etc.

 SpanQueryFilter addition
 

 Key: LUCENE-960
 URL: https://issues.apache.org/jira/browse/LUCENE-960
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SpanQueryFilter.patch


 Similar to the QueryFilter (or whatever it is called now), the SpanQueryFilter
 is a regular Lucene Filter, but it can also return Spans-like information.
 This is useful if you not only want to filter based on a Query, but you also
 want to be able to compare how a given match from a new query relates to the
 positions of the filtered SpanQuery. The patch, to come shortly, also contains
 a caching mechanism for the SpanQueryFilter.
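
 (A minimal usage sketch, assuming the committed class takes the SpanQuery it
 filters on in its constructor; the index path, fields and terms here are made up:)

 // Hypothetical usage; the filter restricts the main query to docs matching the span query.
 SpanQueryFilter filter = new SpanQueryFilter(new SpanTermQuery(new Term("content", "lucene")));
 IndexSearcher searcher = new IndexSearcher("/path/to/index");
 Hits hits = searcher.search(new TermQuery(new Term("title", "search")), filter);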

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Assigned: (LUCENE-961) RegexCapabilities is not Serializable

2007-07-18 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher reassigned LUCENE-961:
---

Assignee: Erik Hatcher

 RegexCapabilities is not Serializable
 -

 Key: LUCENE-961
 URL: https://issues.apache.org/jira/browse/LUCENE-961
 Project: Lucene - Java
  Issue Type: Bug
  Components: QueryParser
Affects Versions: 2.2
Reporter: Konrad Rokicki
Assignee: Erik Hatcher
Priority: Minor

 The class RegexQuery is marked Serializable by its super class, but it 
 contains a RegexCapabilities which is not Serializable. Thus attempting to 
 serialize the query results in an exception. 
 Making RegexCapabilities serializable should be no problem, since its
 implementations contain only serializable members (java.util.regex.Pattern and
 org.apache.regexp.RE).
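
 (A hypothetical reproduction of the failure -- the field and regex are made up:)

 // Hypothetical repro: serializing the query fails until RegexCapabilities is Serializable.
 RegexQuery query = new RegexQuery(new Term("content", "luc.*"));
 ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
 out.writeObject(query);   // currently throws java.io.NotSerializableException
 out.close();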

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: binary at the front of CHANGES.txt

2007-07-18 Thread Yonik Seeley

On 7/18/07, DM Smith [EMAIL PROTECTED] wrote:

But, the junk at the beginning of the file was C2 BF. Not at all sure
what this would be.


As I said in my first reply, it *was* a UTF-8 BOM (look back at older
revisions), but I think one of my edits mangled it (I don't recall
what editor I used).

-Yonik

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-579) TermPositionVector offsets incorrect if indexed field has multiple values and one ends with non-term chars

2007-07-18 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513604
 ] 

Grant Ingersoll commented on LUCENE-579:


Can you provide a unit test for this?

 TermPositionVector offsets incorrect if indexed field has multiple values and 
 one ends with non-term chars
 --

 Key: LUCENE-579
 URL: https://issues.apache.org/jira/browse/LUCENE-579
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 1.9
Reporter: Keiron McCammon

 If you add multiple values for a field with term vector positions and offsets 
 enabled and one of the values ends with a non-term then the offsets for the 
 terms from subsequent values are wrong. For example (note the '.' in the 
 first value):
 IndexWriter writer = new IndexWriter(directory, new SimpleAnalyzer(), true);
 Document doc = new Document();
 doc.add(new Field("", "one.", Field.Store.YES, Field.Index.TOKENIZED,
     Field.TermVector.WITH_POSITIONS_OFFSETS));
 doc.add(new Field("", "two", Field.Store.YES, Field.Index.TOKENIZED,
     Field.TermVector.WITH_POSITIONS_OFFSETS));
 writer.addDocument(doc);
 writer.optimize();
 writer.close();

 IndexSearcher searcher = new IndexSearcher(directory);
 Hits hits = searcher.search(new MatchAllDocsQuery());
 Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(),
     new QueryScorer(new TermQuery(new Term("", "camera")),
         searcher.getIndexReader(), ""));
 for (int i = 0; i < hits.length(); ++i) {
     TermPositionVector v = (TermPositionVector)
         searcher.getIndexReader().getTermFreqVector(hits.id(i), "");
     StringBuilder str = new StringBuilder();
     for (String s : hits.doc(i).getValues("")) {
         str.append(s);
         str.append(" ");
     }
     System.out.println(str);
     TokenStream tokenStream = TokenSources.getTokenStream(v, false);
     String[] terms = v.getTerms();
     int[] freq = v.getTermFrequencies();
     for (int j = 0; j < terms.length; ++j) {
         System.out.print(terms[j] + ":" + freq[j] + ":");
         int[] pos = v.getTermPositions(j);
         System.out.print(Arrays.toString(pos));
         TermVectorOffsetInfo[] offset = v.getOffsets(j);
         for (int k = 0; k < offset.length; ++k) {
             System.out.print(":");
             System.out.print(str.substring(offset[k].getStartOffset(),
                 offset[k].getEndOffset()));
         }
         System.out.println();
     }
 }
 searcher.close();
 If I run the above I get:
 one:1:[0]:one
 two:1:[1]: tw
 Note that the offsets for the second term are off by 1.
 It seems that the length of the stored value is not taken into account when
 calculating the offsets for the terms of the next value.
 I noticed this problem when using the highlight contrib package, which can make
 use of term vectors for highlighting. I also noticed that the offset for the
 second string is one past the end of the previous value, so when concatenating
 the field values to pass to the highlighter I had to append a ' ' character
 after each string... which is quite useful, but not documented anywhere.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[EMAIL PROTECTED]: Project lucene-java (in module lucene-java) failed

2007-07-18 Thread Jason van Zyl
To whom it may engage...

This is an automated request, but not an unsolicited one. For 
more information please visit http://gump.apache.org/nagged.html, 
and/or contact the folk at [EMAIL PROTECTED]

Project lucene-java has an issue affecting its community integration.
This issue affects 3 projects.
The current state of this project is 'Failed', with reason 'Build Failed'.
For reference only, the following projects are affected by this:
- eyebrowse :  Web-based mail archive browsing
- jakarta-lucene :  Java Based Search Engine
- lucene-java :  Java Based Search Engine


Full details are available at:
http://vmgump.apache.org/gump/public/lucene-java/lucene-java/index.html

That said, some information snippets are provided here.

The following annotations (debug/informational/warning/error messages) were 
provided:
 -DEBUG- Sole output [lucene-core-18072007.jar] identifier set to project name
 -DEBUG- Dependency on javacc exists, no need to add for property javacc.home.
 -INFO- Failed with reason build failed
 -INFO- Failed to extract fallback artifacts from Gump Repository



The following work was performed:
http://vmgump.apache.org/gump/public/lucene-java/lucene-java/gump_work/build_lucene-java_lucene-java.html
Work Name: build_lucene-java_lucene-java (Type: Build)
Work ended in a state of : Failed
Elapsed: 1 min 31 secs
Command Line: /usr/lib/jvm/java-1.5.0-sun/bin/java -Djava.awt.headless=true 
-Xbootclasspath/p:/srv/gump/public/workspace/xml-commons/java/external/build/xml-apis.jar:/srv/gump/public/workspace/xml-xerces2/build/xercesImpl.jar
 org.apache.tools.ant.Main -Dgump.merge=/srv/gump/public/gump/work/merge.xml 
-Dbuild.sysclasspath=only -Dversion=18072007 
-Djavacc.home=/srv/gump/packages/javacc-3.1 package 
[Working Directory: /srv/gump/public/workspace/lucene-java]
CLASSPATH: 
/usr/lib/jvm/java-1.5.0-sun/lib/tools.jar:/srv/gump/public/workspace/lucene-java/build/classes/java:/srv/gump/public/workspace/lucene-java/build/classes/demo:/srv/gump/public/workspace/lucene-java/build/classes/test:/srv/gump/public/workspace/lucene-java/contrib/db/bdb/lib/db-4.3.29.jar:/srv/gump/public/workspace/lucene-java/contrib/gdata-server/lib/gdata-client-1.0.jar:/srv/gump/public/workspace/lucene-java/build/contrib/analyzers/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/ant/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/benchmark/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/db/bdb/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/db/bdb-je/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/gdata-server/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/highlighter/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/javascript/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/lucli/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/memory/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/queries/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/regex/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/similarity/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/snowball/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/spellchecker/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/surround/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/swing/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/wordnet/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/xml-query-parser/classes/java:/srv/gump/public/workspace/ant/dist/lib/ant-jmf.jar:/srv/gump/public/workspace/ant/dist/lib/ant-swing.jar:/srv/gump/public/workspace/ant/dist/lib/ant-apache-resolver.jar:/srv/gump/public/workspace/ant/dist/lib/ant-trax.jar:/srv/gump/public/workspace/ant/dist/lib/ant-junit.jar:/srv/gump/public/workspace/ant/dist/lib/ant-launcher.jar:/srv/gump/public/workspace/ant/dist/lib/ant-nodeps.jar:/srv/gump/public/workspace/ant/dist/lib/ant.jar:/srv/gump/packages/junit3.8.1/junit.jar:/srv/gump/public/workspace/xml-commons/java/build/resolver.jar:/srv/gump/packages/je-1.7.1/lib/je.jar:/srv/gump/public/workspace/jakarta-commons/digester/dist/commons-digester.jar:/srv/gump/public/workspace/jakarta-regexp/build/jakarta-regexp-18072007.jar:/srv/gump/packages/javacc-3.1/bin/lib/javacc.jar:/srv/gump/public/workspace/jline/target/jline-0.9.92-SNAPSHOT.jar:/srv/gump/packages/jtidy-04aug2000r7-dev/build/Tidy.jar:/srv/gump/public/workspace/junit/dist/junit-18072007.jar:/srv/gump/public/workspace/xml-commons/java/external/build/xml-apis-ext.jar:/srv/gump/public/workspace/jakarta-commons/logging/target/commons-logging-18072007.jar:/srv/gump/public/workspace/jakarta-commons/logging/target/commons-logging-api-18072007.jar:/srv/gump/public/workspace/jakarta-servletapi-5/jsr154/dist/lib/servlet-api.jar:/srv/gump/packages/nekohtml-0.9.5/nekohtml.jar


Re: Need help for ordering results by specific order

2007-07-18 Thread Mathieu Lecarme
Have a look at the book "Lucene in Action", ch. 6.1: using a custom sort method.


SortComparatorSource might be your friend: Lucene selects the matching
documents, and you sort them just the way you want.
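
Roughly along those lines, here is a rough sketch of one way such a comparator
could be wired up for the grouping in the original mail (Lucene 2.x API; this is
not the book's example, and it assumes the "date" field is indexed as a sortable
string like 20051223):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.ScoreDocComparator;
import org.apache.lucene.search.SortComparatorSource;
import org.apache.lucene.search.SortField;

public class MatchedTermsComparatorSource implements SortComparatorSource {

    private final Term[] queryTerms;   // e.g. content:alden, content:bob, content:carray

    public MatchedTermsComparatorSource(Term[] queryTerms) {
        this.queryTerms = queryTerms;
    }

    public ScoreDocComparator newComparator(final IndexReader reader, String fieldname)
            throws IOException {
        // Count, per document, how many distinct query terms it contains.
        // (Recomputed per search in this sketch; a real version would cache it.)
        final int[] matched = new int[reader.maxDoc()];
        for (int i = 0; i < queryTerms.length; i++) {
            TermDocs td = reader.termDocs(queryTerms[i]);
            try {
                while (td.next()) {
                    matched[td.doc()]++;
                }
            } finally {
                td.close();
            }
        }
        // Dates as sortable strings, cached per reader by FieldCache.
        final String[] dates = FieldCache.DEFAULT.getStrings(reader, "date");

        return new ScoreDocComparator() {
            public int compare(ScoreDoc a, ScoreDoc b) {
                int byMatches = matched[b.doc] - matched[a.doc];   // more matched terms first
                if (byMatches != 0) {
                    return byMatches;
                }
                String da = dates[a.doc] == null ? "" : dates[a.doc];
                String db = dates[b.doc] == null ? "" : dates[b.doc];
                return db.compareTo(da);                           // newer date first
            }
            public Comparable sortValue(ScoreDoc d) {
                return new Integer(matched[d.doc]);
            }
            public int sortType() {
                return SortField.CUSTOM;
            }
        };
    }
}

// Wiring it up (query and searcher assumed to exist):
//   Term[] terms = { new Term("content", "alden"), new Term("content", "bob"),
//                    new Term("content", "carray") };
//   Sort sort = new Sort(new SortField("content", new MatchedTermsComparatorSource(terms)));
//   Hits hits = searcher.search(query, sort);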


M.
On 18 Jul 2007, at 10:29, savageboy wrote:



Hi,
I am new to Lucene.
I built a search engine project with Lucene 2.0. But just as the project was
nearly finished, my boss asked me to order the results as shown below.

The query looks like '+content:alden bob carray'

content                                      date         order
alden bob carray ...                         2005/12/23   1
alden... alden ... bob... bob... carray...   2005/12/01   2
alden... alden ... bob... carray             2005/11/28   3
alden... carray                              2005/12/24   4
alden... bob                                 2005/12/24   5

The meaning of the ordering above is: no matter how many times each term occurs
in the content field, a document falls into one of four situations -- 3 terms
matched, 2 matched, 1 matched, or 0 matched. Within the "3 matched" group I need
to sort the results by date descending, and the same within the "2 matched"
group, and so on.

But I don't know HOW to get these results in Lucene...
Should I override the scoring method? (tf(t in d), the term frequency in the
field, and idf(t), the inverse document frequency)
Could you give me some references about it?

I am really stuck, and I need your help!!


--
View this message in context: 
http://www.nabble.com/Need-help-for-ordering-results-by-specific-order-tf4101844.html#a11664583
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Assigned: (LUCENE-743) IndexReader.reopen()

2007-07-18 Thread Michael Busch (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Busch reassigned LUCENE-743:


Assignee: Michael Busch

 IndexReader.reopen()
 

 Key: LUCENE-743
 URL: https://issues.apache.org/jira/browse/LUCENE-743
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Otis Gospodnetic
Assignee: Michael Busch
Priority: Minor
 Attachments: IndexReaderUtils.java, MyMultiReader.java, 
 MySegmentReader.java


 This is Robert Engels' implementation of IndexReader.reopen() functionality, 
 as a set of 3 new classes (this was easier for him to implement, but should 
 probably be folded into the core, if this looks good).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: search quality - assessment improvements

2007-07-18 Thread Chris Hostetter

: Yes, actually:  1 / sqrt((1 - Slope) * Pivot + (Slope) * Doclen)

Interesting ... it doesn't really seem like there is any direct
relationship between your average length (Pivot) and your Doclen --
on the surface, when I first read your example, it seemed like it had more
to do with the shifting of the curve than with any intrinsic property of the
docs themselves and how their lengths relate to the pivot.

In my mind the key question is how the length norms of docs are affected
when they are equidistant from the pivot (one high, one low) ... in
theory you want the relative difference in length norm to be the same
regardless of the average length (ie: if the pivot is 100, the
lengthNorm ratio of a 90-word doc vs a 110-word doc should be the same
as between a 900-word doc and a 1100-word doc when the pivot is 1000, right?)
... and once you actually do the math, this equation seems to satisfy that
(which really confused me for about 10 minutes, but I'll go with it).
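
(A quick numeric check of that ratio property -- throwaway code, and the slope
value here is arbitrary:)

public class PivotCheck {
    static double norm(double slope, double pivot, double doclen) {
        return 1.0 / Math.sqrt((1 - slope) * pivot + slope * doclen);
    }
    public static void main(String[] args) {
        double slope = 0.25;   // arbitrary slope, just for the check
        // pivot 100: 90-word doc vs 110-word doc
        System.out.println(norm(slope, 100, 90) / norm(slope, 100, 110));
        // pivot 1000: 900-word doc vs 1100-word doc -- prints the same ratio
        System.out.println(norm(slope, 1000, 900) / norm(slope, 1000, 1100));
    }
}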

However ... I still think that if you really want a length norm that takes
into account the average length of the docs, you want one that rewards
docs for being near the average ... it doesn't seem to make a lot of sense
to me to say that a doc whose length is N% longer than the
average length is significantly worse than a doc whose length is N% shorter
than the average length.





-Hoss


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-07-18 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513772
 ] 

Hoss Man commented on LUCENE-743:
-

I somehow missed seeing this issue before ... I don't really understand the 
details, but a few comments come to mind...

1) This approach seems to assume that when reopening a MyMultiReader, the sub 
readers will all be MySegmentReaders. Assuming we generalize this to 
MultiReader/SegmentReader, this wouldn't work in the case where people are using 
a MultiReader containing other MultiReaders ... not to mention the possibility 
of people who have written their own IndexReader implementations.
In general we should probably try to approach reopening a reader as a 
recursive operation if possible, where each type of reader is responsible for 
checking whether its underlying data has changed: if not, return itself; if so, 
return a new reader in its place (much like rewrite works for Queries). A rough 
sketch of that shape follows below.

2) There is no commit lock any more, correct? ... Is this approach something that 
can still be valid using the current backoff/retry mechanism involved with 
opening segments?
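
Here is that rough sketch. It is purely illustrative: the names are made up and
this is not the attached code's API; the point is only the recursive shape.

import java.io.IOException;
import org.apache.lucene.index.IndexReader;

// Illustrative only -- made-up names, not the patch's API.
// Every reader type answers "give me a current version of yourself"; composites recurse.
interface Reopenable {
    /** Return this reader if its underlying data is unchanged, otherwise a replacement reader. */
    IndexReader reopenIfChanged() throws IOException;
}

class ReopenSupport {
    /** How a composite reader could participate: recurse into children, rebuild only on change. */
    static IndexReader[] reopenChildren(IndexReader[] children) throws IOException {
        IndexReader[] result = new IndexReader[children.length];
        boolean changed = false;
        for (int i = 0; i < children.length; i++) {
            result[i] = (children[i] instanceof Reopenable)
                    ? ((Reopenable) children[i]).reopenIfChanged()   // handles nested composites too
                    : children[i];                                   // unknown reader types left as-is
            changed |= (result[i] != children[i]);
        }
        return changed ? result : children;   // unchanged array means the caller keeps its old reader
    }
}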

 IndexReader.reopen()
 

 Key: LUCENE-743
 URL: https://issues.apache.org/jira/browse/LUCENE-743
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Otis Gospodnetic
Assignee: Michael Busch
Priority: Minor
 Attachments: IndexReaderUtils.java, MyMultiReader.java, 
 MySegmentReader.java


 This is Robert Engels' implementation of IndexReader.reopen() functionality, 
 as a set of 3 new classes (this was easier for him to implement, but should 
 probably be folded into the core, if this looks good).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2007-07-18 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513777
 ] 

Hoss Man commented on LUCENE-831:
-

thanks for the feedback Mark ... I honestly haven't looked at this patch since 
the last time I updated the issue ... (I'm not sure if I've even thought about 
it once since then). It's the kind of thing that seemed really cool and important 
at the time, but then ... you know, other things come up.

by all means, feel free to update it.

As I recall, the biggest thing about this patch that was really just pie in the 
sky, and may not make any sense, is the whole concept of merging and letting 
subreaders of a MultiReader do their own caching, which could then percolate up.  
I did it on the assumption that it would come in handy when reopening an 
IndexReader that contains several segments -- many of which may not have 
changed since the last time you opened the index. But I really didn't have any 
idea how the whole reopening thing would work. I see now there is some reopen 
code in LUCENE-743, but frankly I'm still not sure whether the API makes sense, 
or is total overkill.

It might be better to gut the merging logic from the patch and add it later 
if/when there is a more concrete use case for it (the existing mergeData and 
isMergable methods could always be re-added to the abstract base classes if it 
turns out they do make sense).


 Complete overhaul of FieldCache API/Implementation
 --

 Key: LUCENE-831
 URL: https://issues.apache.org/jira/browse/LUCENE-831
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Hoss Man
 Attachments: fieldcache-overhaul.diff, fieldcache-overhaul.diff


 Motivation:
 1) Complete overhaul of the API/implementation of FieldCache type things...
 a) eliminate the global static map keyed on IndexReader (thus
 eliminating the synch block between completely independent IndexReaders) --
 a sketch of this follows after the list;
 b) allow more customization of cache management (ie: use
 expiration/replacement strategies, disk backed caches, etc)
 c) allow people to define custom cache data logic (ie: custom
 parsers, complex datatypes, etc... anything tied to a reader)
 d) allow people to inspect what's in a cache (a list of CacheKeys) for
 an IndexReader so a new IndexReader can be likewise warmed.
 e) lend support for smarter cache management if/when
 IndexReader.reopen is added (merging of cached data from subReaders).
 2) Provide backwards compatibility to support the existing FieldCache API with
 the new implementation, so there is no redundant caching as client code
 migrates to the new API.
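
 (The sketch referenced in point (a): a generic illustration of per-reader
 caching, not the attached patch -- "cache key" here just means whatever
 identifies the cached data, such as a field name plus parser.)

 import java.util.HashMap;
 import java.util.Map;

 // Generic illustration only, not the attached patch: each IndexReader owns its own
 // cache object, so completely independent readers never share a synchronization point.
 class ReaderCache {
     private final Map data = new HashMap();   // cache key -> cached value (e.g. int[] per doc)

     synchronized Object get(Object cacheKey) {
         return data.get(cacheKey);
     }

     synchronized Object putIfAbsent(Object cacheKey, Object value) {
         Object existing = data.get(cacheKey);
         if (existing != null) {
             return existing;
         }
         data.put(cacheKey, value);
         return value;
     }

     /** For point (d): list what has been cached, so a new reader can be warmed the same way. */
     synchronized Object[] cacheKeys() {
         return data.keySet().toArray();
     }
 }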

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-868) Making Term Vectors more accessible

2007-07-18 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-868:
---

Attachment: LUCENE-868-v3.patch

Added the start of a Position based Mapper.  This would allow indexing directly 
(almost) into the vector by position.  Still needs a little more testing, but 
wanted to put it out there for others to see.

 Making Term Vectors more accessible
 ---

 Key: LUCENE-868
 URL: https://issues.apache.org/jira/browse/LUCENE-868
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: LUCENE-868-v2.patch, LUCENE-868-v3.patch


 One of the big issues with term vector usage is that the information is
 loaded into parallel arrays as it is read, and those arrays are then often
 manipulated again for use in the application (for instance, sorted by
 frequency).
 Adding a callback mechanism that allows the vector loading to be handled by 
 the application would make this a lot more efficient.
 I propose to add to IndexReader:
 abstract public void getTermFreqVector(int docNumber, String field, 
 TermVectorMapper mapper) throws IOException;
 and a similar one for the all fields version
 Where TermVectorMapper is an interface with a single method:
 void map(String term, int frequency, int offset, int position);
 The TermVectorReader will be modified to just call the TermVectorMapper.  The 
 existing getTermFreqVectors will be reimplemented to use an implementation of 
 TermVectorMapper that creates the parallel arrays.  Additionally, some simple 
 implementations that automatically sort vectors will also be created.
 This is my first draft of this API and is subject to change.  I hope to have 
 a patch soon.
 See 
 http://www.gossamer-threads.com/lists/lucene/java-user/48003?search_string=get%20the%20total%20term%20frequency;#48003
  for related information.
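
 (As an illustration of how an application might use the draft callback above --
 written against the single map() method exactly as it is sketched here, so the
 committed API may well differ:)

 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Comparator;
 import java.util.List;

 // Written against the draft interface described above; the final API may differ.
 // Instead of parallel arrays, the application collects term/frequency pairs and
 // sorts them once, after the whole vector has been mapped.
 class FrequencySortingMapper implements TermVectorMapper {

     static class Entry {
         final String term;
         final int frequency;
         Entry(String term, int frequency) { this.term = term; this.frequency = frequency; }
     }

     private final List entries = new ArrayList();

     public void map(String term, int frequency, int offset, int position) {
         entries.add(new Entry(term, frequency));
     }

     /** Terms sorted by descending frequency, built once after the vector is read. */
     List sortedByFrequency() {
         Collections.sort(entries, new Comparator() {
             public int compare(Object a, Object b) {
                 return ((Entry) b).frequency - ((Entry) a).frequency;
             }
         });
         return entries;
     }
 }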

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Resolved: (LUCENE-963) Add setters to Field to allow re-use of Field instances during indexing

2007-07-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-963.
---

   Resolution: Fixed
Fix Version/s: 2.3
Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])

 Add setters to Field to allow re-use of Field instances during indexing
 ---

 Key: LUCENE-963
 URL: https://issues.apache.org/jira/browse/LUCENE-963
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 1.9, 2.0.0, 2.0.1, 2.1, 2.2, 2.3
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.3

 Attachments: LUCENE-963.patch


 If we add setters to Field it makes it possible to re-use Field
 instances during indexing which is a sizable performance gain for
 small documents.  See here for some discussion:
 http://www.gossamer-threads.com/lists/lucene/java-dev/51041

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Need help for ordering results by specific order

2007-07-18 Thread savageboy

Yes, Mathieu.
I happen to have the book "Lucene in Action" at hand; it is the Chinese-language
edition and covers Lucene 1.4, so I hope it is not too old.
If I use SortComparatorSource, does that mean the sorting work is done at
user query time?
Can I sort (or maybe score) at indexing time instead?
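
To give a rough idea (the field names and values are only examples, and writer,
query and searcher are assumed to exist elsewhere): the comparison itself always
happens at search time, but if the date is indexed as a sortable keyword at
indexing time, the search-time work is just a cheap field sort:

// Index time: store the date in a sortable, untokenized form.
Document doc = new Document();
doc.add(new Field("content", "alden ... bob ... carray ...", Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("date", "20051223", Field.Store.YES, Field.Index.UN_TOKENIZED));
writer.addDocument(doc);

// Search time: sort on the precomputed field, newest first.
Sort sort = new Sort(new SortField("date", SortField.STRING, true));   // true = descending
Hits hits = searcher.search(query, sort);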



Mathieu Lecarme wrote:
 
 Have a look at the book "Lucene in Action", ch. 6.1: using a custom sort
 method.

 SortComparatorSource might be your friend: Lucene selects the matching
 documents, and you sort them just the way you want.
 
 M.
 On 18 Jul 2007, at 10:29, savageboy wrote:
 

 Hi,
 I am new to Lucene.
 I built a search engine project with Lucene 2.0. But just as the project was
 nearly finished, my boss asked me to order the results as shown below.

 The query looks like '+content:alden bob carray'

 content                                      date         order
 alden bob carray ...                         2005/12/23   1
 alden... alden ... bob... bob... carray...   2005/12/01   2
 alden... alden ... bob... carray             2005/11/28   3
 alden... carray                              2005/12/24   4
 alden... bob                                 2005/12/24   5

 The meaning of the ordering above is: no matter how many times each term occurs
 in the content field, a document falls into one of four situations -- 3 terms
 matched, 2 matched, 1 matched, or 0 matched. Within the "3 matched" group I need
 to sort the results by date descending, and the same within the "2 matched"
 group, and so on.

 But I don't know HOW to get these results in Lucene...
 Should I override the scoring method? (tf(t in d), the term frequency in the
 field, and idf(t), the inverse document frequency)
 Could you give me some references about it?

 I am really stuck, and I need your help!!


 -- 
 View this message in context: 
 http://www.nabble.com/Need-help-for-ordering-results-by-specific-order-tf4101844.html#a11664583
 Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]


 
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Need-help-for-ordering-results-by-specific-order-tf4101844.html#a11681468
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: svn commit: r557445 - in /lucene/java/trunk: CHANGES.txt src/java/org/apache/lucene/document/Field.java src/test/org/apache/lucene/document/TestDocument.java

2007-07-18 Thread Doron Cohen
mikemccand wrote:
 +  /** Expert: change the value of this field.  This can be
 +   *  used during indexing to re-use a single Field instance
 +   *  to improve indexing speed. */
 +  public void setValue(String value) {

Would it make sense to warn against modifying the field
value before the doc is added?
Something like:
  Note that field reuse means adding the same field instance
  to multiple documents. You cannot reuse a field instance
  for adding multiple fields to the same document.
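
For illustration, the reuse pattern under discussion would look roughly like
this (the field name, values, texts array and writer are only examples):

// Create the Document and the Field once...
Document doc = new Document();
Field contentField = new Field("content", "", Field.Store.NO, Field.Index.TOKENIZED);
doc.add(contentField);

// ...then reuse the same instances for every document: one Field, many Documents.
for (int i = 0; i < texts.length; i++) {
    contentField.setValue(texts[i]);   // the new setter
    writer.addDocument(doc);           // only change the value between addDocument() calls
}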


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]