date:20070718

Re: svn commit: r557445 - in /lucene/java/trunk: CHANGES.txt src/java/org/apache/lucene/document/Field.java src/test/org/apache/lucene/document/TestDocument.java

2007-07-18 Thread Doron Cohen

mikemccand wrote:
> +  /** Expert: change the value of this field.  This can be
> +   *  used during indexing to re-use a single Field instance
> +   *  to improve indexing speed. */
> +  public void setValue(String value) {

Would it make sense to warn against modifying the field
value before the doc has been added?
Something like:
  "Note that field reuse means adding the same field instance
  to multiple documents. You cannot reuse a field instance
  for adding multiple fields to the same document."
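
To illustrate why the proposed note matters, here is a minimal sketch. DemoField and DemoDocument are hypothetical stand-ins written for this example, not Lucene's Field and Document; the only assumption is that a document keeps references to the field instances added to it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a mutable field; NOT Lucene's Field class.
class DemoField {
    private final String name;
    private String value;

    DemoField(String name, String value) {
        this.name = name;
        this.value = value;
    }

    void setValue(String value) { this.value = value; }
    String name() { return name; }
    String value() { return value; }
}

// Hypothetical stand-in for a document that stores references to its fields.
class DemoDocument {
    private final List<DemoField> fields = new ArrayList<>();

    void add(DemoField f) { fields.add(f); }

    List<String> values() {
        List<String> out = new ArrayList<>();
        for (DemoField f : fields) out.add(f.value());
        return out;
    }
}

class FieldReuseDemo {
    // Safe pattern: one instance, changed between documents.
    static List<String> reuseAcrossDocuments(String[] inputs) {
        DemoField title = new DemoField("title", "");
        List<String> indexed = new ArrayList<>();
        for (String in : inputs) {
            title.setValue(in);
            DemoDocument doc = new DemoDocument();
            doc.add(title);
            indexed.addAll(doc.values()); // stands in for adding the doc to an index
        }
        return indexed;
    }

    // Unsafe pattern: the same instance added twice to one document.
    static List<String> reuseWithinDocument() {
        DemoField tag = new DemoField("tag", "red");
        DemoDocument doc = new DemoDocument();
        doc.add(tag);
        tag.setValue("blue"); // also changes the entry added above
        doc.add(tag);
        return doc.values();
    }

    public static void main(String[] args) {
        System.out.println(reuseAcrossDocuments(new String[] {"a", "b"})); // [a, b]
        System.out.println(reuseWithinDocument());                         // [blue, blue]
    }
}
```

In the second case the value "red" is lost because both entries point at the same object, which is the situation the suggested javadoc note warns about.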

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Need help for ordering results by specific order

2007-07-18 Thread savageboy


Yes, Mathieu.
I have the book "Lucene in Action" at hand. It is the Chinese-language
edition and covers Lucene 1.4; I hope it is not too old.
If I use SortComparatorSource, does that mean the sorting is done at
user query time?
Can I do the sorting (maybe by scoring) at indexing time instead?



Mathieu Lecarme wrote:
> 
> Have a look at the book "Lucene in Action", ch. 6.1: "Using a custom
> sort method".
> 
> SortComparatorSource might be your friend. Lucene selects the matching
> documents, and you sort them just the way you want.
> 
> M.

-- 
View this message in context: 
http://www.nabble.com/Need-help-for-ordering-results-by-specific-order-tf4101844.html#a11681468
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.



[jira] Resolved: (LUCENE-963) Add setters to Field to allow re-use of Field instances during indexing

2007-07-18 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-963.
---

   Resolution: Fixed
Fix Version/s: 2.3
Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])

> Add setters to Field to allow re-use of Field instances during indexing
> ---
>
> Key: LUCENE-963
> URL: https://issues.apache.org/jira/browse/LUCENE-963
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 1.9, 2.0.0, 2.0.1, 2.1, 2.2, 2.3
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.3
>
> Attachments: LUCENE-963.patch
>
>
> If we add setters to Field it makes it possible to re-use Field
> instances during indexing which is a sizable performance gain for
> small documents.  See here for some discussion:
> http://www.gossamer-threads.com/lists/lucene/java-dev/51041

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (LUCENE-868) Making Term Vectors more accessible

2007-07-18 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-868:
---

Attachment: LUCENE-868-v3.patch

Added the start of a Position based Mapper.  This would allow indexing directly 
(almost) into the vector by position.  Still needs a little more testing, but 
wanted to put it out there for others to see.

> Making Term Vectors more accessible
> ---
>
> Key: LUCENE-868
> URL: https://issues.apache.org/jira/browse/LUCENE-868
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Store
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-868-v2.patch, LUCENE-868-v3.patch
>
>
> One of the big issues with term vector usage is that the information is 
> loaded into parallel arrays as it is loaded, which are then often times 
> manipulated again to use in the application (for instance, they are sorted by 
> frequency).
> Adding a callback mechanism that allows the vector loading to be handled by 
> the application would make this a lot more efficient.
> I propose to add to IndexReader:
> abstract public void getTermFreqVector(int docNumber, String field, 
> TermVectorMapper mapper) throws IOException;
> and a similar one for the all fields version
> Where TermVectorMapper is an interface with a single method:
> void map(String term, int frequency, int offset, int position);
> The TermVectorReader will be modified to just call the TermVectorMapper.  The 
> existing getTermFreqVectors will be reimplemented to use an implementation of 
> TermVectorMapper that creates the parallel arrays.  Additionally, some simple 
> implementations that automatically sort vectors will also be created.
> This is my first draft of this API and is subject to change.  I hope to have 
> a patch soon.
> See 
> http://www.gossamer-threads.com/lists/lucene/java-user/48003?search_string=get%20the%20total%20term%20frequency;#48003
>  for related information.
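
A rough sketch of how such a callback could be used to get terms sorted by frequency without building the parallel arrays first. The interface below only mirrors the draft signature quoted in the issue; FrequencySortedMapper is a hypothetical name, and the sketch assumes map() is called once per term.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Mirrors the single-method draft from the issue text; an assumption, not a released API.
interface TermVectorMapper {
    void map(String term, int frequency, int offset, int position);
}

// Hypothetical mapper that records frequencies and hands back terms, most frequent first.
class FrequencySortedMapper implements TermVectorMapper {
    private final Map<String, Integer> freqs = new LinkedHashMap<>();

    @Override
    public void map(String term, int frequency, int offset, int position) {
        freqs.put(term, frequency); // offset and position are ignored in this sketch
    }

    List<String> termsByFrequency() {
        List<String> terms = new ArrayList<>(freqs.keySet());
        // Descending by frequency; the sort is stable, so ties keep arrival order.
        terms.sort((a, b) -> Integer.compare(freqs.get(b), freqs.get(a)));
        return terms;
    }
}

class TermVectorMapperDemo {
    public static void main(String[] args) {
        FrequencySortedMapper mapper = new FrequencySortedMapper();
        // Stands in for the reader walking a stored term vector.
        mapper.map("lucene", 1, 0, 0);
        mapper.map("index", 3, 7, 1);
        mapper.map("search", 2, 13, 2);
        System.out.println(mapper.termsByFrequency()); // [index, search, lucene]
    }
}
```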


[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2007-07-18 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513777
 ] 

Hoss Man commented on LUCENE-831:
-

thanks for the feedback mark ... i honestly haven't looked at this patch since 
the last time i updated the issue ... (i'm not sure if i've even thought about 
it once since then).  it's the kind of thing that seemed really cool and important 
at the time, but then ... you know, other things come up.

by all means, feel free to update it.

as i recall, the biggest thing about this patch that was really just pie in the 
sky and may not make any sense is the whole concept of merging and letting 
subreaders of MultiReader do their own caching which could then percolate up.  
I did it on the assumption that it would come in handy when reopening an 
IndexReader that contains several segments -- many of which may not have 
changed since the last time you opened the index.  but i really didn't have any 
idea how the whole reopening thing would work.  i see now there is some reopen 
code in LUCENE-743, but frankly i'm still not sure whether the API makes sense, 
or is total overkill.

it might be better to gut the merging logic from the patch and add it later 
if/when there becomes a more real use case for it (the existing mergeData and 
isMergable methods could always be re-added to the abstract base classes if it 
turns out they do make sense)


> Complete overhaul of FieldCache API/Implementation
> --
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
> Attachments: fieldcache-overhaul.diff, fieldcache-overhaul.diff
>
>
> Motivation:
> 1) Completely overhaul the API/implementation of "FieldCache" type things...
> a) eliminate global static map keyed on IndexReader (thus
> eliminating synch block between completely independent IndexReaders)
> b) allow more customization of cache management (ie: use 
> expiration/replacement strategies, disk backed caches, etc)
> c) allow people to define custom cache data logic (ie: custom
> parsers, complex datatypes, etc... anything tied to a reader)
> d) allow people to inspect what's in a cache (list of CacheKeys) for
> an IndexReader so a new IndexReader can be likewise warmed. 
> e) Lend support for smarter cache management if/when
> IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
> the new implementation, so there is no redundant caching as client code
> migrates to new API.


[jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-07-18 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513772
 ] 

Hoss Man commented on LUCENE-743:
-

i somehow missed seeing this issue before ... i don't really understand the 
details, but a few comments that come to mind...

1) this approach seems to assume that when reopening a MyMultiReader, the sub 
readers will all be MySegmentReaders .. assuming we generalize this to 
MultiReader/SegmentReader, this wouldn't work in the case where people are using 
a MultiReader containing other MultiReaders ... not to mention the possibility 
of people who have written their own IndexReader implementations.
in general we should probably try to approach reopening a reader as a 
recursive operation if possible, where each type of reader is responsible for 
checking to see if its underlying data has changed: if not, return itself; if 
so, return a new reader in its place  (much like rewrite works for Queries)

2) there is no more commit lock, correct? ... is this approach something that 
can still be valid using the current backoff/retry mechanism involved with 
opening segments?
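
The recursive idea in point 1 could look roughly like the sketch below. All types here (DemoReader, DemoSegmentReader, DemoMultiReader) are hypothetical and only model the "return itself if unchanged, else a new reader" contract; the diskVersion field is an assumed stand-in for whatever check a real reader would make against the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical contract: reopen() returns this when nothing changed, else a fresh reader.
interface DemoReader {
    DemoReader reopen();
}

class DemoSegmentReader implements DemoReader {
    final String name;
    final long loadedVersion;
    long diskVersion; // stand-in for the version currently on disk

    DemoSegmentReader(String name, long version) {
        this.name = name;
        this.loadedVersion = version;
        this.diskVersion = version;
    }

    @Override
    public DemoReader reopen() {
        return diskVersion == loadedVersion ? this : new DemoSegmentReader(name, diskVersion);
    }
}

class DemoMultiReader implements DemoReader {
    final List<DemoReader> subReaders;

    DemoMultiReader(List<DemoReader> subReaders) {
        this.subReaders = new ArrayList<>(subReaders);
    }

    @Override
    public DemoReader reopen() {
        List<DemoReader> reopened = new ArrayList<>();
        boolean changed = false;
        for (DemoReader sub : subReaders) {
            DemoReader fresh = sub.reopen(); // works for any reader type, including nested multis
            if (fresh != sub) changed = true;
            reopened.add(fresh);
        }
        return changed ? new DemoMultiReader(reopened) : this;
    }
}

class ReopenDemo {
    public static void main(String[] args) {
        DemoSegmentReader a = new DemoSegmentReader("a", 1);
        DemoSegmentReader b = new DemoSegmentReader("b", 1);
        DemoMultiReader inner = new DemoMultiReader(Arrays.<DemoReader>asList(a));
        DemoMultiReader outer = new DemoMultiReader(Arrays.<DemoReader>asList(inner, b));

        System.out.println(outer.reopen() == outer); // true: nothing changed

        b.diskVersion = 2; // simulate a change to segment b only
        DemoMultiReader next = (DemoMultiReader) outer.reopen();
        System.out.println(next != outer && next.subReaders.get(0) == inner); // true
    }
}
```

Unchanged sub readers are shared between the old and new top-level reader, which is the property that would let per-reader caches survive a reopen.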

> IndexReader.reopen()
> 
>
> Key: LUCENE-743
> URL: https://issues.apache.org/jira/browse/LUCENE-743
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Otis Gospodnetic
>Assignee: Michael Busch
>Priority: Minor
> Attachments: IndexReaderUtils.java, MyMultiReader.java, 
> MySegmentReader.java
>
>
> This is Robert Engels' implementation of IndexReader.reopen() functionality, 
> as a set of 3 new classes (this was easier for him to implement, but should 
> probably be folded into the core, if this looks good).


Re: search quality - assessment & improvements

2007-07-18 Thread Chris Hostetter


: The Similarity portion of the payload functionality could be used for
: scoring binary fields.

that can be used as a hook to decide how to evaluate an arbitrary byte[]
payload as a float for the purposes of scoring -- but it doesn't address
the problem of how do we write/read a payload which is not term specific.

Doron is looking for a way to encode in the index arbitrary statistics
which are not specific to a single term instance (or even to a specific
document) ... mainly the average length of a field per doc.  what we were
speculating on is the notion of a generic API for writing arbitrary
"payloads" with each segment, and registering a PayloadMerger hook that
would give the IndexWriter a method to call when it came time to merge
segments (so it would know how to merge the generic segment payload data).

then Doron could do something like...

   AverageLengthPayloadMerger p = new AverageLengthPayloadMerger();
   IndexWriter w = ...
   w.setPayloadMerger(p);
   foreach (input) {
      Document d = ...
      p.incrStats(computeLength(d));
   }
   w.flush();

...if a merge happens, IndexWriter would call a method on the
PayloadMerger giving it the payloads of the segments being merged, and it
would already know about the stats it was recording from the current
segment, so it could then compute the new stats for the new segment and
return them to the IndexWriter to be written to disk.  when the flush
happens, the IndexWriter would also call a method on the PayloadMerger
which would do roughly the same thing (except there is no merging since
we're just finishing off a segment).

the same PayloadMerger would be used in the event of an optimize.

when opening an IndexReader, some new reader.getPayload() method would
recursively return all the generic payloads of all the existing segments,
and Doron could quickly calculate the average length for all docs to use
in his Similarity.
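
As a sketch of what the per-segment statistic might be: if each segment stores the total field length and the doc count (rather than the average itself), merging is just addition and the overall average stays exact. LengthStats is a hypothetical name; nothing here is an existing IndexWriter or payload API.

```java
// Hypothetical per-segment statistic; not part of any Lucene API.
class LengthStats {
    final long totalLength;
    final long docCount;

    LengthStats(long totalLength, long docCount) {
        this.totalLength = totalLength;
        this.docCount = docCount;
    }

    // Record one more document of the given length.
    LengthStats add(int docLength) {
        return new LengthStats(totalLength + docLength, docCount + 1);
    }

    // What a merge hook would compute when two segments are combined.
    static LengthStats merge(LengthStats a, LengthStats b) {
        return new LengthStats(a.totalLength + b.totalLength, a.docCount + b.docCount);
    }

    double average() {
        return docCount == 0 ? 0.0 : (double) totalLength / docCount;
    }
}

class LengthStatsDemo {
    public static void main(String[] args) {
        LengthStats seg1 = new LengthStats(0, 0).add(10).add(20); // average 15.0
        LengthStats seg2 = new LengthStats(0, 0).add(30);         // average 30.0
        LengthStats merged = LengthStats.merge(seg1, seg2);
        System.out.println(merged.average()); // 20.0, not the 22.5 that averaging the averages gives
    }
}
```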


(NOTE: i'm really not very familiar with all the merge policy stuff, i'm
sure i'm glossing over a lot of details that would make this a lot more
complicated than the pseudo-code i'm imagining)


-Hoss



Re: search quality - assessment & improvements

2007-07-18 Thread Chris Hostetter


: Yes, actually:  1 / sqrt((1 - Slope) * Pivot + (Slope) * Doclen)

interesting ... it doesn't really seem like there is any direct
relationship between your average length (Pivot) and your Doclen --
on the surface when i first read your example it seemed like it has more
to do with the shifting of the curve than any intrinsic property of the
docs themselves and how their lengths relate to the pivot.

in my mind the key question is how the length norms of docs are affected
when they are equidistant from the pivot (one high, one low) ... in
theory you want the relative difference in length norm to be the same
regardless of what the average length is (ie: if the pivot is 100 the
lengthNorm ratio of a 90 word doc vs a 110 word doc should be the same
as between a 900 word doc and a 1100 word doc if the pivot is 1000), right?
... and once you actually do the math, this equation seems to satisfy it.
(which really confused me for about 10 minutes, but i'll go with it)

However ... i still think that if you really want a length norm that takes
into account the average length of the docs, you want one that rewards
docs for being near the average ... it doesn't seem to make a lot of sense
to me to say that a doc whose length is N% longer than the
average length is significantly worse than a doc whose length is N% shorter
than the average length.
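
Both observations can be checked numerically with the quoted formula. The sketch below uses an arbitrary slope of 0.25 (an assumption; no slope value is given in the thread) and hypothetical method names.

```java
class PivotedLengthNorm {
    // The formula quoted above: 1 / sqrt((1 - Slope) * Pivot + Slope * Doclen)
    static double lengthNorm(double slope, double pivot, double docLen) {
        return 1.0 / Math.sqrt((1.0 - slope) * pivot + slope * docLen);
    }

    // Ratio of the norm of a shorter doc to the norm of a longer doc, for one pivot.
    static double ratio(double slope, double pivot, double shortLen, double longLen) {
        return lengthNorm(slope, pivot, shortLen) / lengthNorm(slope, pivot, longLen);
    }

    public static void main(String[] args) {
        double slope = 0.25; // arbitrary value chosen for illustration

        // Scaling pivot and lengths by the same factor leaves the ratio unchanged.
        System.out.println(ratio(slope, 100, 90, 110));
        System.out.println(ratio(slope, 1000, 900, 1100));

        // The norm falls steadily with length: no bonus for sitting at the pivot.
        System.out.println(lengthNorm(slope, 100, 90) > lengthNorm(slope, 100, 100));
        System.out.println(lengthNorm(slope, 100, 100) > lengthNorm(slope, 100, 110));
    }
}
```

The ratio works out to sqrt((100 + 10 * Slope) / (100 - 10 * Slope)) in both cases, since the factor of 10 cancels, so the first two printed values agree up to rounding. The last two lines show the norm decreasing monotonically in Doclen, which is the behaviour the final paragraph objects to.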





-Hoss



[jira] Assigned: (LUCENE-743) IndexReader.reopen()

2007-07-18 Thread Michael Busch (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Busch reassigned LUCENE-743:


Assignee: Michael Busch


Re: Need help for ordering results by specific order

2007-07-18 Thread Mathieu Lecarme

Have a look at the book "Lucene in Action", ch. 6.1: "Using a custom
sort method".


SortComparatorSource might be your friend. Lucene selects the matching
documents, and you sort them just the way you want.
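
The ordering asked for (more matched terms first, then newest date first within each group) can be expressed as a plain comparator. This is only a sketch with java.util.Comparator over a hypothetical DemoHit class; it does not use SortComparatorSource, and it assumes the number of matched terms has already been computed for each hit and that dates are stored as yyyy/MM/dd strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical holder for one search hit; not a Lucene class.
class DemoHit {
    final String content;
    final String date;      // assumed yyyy/MM/dd, so string order equals date order
    final int matchedTerms; // how many of the query terms occur in "content"

    DemoHit(String content, String date, int matchedTerms) {
        this.content = content;
        this.date = date;
        this.matchedTerms = matchedTerms;
    }
}

class MatchThenDateSort {
    // More matched terms first; within a group, newest date first.
    static final Comparator<DemoHit> ORDER =
        Comparator.comparingInt((DemoHit h) -> h.matchedTerms).reversed()
            .thenComparing((DemoHit h) -> h.date, Comparator.<String>reverseOrder());

    static List<DemoHit> sorted(List<DemoHit> hits) {
        List<DemoHit> copy = new ArrayList<>(hits);
        copy.sort(ORDER); // List.sort is stable, so ties keep their input order
        return copy;
    }

    public static void main(String[] args) {
        List<DemoHit> hits = Arrays.asList(
            new DemoHit("alden... carray", "2005/12/24", 2),
            new DemoHit("alden... alden ... bob... carray", "2005/11/28", 3),
            new DemoHit("alden bob carray ...", "2005/12/23", 3),
            new DemoHit("alden... bob", "2005/12/24", 2),
            new DemoHit("alden... alden ... bob... bob... carray...", "2005/12/01", 3));
        for (DemoHit h : sorted(hits)) {
            System.out.println(h.matchedTerms + " " + h.date + " " + h.content);
        }
    }
}
```

Run on the five example rows from the question, this prints the three "3 matched" documents by descending date, followed by the two "2 matched" documents.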


M.
On 18 Jul 07, at 10:29, savageboy wrote:

> Hi,
> I am new to Lucene.
> I have a search engine project built on Lucene 2.0. Now that the project is
> nearly finished, my boss wants me to order the results as shown below.
>
> The query looks like '+content:"aleden bob carray" '
>
> content                                        date        order
> "alden bob carray ... "                        2005/12/23  1
> "alden... alden ... bob... bob... carray..."   2005/12/01  2
> "alden... alden ... bob... carray"             2005/11/28  3
> "alden... carray"                              2005/12/24  4
> "alden... bob"                                 2005/12/24  5
>
> The sort above means that, no matter how many times each term matches in the
> field "content", there are four situations: "3 matched", "2 matched",
> "1 matched", "0 matched". Within the "3 matched" group I need to sort the
> results by date, descending, and the same within the "2 matched" group...
>
> But I don't know how to get these results in Lucene...
> Should I override the scoring method? (tf(t in d) field>,idf(t) )
> Could you give me some references about it?
>
> I am really stuck and need your help!
>
> --
> View this message in context: http://www.nabble.com/Need-help-for-ordering-results-by-specific-order-tf4101844.html#a11664583
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

[EMAIL PROTECTED]: Project lucene-java (in module lucene-java) failed

2007-07-18 Thread Jason van Zyl

To whom it may engage...

This is an automated request, but not an unsolicited one. For 
more information please visit http://gump.apache.org/nagged.html, 
and/or contact the folk at [EMAIL PROTECTED]

Project lucene-java has an issue affecting its community integration.
This issue affects 3 projects.
The current state of this project is 'Failed', with reason 'Build Failed'.
For reference only, the following projects are affected by this:
- eyebrowse :  Web-based mail archive browsing
- jakarta-lucene :  Java Based Search Engine
- lucene-java :  Java Based Search Engine


Full details are available at:
http://vmgump.apache.org/gump/public/lucene-java/lucene-java/index.html

That said, some information snippets are provided here.

The following annotations (debug/informational/warning/error messages) were 
provided:
 -DEBUG- Sole output [lucene-core-18072007.jar] identifier set to project name
 -DEBUG- Dependency on javacc exists, no need to add for property javacc.home.
 -INFO- Failed with reason build failed
 -INFO- Failed to extract fallback artifacts from Gump Repository



The following work was performed:
http://vmgump.apache.org/gump/public/lucene-java/lucene-java/gump_work/build_lucene-java_lucene-java.html
Work Name: build_lucene-java_lucene-java (Type: Build)
Work ended in a state of : Failed
Elapsed: 1 min 31 secs
Command Line: /usr/lib/jvm/java-1.5.0-sun/bin/java -Djava.awt.headless=true 
-Xbootclasspath/p:/srv/gump/public/workspace/xml-commons/java/external/build/xml-apis.jar:/srv/gump/public/workspace/xml-xerces2/build/xercesImpl.jar
 org.apache.tools.ant.Main -Dgump.merge=/srv/gump/public/gump/work/merge.xml 
-Dbuild.sysclasspath=only -Dversion=18072007 
-Djavacc.home=/srv/gump/packages/javacc-3.1 package 
[Working Directory: /srv/gump/public/workspace/lucene-java]
CLASSPATH: 
/usr/lib/jvm/java-1.5.0-sun/lib/tools.jar:/srv/gump/public/workspace/lucene-java/build/classes/java:/srv/gump/public/workspace/lucene-java/build/classes/demo:/srv/gump/public/workspace/lucene-java/build/classes/test:/srv/gump/public/workspace/lucene-java/contrib/db/bdb/lib/db-4.3.29.jar:/srv/gump/public/workspace/lucene-java/contrib/gdata-server/lib/gdata-client-1.0.jar:/srv/gump/public/workspace/lucene-java/build/contrib/analyzers/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/ant/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/benchmark/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/db/bdb/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/db/bdb-je/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/gdata-server/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/highlighter/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/javascript/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/lucli/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/memory/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/queries/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/regex/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/similarity/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/snowball/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/spellchecker/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/surround/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/swing/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/wordnet/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/xml-query-parser/classes/java:/srv/gump/public/workspace/ant/dist/lib/ant-jmf.jar:/srv/gump/public/workspace/ant/dist/lib/ant-swing.jar:/srv/gump/public/workspace/ant/dist/lib/ant-apache-resolver.jar:/srv/gump/public/workspace
/ant/dist/lib/ant-trax.jar:/srv/gump/public/workspace/ant/dist/lib/ant-junit.jar:/srv/gump/public/workspace/ant/dist/lib/ant-launcher.jar:/srv/gump/public/workspace/ant/dist/lib/ant-nodeps.jar:/srv/gump/public/workspace/ant/dist/lib/ant.jar:/srv/gump/packages/junit3.8.1/junit.jar:/srv/gump/public/workspace/xml-commons/java/build/resolver.jar:/srv/gump/packages/je-1.7.1/lib/je.jar:/srv/gump/public/workspace/jakarta-commons/digester/dist/commons-digester.jar:/srv/gump/public/workspace/jakarta-regexp/build/jakarta-regexp-18072007.jar:/srv/gump/packages/javacc-3.1/bin/lib/javacc.jar:/srv/gump/public/workspace/jline/target/jline-0.9.92-SNAPSHOT.jar:/srv/gump/packages/jtidy-04aug2000r7-dev/build/Tidy.jar:/srv/gump/public/workspace/junit/dist/junit-18072007.jar:/srv/gump/public/workspace/xml-commons/java/external/build/xml-apis-ext.jar:/srv/gump/public/workspace/jakarta-commons/logging/target/commons-logging-18072007.jar:/srv/gump/public/workspace/jakarta-commons/logging/target/commons-logging-api-18072007.jar:/srv/gump/public/workspace/jakarta-servletapi-5/jsr154/dist/lib/servlet-api.jar:/srv/gump/packages/nekohtml-0.9.5/nekohtml.jar
---

[jira] Commented: (LUCENE-579) TermPositionVector offsets incorrect if indexed field has multiple values and one ends with non-term chars

2007-07-18 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513604
 ] 

Grant Ingersoll commented on LUCENE-579:


Can you provide a unit test for this?

> TermPositionVector offsets incorrect if indexed field has multiple values and 
> one ends with non-term chars
> --
>
> Key: LUCENE-579
> URL: https://issues.apache.org/jira/browse/LUCENE-579
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Analysis
>Affects Versions: 1.9
>Reporter: Keiron McCammon
>
> If you add multiple values for a field with term vector positions and offsets 
> enabled and one of the values ends with a non-term then the offsets for the 
> terms from subsequent values are wrong. For example (note the '.' in the 
> first value):
>   IndexWriter writer = new IndexWriter(directory, new SimpleAnalyzer(), true);
>   Document doc = new Document();
>   doc.add(new Field("", "one.", Field.Store.YES, Field.Index.TOKENIZED,
>       Field.TermVector.WITH_POSITIONS_OFFSETS));
>   doc.add(new Field("", "two", Field.Store.YES, Field.Index.TOKENIZED,
>       Field.TermVector.WITH_POSITIONS_OFFSETS));
>   writer.addDocument(doc);
>   writer.optimize();
>   writer.close();
>   IndexSearcher searcher = new IndexSearcher(directory);
>   Hits hits = searcher.search(new MatchAllDocsQuery());
>   Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(),
>       new QueryScorer(new TermQuery(new Term("", "camera")),
>           searcher.getIndexReader(), ""));
>   for (int i = 0; i < hits.length(); ++i) {
>     TermPositionVector v = (TermPositionVector)
>         searcher.getIndexReader().getTermFreqVector(hits.id(i), "");
>     StringBuilder str = new StringBuilder();
>     for (String s : hits.doc(i).getValues("")) {
>       str.append(s);
>       str.append(" ");
>     }
>     System.out.println(str);
>     TokenStream tokenStream = TokenSources.getTokenStream(v, false);
>     String[] terms = v.getTerms();
>     int[] freq = v.getTermFrequencies();
>     for (int j = 0; j < terms.length; ++j) {
>       System.out.print(terms[j] + ":" + freq[j] + ":");
>       int[] pos = v.getTermPositions(j);
>       System.out.print(Arrays.toString(pos));
>       TermVectorOffsetInfo[] offset = v.getOffsets(j);
>       for (int k = 0; k < offset.length; ++k) {
>         System.out.print(":");
>         System.out.print(str.substring(offset[k].getStartOffset(),
>             offset[k].getEndOffset()));
>       }
>       System.out.println();
>     }
>   }
>   searcher.close();
> If I run the above I get:
> one:1:[0]:one
> two:1:[1]: tw
> Note that the offsets for the second term are off by 1.
> It seems that the length of the value that is stored is not taken into 
> account when calculating the offset for the fields of the next value.
> I noticed this problem when using the highlight contrib package, which can make 
> use of term vectors for highlighting. I also noticed that the offset for the 
> second string is +1 from the end of the previous value, so when concatenating the 
> field values to pass to the highlighter I had to append a ' ' character 
> after each string...which is quite useful, but not documented anywhere.
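
The off-by-one in the report can be reproduced with plain string arithmetic. This sketch only models the behaviour described above (base offset for the next value taken from the end of the last token rather than from the full value length); it is not taken from Lucene's code, and the names are hypothetical.

```java
class MultiValueOffsets {
    // Slice a token out of the joined text, given the base offset of its value.
    static String sliceWithBase(String joined, int base, int start, int end) {
        return joined.substring(base + start, base + end);
    }

    public static void main(String[] args) {
        String first = "one.";
        String second = "two";
        String joined = first + " " + second + " "; // values joined with ' ' as in the report

        int lastTokenEnd = 3;                  // end offset of the token "one" inside "one."
        int reportedBase = lastTokenEnd + 1;   // 4: ignores the trailing '.'
        int expectedBase = first.length() + 1; // 5: accounts for the whole stored value

        System.out.println("[" + sliceWithBase(joined, reportedBase, 0, 3) + "]"); // [ tw]
        System.out.println("[" + sliceWithBase(joined, expectedBase, 0, 3) + "]"); // [two]
    }
}
```

The first line matches the " tw" output in the report; using the full value length as the base gives "two".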

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: binary at the front of CHANGES.txt

2007-07-18 Thread Yonik Seeley


On 7/18/07, DM Smith <[EMAIL PROTECTED]> wrote:
> But, the junk at the beginning of the file was C2 BF. Not at all sure
> what this would be.


As I said in my first reply, it *was* a UTF-8 BOM (look back at older
revisions), but I think one of my edits mangled it (I don't recall
what editor I used).

-Yonik


[jira] Assigned: (LUCENE-961) RegexCapabilities is not Serializable

2007-07-18 Thread Erik Hatcher (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher reassigned LUCENE-961:
---

Assignee: Erik Hatcher

> RegexCapabilities is not Serializable
> -
>
> Key: LUCENE-961
> URL: https://issues.apache.org/jira/browse/LUCENE-961
> Project: Lucene - Java
> Issue Type: Bug
> Components: QueryParser
> Affects Versions: 2.2
> Reporter: Konrad Rokicki
> Assignee: Erik Hatcher
> Priority: Minor
>
> The class RegexQuery is marked Serializable by its super class, but it 
> contains a RegexCapabilities which is not Serializable. Thus attempting to 
> serialize the query results in an exception. 
> Making RegexCapabilities serializable should be no problem since its 
> subclasses contain only serializable classes (java.util.regex.Pattern and 
> org.apache.regexp.RE).
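
The failure mode described in this issue is general Java serialization behaviour and can be shown in isolation. The sketch below does not use Lucene: Query, Capabilities, PlainCapabilities and SerializableCapabilities are hypothetical stand-ins for a Serializable class that holds a field whose type may or may not be Serializable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializableFieldDemo {
    /** Hypothetical stand-in for an interface that does not extend Serializable. */
    public interface Capabilities {}

    /** Implementation that is NOT Serializable. */
    public static class PlainCapabilities implements Capabilities {}

    /** Implementation that opts in to serialization. */
    public static class SerializableCapabilities implements Capabilities, Serializable {
        private static final long serialVersionUID = 1L;
    }

    /** Serializable holder, analogous to a query that owns a capabilities object. */
    public static class Query implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Capabilities capabilities;

        public Query(Capabilities capabilities) {
            this.capabilities = capabilities;
        }
    }

    /** Returns false if writing the object graph hits a non-serializable member. */
    public static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Query(new PlainCapabilities())));
        System.out.println(canSerialize(new Query(new SerializableCapabilities())));
    }
}
```

Marking the holder Serializable is not enough: every non-transient field reached while writing the object graph must be serializable as well, which is why the issue proposes making the field's type Serializable.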


[jira] Resolved: (LUCENE-960) SpanQueryFilter addition

2007-07-18 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved LUCENE-960.


   Resolution: Fixed
Lucene Fields: [Patch Available]  (was: [Patch Available, New])

I committed this on revision 557105.  Leaving it open for a few more days.  
This constitutes all new classes, so no back-compatibility issues, etc.

> SpanQueryFilter addition
> 
>
> Key: LUCENE-960
> URL: https://issues.apache.org/jira/browse/LUCENE-960
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Attachments: SpanQueryFilter.patch
>
>
> Similar to the QueryFilter (or whatever it is called now), the SpanQueryFilter 
> is a regular Lucene Filter, but it can also return Spans-like information.  
> This is useful if you want not only to filter based on a Query, but also to 
> compare how a given match from a new query compares to the positions of the 
> filtered SpanQuery.  The patch, to come shortly, also contains a caching 
> mechanism for the SpanQueryFilter.


Re: binary at the front of CHANGES.txt

2007-07-18 Thread DM Smith



On Jul 17, 2007, at 8:40 PM, Yonik Seeley wrote:

> On 7/17/07, DM Smith <[EMAIL PROTECTED]> wrote:
>> According to the UTF-8 spec \uFEFF is not a BOM. In UTF-8 the byte
>> order is always the same.
>
> But there is a BOM for UTF-8 (even though there is no endian
> component, it does serve as a marker indicating the text file is
> unicode text encoded in UTF-8).
>
> http://unicode.org/faq/utf_bom.html#29


This is all rather academic at this point as you have fixed the problem.

I stand corrected: \uFEFF (the code point) is the BOM for all UTFs,
with its byte representation differing by encoding. But UTF-8 byte order
is always the same, regardless of the presence of the BOM.

According to the Unicode 5.0 Standard book, Chapter 13, Section 13.6,
the byte sequence of the BOM for UTF-8 is EF BB BF (3 bytes) and for
UTF-16 it is FE FF or FF FE (2 bytes). It appears that the byte
sequence is unique for each Unicode representation.

See http://www.unicode.org/unicode/uni2book/ch13.pdf#BOM

I frequently see FE FF at the beginning of UTF-8 files. I have
only seen MS editors add this. This is wrong for UTF-8 files. I was
assuming that this was the junk at the beginning of the file.


But, the junk at the beginning of the file was C2 BF. Not at all sure  
what this would be.
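
The byte sequences quoted above can be checked by encoding the code point U+FEFF in each encoding. The sketch below (class and method names are made up for illustration) also shows one possible, unconfirmed source of a trailing C2 BF: if the three UTF-8 BOM bytes are misread as ISO-8859-1 and then saved as UTF-8 again, the result ends in C2 BF. Nothing in this thread confirms that this is what happened to CHANGES.txt.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomBytes {
    /** Formats bytes as space-separated upper-case hex, e.g. "EF BB BF". */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Encodes the single code point U+FEFF in the named charset. */
    public static String bomHex(String charsetName) {
        return hex("\uFEFF".getBytes(Charset.forName(charsetName)));
    }

    /**
     * Assumption for illustration: a tool reads the UTF-8 BOM bytes as
     * ISO-8859-1 (one char per byte) and writes the text back as UTF-8.
     */
    public static String doubleEncodedBomHex() {
        byte[] bom = "\uFEFF".getBytes(StandardCharsets.UTF_8);
        String misread = new String(bom, StandardCharsets.ISO_8859_1);
        return hex(misread.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println("UTF-8:    " + bomHex("UTF-8"));     // EF BB BF
        System.out.println("UTF-16BE: " + bomHex("UTF-16BE"));  // FE FF
        System.out.println("UTF-16LE: " + bomHex("UTF-16LE"));  // FF FE
        System.out.println("double:   " + doubleEncodedBomHex()); // C3 AF C2 BB C2 BF
    }
}
```

Note that "UTF-16BE" and "UTF-16LE" are used rather than "UTF-16", because Java's "UTF-16" encoder prepends its own BOM to the output.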








[EMAIL PROTECTED]: Project lucene-java (in module lucene-java) failed

2007-07-18 Thread Jason van Zyl

To whom it may engage...

This is an automated request, but not an unsolicited one. For 
more information please visit http://gump.apache.org/nagged.html, 
and/or contact the folk at [EMAIL PROTECTED]

Project lucene-java has an issue affecting its community integration.
This issue affects 3 projects,
 and has been outstanding for 3 runs.
The current state of this project is 'Failed', with reason 'Build Failed'.
For reference only, the following projects are affected by this:
- eyebrowse :  Web-based mail archive browsing
- jakarta-lucene :  Java Based Search Engine
- lucene-java :  Java Based Search Engine


Full details are available at:
http://vmgump.apache.org/gump/public/lucene-java/lucene-java/index.html

That said, some information snippets are provided here.

The following annotations (debug/informational/warning/error messages) were 
provided:
 -DEBUG- Sole output [lucene-core-18072007.jar] identifier set to project name
 -DEBUG- Dependency on javacc exists, no need to add for property javacc.home.
 -INFO- Failed with reason build failed
 -INFO- Failed to extract fallback artifacts from Gump Repository



The following work was performed:
http://vmgump.apache.org/gump/public/lucene-java/lucene-java/gump_work/build_lucene-java_lucene-java.html
Work Name: build_lucene-java_lucene-java (Type: Build)
Work ended in a state of : Failed
Elapsed: 33 secs
Command Line: /usr/lib/jvm/java-1.5.0-sun/bin/java -Djava.awt.headless=true 
-Xbootclasspath/p:/srv/gump/public/workspace/xml-commons/java/external/build/xml-apis.jar:/srv/gump/public/workspace/xml-xerces2/build/xercesImpl.jar
 org.apache.tools.ant.Main -Dgump.merge=/srv/gump/public/gump/work/merge.xml 
-Dbuild.sysclasspath=only -Dversion=18072007 
-Djavacc.home=/srv/gump/packages/javacc-3.1 package 
[Working Directory: /srv/gump/public/workspace/lucene-java]
CLASSPATH: 
/usr/lib/jvm/java-1.5.0-sun/lib/tools.jar:/srv/gump/public/workspace/lucene-java/build/classes/java:/srv/gump/public/workspace/lucene-java/build/classes/demo:/srv/gump/public/workspace/lucene-java/build/classes/test:/srv/gump/public/workspace/lucene-java/contrib/db/bdb/lib/db-4.3.29.jar:/srv/gump/public/workspace/lucene-java/contrib/gdata-server/lib/gdata-client-1.0.jar:/srv/gump/public/workspace/lucene-java/build/contrib/analyzers/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/ant/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/benchmark/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/db/bdb/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/db/bdb-je/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/gdata-server/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/highlighter/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/javascript/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/lucli/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/memory/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/queries/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/regex/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/similarity/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/snowball/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/spellchecker/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/surround/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/swing/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/wordnet/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/xml-query-parser/classes/java:/srv/gump/public/workspace/ant/dist/lib/ant-jmf.jar:/srv/gump/public/workspace/ant/dist/lib/ant-swing.jar:/srv/gump/public/workspace/ant/dist/lib/ant-apache-resolver.jar:/srv/gump/public/workspace
/ant/dist/lib/ant-trax.jar:/srv/gump/public/workspace/ant/dist/lib/ant-junit.jar:/srv/gump/public/workspace/ant/dist/lib/ant-launcher.jar:/srv/gump/public/workspace/ant/dist/lib/ant-nodeps.jar:/srv/gump/public/workspace/ant/dist/lib/ant.jar:/srv/gump/packages/junit3.8.1/junit.jar:/srv/gump/public/workspace/xml-commons/java/build/resolver.jar:/srv/gump/packages/je-1.7.1/lib/je.jar:/srv/gump/public/workspace/jakarta-commons/digester/dist/commons-digester.jar:/srv/gump/public/workspace/jakarta-regexp/build/jakarta-regexp-18072007.jar:/srv/gump/packages/javacc-3.1/bin/lib/javacc.jar:/srv/gump/public/workspace/jline/target/jline-0.9.92-SNAPSHOT.jar:/srv/gump/packages/jtidy-04aug2000r7-dev/build/Tidy.jar:/srv/gump/public/workspace/junit/dist/junit-18072007.jar:/srv/gump/public/workspace/xml-commons/java/external/build/xml-apis-ext.jar:/srv/gump/public/workspace/jakarta-commons/logging/target/commons-logging-18072007.jar:/srv/gump/public/workspace/jakarta-commons/logging/target/commons-logging-api-18072007.jar:/srv/gump/public/workspace/jakarta-servletapi-5/jsr154/dist/lib/servlet-api.jar:/srv/gump/packages/nek

Need help for ordering results by specific order

2007-07-18 Thread savageboy


Hi,
I am new to Lucene.
I have a search engine project built on Lucene 2.0. But now that the project is
nearly finished, my boss wants me to order the results as shown below:

The query looks like '+content:"alden bob carray"'

content                                        date         order
"alden bob carray ... "                        2005/12/23   1
"alden... alden ... bob... bob... carray..."   2005/12/01   2
"alden... alden ... bob... carray"             2005/11/28   3
"alden... carray"                              2005/12/24   4
"alden... bob"                                 2005/12/24   5

The meaning of the sort above is that, no matter how many times a term matches
in the field "content", there are four situations: "3 matched", "2 matched",
"1 matched" and "0 matched". Within the "3 matched" group, I need to sort
the results by date descending, and the same goes for the "2 matched" group...

But I don't know HOW to get these results in Lucene...
Should I override the scoring methods (tf(t in d), idf(t))?
Could you give me some references about it?

I am really stuck, and need your help!!
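
The ordering described above (group by the number of distinct query terms matched, then date descending within each group) can be sketched in plain Java. This is only an illustration of the comparison rule, not Lucene API: the Doc class, matchedTerms and sort are hypothetical names, terms are assumed to be lower-case, and dates are assumed to be yyyy/MM/dd strings so that string order equals date order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MatchCountSort {
    /** Hypothetical result record: field content plus a yyyy/MM/dd date string. */
    public static final class Doc {
        public final String content;
        public final String date;

        public Doc(String content, String date) {
            this.content = content;
            this.date = date;
        }
    }

    /** Counts how many distinct query terms occur in the content, ignoring frequency. */
    public static int matchedTerms(String content, String[] terms) {
        Set<String> words =
            new HashSet<>(Arrays.asList(content.toLowerCase().split("[^a-z0-9]+")));
        int count = 0;
        for (String term : terms) {
            if (words.contains(term)) {
                count++;
            }
        }
        return count;
    }

    /** Returns a copy sorted by matched-term count descending, then date descending. */
    public static List<Doc> sort(List<Doc> docs, String[] terms) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(
            Comparator.comparingInt((Doc d) -> matchedTerms(d.content, terms))
                .reversed()
                .thenComparing((Doc d) -> d.date, Comparator.reverseOrder()));
        return sorted;
    }

    public static void main(String[] args) {
        String[] terms = {"alden", "bob", "carray"};
        List<Doc> docs = Arrays.asList(
            new Doc("alden... carray", "2005/12/24"),
            new Doc("alden... alden ... bob... carray", "2005/11/28"),
            new Doc("alden bob carray ...", "2005/12/23"),
            new Doc("alden... alden ... bob... bob... carray...", "2005/12/01"));
        for (Doc d : sort(docs, terms)) {
            System.out.println(d.date + "  " + d.content);
        }
    }
}
```

Because the primary key is the count of distinct matched terms rather than a tf/idf score, repeated occurrences of a term do not move a document within its group; only the date does.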


-- 
View this message in context: 
http://www.nabble.com/Need-help-for-ordering-results-by-specific-order-tf4101844.html#a11664583
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

