OK, I can wait

On Jul 10, 2007, at 9:45 AM, Karl Wettin (JIRA) wrote:


[ https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511442 ]

Karl Wettin commented on LUCENE-868:
------------------------------------

Grant Ingersoll - [09/Jul/07 02:05 PM ]
Anyone have any comments on this approach for Term Vectors?

I'm not sure if the patch still applies to trunk, but I will update it
and commit on Wednesday or Thursday unless I hear other comments.

I can look over the code this weekend if you want. I'll definitely be using this stuff when I get back from vacation.


Making Term Vectors more accessible
-----------------------------------

                Key: LUCENE-868
                URL: https://issues.apache.org/jira/browse/LUCENE-868
            Project: Lucene - Java
         Issue Type: New Feature
         Components: Store
           Reporter: Grant Ingersoll
           Assignee: Grant Ingersoll
           Priority: Minor
        Attachments: LUCENE-868-v1.patch


One of the big issues with term vector usage is that the information is loaded into parallel arrays as it is read, and the application often has to manipulate those arrays again before using them (for instance, sorting them by frequency). Adding a callback mechanism that lets the application handle the vector loading would make this a lot more efficient.
I propose to add to IndexReader:
abstract public void getTermFreqVector(int docNumber, String field, TermVectorMapper mapper) throws IOException;
and a similar method for the all-fields version,
where TermVectorMapper is an interface with a single method:
void map(String term, int frequency, int offset, int position);
The TermVectorReader will be modified to just call the TermVectorMapper. The existing getTermFreqVectors will be reimplemented to use an implementation of TermVectorMapper that creates the parallel arrays. Additionally, some simple implementations that automatically sort vectors will also be created. This is my first draft of this API and is subject to change. I hope to have a patch soon. See http://www.gossamer-threads.com/lists/lucene/java-user/48003?search_string=get%20the%20total%20term%20frequency;#48003 for related information.
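
To make the callback idea concrete, here is a minimal sketch of how a frequency-sorting mapper might look. It declares the draft TermVectorMapper interface locally, exactly as written in the description above; FrequencySortedMapper, termsByFrequency and the sample terms are hypothetical names made up for illustration, and the hand-written map() calls only stand in for what the reader would do while loading a vector. This is not the code in the attached patch, and the final API may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Draft callback interface as described in the issue; declared locally
// here, so this is not the actual Lucene type.
interface TermVectorMapper {
    void map(String term, int frequency, int offset, int position);
}

// Hypothetical mapper: collects each term once as it is reported and
// hands the terms back ordered by descending frequency.
class FrequencySortedMapper implements TermVectorMapper {
    private static final class Entry {
        final String term;
        final int frequency;

        Entry(String term, int frequency) {
            this.term = term;
            this.frequency = frequency;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    @Override
    public void map(String term, int frequency, int offset, int position) {
        // Offset and position are ignored in this sketch.
        entries.add(new Entry(term, frequency));
    }

    List<String> termsByFrequency() {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingInt((Entry e) -> e.frequency).reversed());
        List<String> terms = new ArrayList<>();
        for (Entry e : sorted) {
            terms.add(e.term);
        }
        return terms;
    }
}

class TermVectorMapperSketch {
    public static void main(String[] args) {
        FrequencySortedMapper mapper = new FrequencySortedMapper();
        // Stand-in for the calls a reader would make while loading a vector.
        mapper.map("lucene", 3, 0, 0);
        mapper.map("vector", 7, 7, 1);
        mapper.map("term", 5, 14, 2);
        System.out.println(mapper.termsByFrequency()); // prints [vector, term, lucene]
    }
}
```

With this shape the application gets the order it wants in a single pass over the stored vector, without first materializing the parallel arrays and then re-sorting them.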

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/


