[ 
https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-868:
-----------------------------------

    Attachment: LUCENE-868-v1.patch

First pass at a patch on providing a callback mechanism for term vectors.  The 
big benefit to this is the ability to control how to store the TV info, instead 
of being stuck with parallel arrays.  Two sample implementations are provided, 
as FieldSortedTermVectorMapper and SortedTermVectorMapper.

One thing I am not completely sure on and would appreciate a review of, is the 
interface definition of TermVectorMapper, in particular the interplay of the 
setExpectations method with the actual map method (see the TermVectorsReader).  
It is not thread-safe (although I am not so sure the current way is either)

The existing access methods for TVs still works just the same, even though the 
underlying implementation is different.  

I don't know if this is any faster in the Lucene core, but it should speed up 
those applications that have to reformat the TV info for their needs.



> Making Term Vectors more accessible
> -----------------------------------
>
>                 Key: LUCENE-868
>                 URL: https://issues.apache.org/jira/browse/LUCENE-868
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: LUCENE-868-v1.patch
>
>
> One of the big issues with term vector usage is that the information is 
> loaded into parallel arrays as it is loaded, which are then often times 
> manipulated again to use in the application (for instance, they are sorted by 
> frequency).
> Adding a callback mechanism that allows the vector loading to be handled by 
> the application would make this a lot more efficient.
> I propose to add to IndexReader:
> abstract public void getTermFreqVector(int docNumber, String field, 
> TermVectorMapper mapper) throws IOException;
> and a similar one for the all fields version
> Where TermVectorMapper is an interface with a single method:
> void map(String term, int frequency, int offset, int position);
> The TermVectorReader will be modified to just call the TermVectorMapper.  The 
> existing getTermFreqVectors will be reimplemented to use an implementation of 
> TermVectorMapper that creates the parallel arrays.  Additionally, some simple 
> implementations that automatically sort vectors will also be created.
> This is my first draft of this API and is subject to change.  I hope to have 
> a patch soon.
> See 
> http://www.gossamer-threads.com/lists/lucene/java-user/48003?search_string=get%20the%20total%20term%20frequency;#48003
>  for related information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to