[ https://issues.apache.org/jira/browse/SOLR-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-3229:
---------------------------

    Attachment: SOLR-3229.patch

Hang: Thank you for your patch.

I agree: using the "docid" as a key is dangerous and misleading in distributed
mode, and we should switch to using the uniqueKey when available. But if we
leave things as you had them in your patch, existing (single-node) users who
don't have a uniqueKey field would no longer be able to get term vectors at all.

I updated your patch to leave the key alone if there is no uniqueKey, and to
eliminate the "doc-" prefix when there is one. I also added a new distributed
test to prove that everything works, and it turned up a few problems, some of
which I fixed (handling warnings, and ensuring that TVC results are in the
correct order for the result documents).
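For illustration only, here is a minimal sketch of the two behaviours just described: choosing the per-document key (the uniqueKey value when the schema has one, else the legacy "doc-" + Lucene docid key, whose exact format is assumed here), and putting merged shard responses back into the order of the final result list. The class and method names (TvcKeySketch, termVectorKey, orderLikeResults) are hypothetical and are not taken from SOLR-3229.patch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the behaviours described above;
// NOT the code from SOLR-3229.patch.
public class TvcKeySketch {

    // Key for one document's term-vector entry: the uniqueKey value when the
    // schema defines one, otherwise the legacy "doc-" + Lucene docid key
    // (assumed format), which is only meaningful on a single node.
    static String termVectorKey(String uniqueKeyValue, int luceneDocId) {
        return uniqueKeyValue != null ? uniqueKeyValue : "doc-" + luceneDocId;
    }

    // Rebuild the merged per-shard entries in the order of the final ranked
    // result list; result keys with no term-vector entry are skipped.
    static <V> Map<String, V> orderLikeResults(List<String> rankedKeys,
                                               Map<String, V> mergedFromShards) {
        Map<String, V> ordered = new LinkedHashMap<>();
        for (String key : rankedKeys) {
            V entry = mergedFromShards.get(key);
            if (entry != null) {
                ordered.put(key, entry);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        System.out.println(termVectorKey("book-42", 7)); // book-42
        System.out.println(termVectorKey(null, 7));      // doc-7

        // Shard responses are merged in arrival order, not ranked order.
        Map<String, String> fromShards = new HashMap<>();
        fromShards.put("b", "tv(b)");
        fromShards.put("c", "tv(c)");
        fromShards.put("a", "tv(a)");

        List<String> ranked = Arrays.asList("a", "b", "c");
        System.out.println(orderLikeResults(ranked, fromShards).keySet()); // [a, b, c]
    }
}
```

Keying by uniqueKey is what makes the reordering step possible at all: a Lucene docid is only unique within one shard, so it cannot be used to line shard responses up against the merged document list.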

One thing I discovered that I'm not sure about is what to do with the "df" and
"tf-idf" values when they are requested. The test has to ignore them, because
the distributed test framework builds a single-node instance and compares it
against a multi-node instance with identical documents, and in the distributed
TVC code these values won't match. I'm not sure whether that's a bug (because
the df and tf-idf values aren't "merged" across all nodes) or a feature
(because you get the real df and tf-idf values for that term in that doc from
the shard it lives on). Either way, it shouldn't block fixing the basic problem
of TVC failing painfully in a distributed request, so I've opened SOLR-3720 to
track it.
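A small worked example may make the mismatch concrete. This sketch assumes, as a simplification to be checked against the TermVectorComponent source, that the reported tf-idf is the term frequency divided by the document frequency, and that each shard computes df over its own documents only. The three toy documents and the class name ShardDfSketch are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: per-shard df vs. single-node df.
// Assumes tf-idf = tf / df (a simplification, not verified against Solr).
public class ShardDfSketch {

    // Number of documents in the given index (or shard) containing the term.
    static int docFreq(List<List<String>> docs, String term) {
        int df = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) {
                df++;
            }
        }
        return df;
    }

    // Number of occurrences of the term within one document.
    static int termFreq(List<String> doc, String term) {
        int tf = 0;
        for (String t : doc) {
            if (t.equals(term)) {
                tf++;
            }
        }
        return tf;
    }

    // Assumed tf-idf formula: tf divided by df.
    static double tfIdf(int tf, int df) {
        return (double) tf / df;
    }

    public static void main(String[] args) {
        List<String> d0 = Arrays.asList("solr", "term", "vector");
        List<String> d1 = Arrays.asList("solr", "solr", "search");
        List<String> d2 = Arrays.asList("solr", "shard");

        List<List<String>> singleNode = Arrays.asList(d0, d1, d2);
        List<List<String>> shardA = Arrays.asList(d0, d1); // d1 lives here

        int tf = termFreq(d1, "solr");              // 2
        int globalDf = docFreq(singleNode, "solr"); // 3
        int shardDf = docFreq(shardA, "solr");      // 2

        System.out.println(tfIdf(tf, globalDf)); // value the single-node instance reports
        System.out.println(tfIdf(tf, shardDf));  // 1.0, value the shard reports
    }
}
```

The same document and term yield different df (3 vs. 2), and therefore different tf-idf, depending on whether df is counted over the whole collection or only over the shard holding the document. That is exactly why the comparison test cannot check these values until SOLR-3720 decides whether they should be merged.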

Feedback on this revised patch/test would be appreciated.
                
> TermVectorComponent does not return terms in distributed search
> ---------------------------------------------------------------
>
>                 Key: SOLR-3229
>                 URL: https://issues.apache.org/jira/browse/SOLR-3229
>             Project: Solr
>          Issue Type: Bug
>          Components: SearchComponents - other
>    Affects Versions: 4.0-ALPHA
>         Environment: Ubuntu 11.10, openjdk-6
>            Reporter: Hang Xie
>            Assignee: Hoss Man
>              Labels: patch
>             Fix For: 4.0
>
>         Attachments: SOLR-3229.patch, TermVectorComponent.patch
>
>
> TermVectorComponent does not return terms in distributed search:
> distributedProcess() incorrectly uses the Solr uniqueKey to make subrequests,
> while process() expects Lucene document ids. Also, parameters are transferred
> in a different format, so distributed search returns no results.
