[ https://issues.apache.org/jira/browse/SOLR-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070498#comment-17070498 ]

Eugene Tenkaev edited comment on SOLR-8030 at 3/29/20, 9:15 PM:
----------------------------------------------------------------

We need to operate on the fully constructed document in order to remove the old set of dynamic fields that has been replaced by a new set of dynamic fields with different field names.

This is where a post-processor comes in.
We put this post-processor in the *default chain*.
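For illustration, the chain configuration might look like the following in {{solrconfig.xml}} (the {{com.example.RemoveOldDynamicFieldsProcessorFactory}} class name is a hypothetical placeholder for our custom factory, not an existing class):
{code:xml}
<updateRequestProcessorChain name="default" default="true">
  <!-- routing/distribution happens here -->
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <!-- hypothetical post-processor that strips the old dynamic fields -->
  <processor class="com.example.RemoveOldDynamicFieldsProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
{code}
Because the factory is placed after {{DistributedUpdateProcessorFactory}}, it runs as a post-processor on the node that actually indexes the document.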
We get the *SolrQueryRequest* from the *AddUpdateCommand*:
{code}
    @Override
    protected void process(AddUpdateCommand cmd, SolrQueryRequest req, SolrQueryResponse rsp) {
        String value = cmd.getReq().getParams().get(NAME + ".xxx");
{code}
and then remove the old set of dynamic fields from the full document according to the parameters in the *SolrQueryRequest*, while ignoring the newly added fields.
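To make the removal logic concrete, here is a minimal, self-contained sketch in plain Java; a {{HashMap}} stands in for the *SolrInputDocument*, and the {{attr_}} prefix and field names are made-up examples, not our real schema:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

public class RemoveOldDynamicFields {

    // Remove every field whose name starts with the given prefix,
    // except names in the "keep" set (the newly added dynamic fields).
    static void removeMatching(Map<String, Object> doc, String prefix, Set<String> keep) {
        Iterator<String> it = doc.keySet().iterator();
        while (it.hasNext()) {
            String name = it.next();
            if (name.startsWith(prefix) && !keep.contains(name)) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "1");
        doc.put("attr_color", "red");   // old dynamic field, to be removed
        doc.put("attr_size", "L");      // old dynamic field, to be removed
        doc.put("attr_shape", "round"); // newly added field, must survive
        removeMatching(doc, "attr_", Set.of("attr_shape"));
        System.out.println(doc.containsKey("attr_color")); // false
        System.out.println(doc.containsKey("attr_shape")); // true
    }
}
```

In the real processor the prefix would come from the request parameters (the {{NAME + ".xxx"}} lookup above) and the document would be {{cmd.getSolrInputDocument()}}.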

Is there a possibility that this code will not work during transaction log replay, so that we lose the behavior this processor adds?

h4. Possible workaround for our case:
We can introduce a workaround: we add a special technical field to the schema that contains the command for removing the old set of dynamic fields, but we do not index this technical field. Our post-processor then works only with the data from the *SolrInputDocument* and this technical field.
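As a sketch, the technical field could be declared in the schema like this (the field name is hypothetical; {{indexed="false"}} and {{stored="false"}} keep it out of the index so it only acts as a carrier for the removal command):
{code:xml}
<!-- hypothetical carrier field: read by the post-processor, never indexed -->
<field name="removeDynamicFields_" type="string" indexed="false" stored="false"/>
{code}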

Will this workaround handle the current situation with the replaying of updates? Or are there cases where all post-processors are completely ignored, even in the default chain?

h3. Additionally
Regarding the idea of [~elyograg]: is it possible to move the routing code out of *DistributedUpdateProcessor*, so that all processors that come after this routing processor are executed on the proper node?
If so, we could also move Atomic Update processing out of *DistributedUpdateProcessor*, so that it is executed on the node that holds the proper data.



> Transaction log does not store the update chain (or req params?) used for 
> updates
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-8030
>                 URL: https://issues.apache.org/jira/browse/SOLR-8030
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 5.3
>            Reporter: Ludovic Boutros
>            Priority: Major
>         Attachments: SOLR-8030.patch
>
>
> Transaction Log does not store the update chain, or any other details from 
> the original update request such as the request params, used during updates.
> Therefore tLog uses the default update chain, and a synthetic request, during 
> log replay.
> If we implement custom update logic with multiple distinct update chains that 
> use custom processors after DistributedUpdateProcessor, or if the default 
> chain uses processors whose behavior depends on other request params, then 
> log replay may be incorrect.
> Potentially problematic scenarios (need test cases):
> * DBQ where the main query string uses local param variables that refer to 
> other request params
> * custom Update chain set as {{default="true"}} using something like 
> StatelessScriptUpdateProcessorFactory after DUP where the script depends on 
> request params.
> * multiple named update chains with diff processors configured after DUP and 
> specific requests sent to diff chains -- ex: ParseDateProcessor w/ custom 
> formats configured after DUP in some special chains, but not in the default 
> chain



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
