[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

Hoss Man (JIRA) Wed, 13 Jun 2012 14:55:44 -0700

    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294678#comment-13294678 ]


Hoss Man commented on SOLR-3535:
--------------------------------

bq. I don't feel that this rich model is covered with single level parent-child 
well.

who said anything about a "single level" ? .. if SolrInputDocument can have a 
List<SolrInputDocument> of children, then those children can have other 
children, etc..
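
A minimal sketch of that multi-level idea, using a hypothetical stand-in 
class rather than the real SolrInputDocument (the names NestedDoc, addChild, 
depth and size are assumptions made up for illustration, not an existing API):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for SolrInputDocument; the child-list methods are
// assumptions for illustration, not an existing Solr API.
class NestedDoc {
    final Map<String, Object> fields = new LinkedHashMap<>();
    final List<NestedDoc> children = new ArrayList<>();

    NestedDoc set(String name, Object value) {
        fields.put(name, value);
        return this;
    }

    NestedDoc addChild(NestedDoc child) {
        children.add(child);
        return this;
    }

    // Depth of the tree rooted at this document (a leaf has depth 1).
    int depth() {
        int deepest = 0;
        for (NestedDoc child : children) {
            deepest = Math.max(deepest, child.depth());
        }
        return 1 + deepest;
    }

    // Number of documents in the tree, including this one.
    int size() {
        int total = 1;
        for (NestedDoc child : children) {
            total += child.size();
        }
        return total;
    }
}

class NestedDocDemo {
    // product -> sku -> offer: children can themselves have children.
    static NestedDoc sample() {
        NestedDoc sku1 = new NestedDoc().set("type", "sku").set("color", "red")
            .addChild(new NestedDoc().set("type", "offer").set("price", 10));
        NestedDoc sku2 = new NestedDoc().set("type", "sku").set("color", "blue");
        return new NestedDoc().set("id", "product-1").set("type", "product")
            .addChild(sku1)
            .addChild(sku2);
    }

    public static void main(String[] args) {
        NestedDoc product = sample();
        System.out.println(product.depth() + " levels, " + product.size() + " docs");
    }
}
```

Only the top level document carries an "id" here; the children are reachable 
only through their parent.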

bq. PK field is a blocker for transparent handling scoped docs by the current 
processors. i.e. I don't think it's mandatory to provide PK field for every 
child document (most time it's useless and redundant info)

Agreed, but i don't see how it's a blocker - if the children hang off of 
the top most parent, then as long as that parent has a uniqueKey, all of the 
distributed stuff (and any update processors that care about uniqueKey) should 
be fine ... processors that want to be aware of sub-documents might have to 
worry about it, and we have to think through how deletes by id should work (so 
that children are automatically removed and not inherited by the adjacent 
parent doc) but those are going to be issues that need to be thought 
through/solved regardless of how we model the nested docs in the processor 
chain API.

bq. field update processors can work wrong if the same field name is present in 
several scopes - name clash between different relations/scopes

a) that seems like an argument in favor of continuing to give the processors a 
single top level SolrInputDocument with all of its children hanging off of it 
in a hierarchy, instead of adding a new AddBlockCommand that contains a 
flattened list of documents -- because with a flat list the processors won't 
have any way of knowing if/when to treat some docs differently.

b) like other things i mentioned earlier, that really seems like a secondary 
concern -- for many use cases either the field names will be distinct, or can 
be made distinct for the purposes of using this feature.  Update processors can 
(eventually) be made smarter to know to only operate on certain documents by 
"type" but any solution like that which would work on a sequential list of 
documents like in your "AddBlockCommand" suggestion could also work on a true 
hierarchy of SolrInputDocuments (where it would have the actual hierarchy to 
help inform its behavior)
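
A rough sketch of what "only operate on certain documents by type" could look 
like on a hierarchy.  TypedDoc and TypeScopedProcessor are hypothetical names 
invented for this example, and the "type" field is an assumed convention; none 
of this is existing Solr code:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a SolrInputDocument that can hold child documents.
class TypedDoc {
    final Map<String, Object> fields = new LinkedHashMap<>();
    final List<TypedDoc> children = new ArrayList<>();

    TypedDoc(String type, String name) {
        fields.put("type", type);
        fields.put("name", name);
    }

    TypedDoc addChild(TypedDoc child) {
        children.add(child);
        return this;
    }
}

class TypeScopedProcessor {
    // Upper-cases "name" only on documents whose "type" matches targetType,
    // recursing through the hierarchy so nested levels are covered too.
    static void upperCaseName(TypedDoc doc, String targetType) {
        if (targetType.equals(doc.fields.get("type"))) {
            Object name = doc.fields.get("name");
            if (name != null) {
                doc.fields.put("name", name.toString().toUpperCase(Locale.ROOT));
            }
        }
        for (TypedDoc child : doc.children) {
            upperCaseName(child, targetType);
        }
    }

    // The parent and both children use the same field name, "name".
    static TypedDoc sample() {
        return new TypedDoc("product", "t-shirt")
            .addChild(new TypedDoc("sku", "red-small"))
            .addChild(new TypedDoc("sku", "blue-large"));
    }
}
```

Calling TypeScopedProcessor.upperCaseName(doc, "sku") changes the "name" of the 
two sku children and leaves the parent's "name" alone, even though the field 
name clashes across scopes.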

bq. why new api/property is necessary? is solrInputDoc.addField("skus", new 
Object[]{sku1, sku2, sku3}) not enough?

Are you suggesting we model child documents as objects (SolrInputDocuments i 
guess?) in a special field? ... what if i put child documents in multiple 
fields? would that signify the different types of child?  how would solr model 
that in the (lucene) Documents when giving them to the IndexWriter?  How would 
solr know how to order the children from multiple fields/lists when creating 
the block?  Wouldn't the "type of child" information be better living in the 
child documents themselves?  (particularly since that "type" information needs 
to be in the child documents anyway so that the filter query for a BJQ can be 
specified.)  

It also seems like it would require code that wants to know what children exist 
in a document to do a lot of work to find that out (need to iterate over every 
field in the SolrInputDocument and do reflection to see if they are 
child-documents or not)
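
To make that cost concrete, here is a sketch of the scan such code would need 
if children lived in ordinary fields.  FieldDoc and ChildScan are hypothetical 
names for illustration only, standing in for SolrInputDocument:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for SolrInputDocument where child documents are stored
// as ordinary field values (the "skus" field idea from the quoted question).
class FieldDoc {
    final Map<String, Object> fields = new LinkedHashMap<>();

    FieldDoc set(String name, Object value) {
        fields.put(name, value);
        return this;
    }
}

class ChildScan {
    // Every field value has to be inspected, and arrays and collections
    // unpacked, just to learn whether the document has any children at all.
    static List<FieldDoc> findChildren(FieldDoc doc) {
        List<FieldDoc> found = new ArrayList<>();
        for (Object value : doc.fields.values()) {
            if (value instanceof FieldDoc) {
                found.add((FieldDoc) value);
            } else if (value instanceof Object[]) {
                for (Object item : (Object[]) value) {
                    if (item instanceof FieldDoc) {
                        found.add((FieldDoc) item);
                    }
                }
            } else if (value instanceof Collection) {
                for (Object item : (Collection<?>) value) {
                    if (item instanceof FieldDoc) {
                        found.add((FieldDoc) item);
                    }
                }
            }
        }
        return found;
    }

    static FieldDoc sample() {
        FieldDoc sku1 = new FieldDoc().set("color", "red");
        FieldDoc sku2 = new FieldDoc().set("color", "blue");
        return new FieldDoc()
            .set("id", "product-1")
            .set("tags", new Object[] {"cotton", "summer"})
            .set("skus", new Object[] {sku1, sku2});
    }
}
```

Note that the order of the children found this way depends on the order in 
which the fields happen to be iterated, which is the block-ordering question 
raised above.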

Another concern off the top of my head is that a lot of existing code 
(including any custom update processors people might have) would assume those 
child documents are multivalued field values and would probably break -- hence 
a new method on SolrInputDocument seems wiser (code that doesn't know about it 
may not do what you want, but at least it won't break)

bq. there is a *pre*processors chain which deal with scoped documents and 
flatten them - there should be two of them: block-join (bjq counterpart); 
denormalizer (grouping counterpart); fk-copier for query-time join;

i don't really understand the need for this.  i'm at a complete loss as to what 
you mean by "fk-copier for query-time join", but your suggestion for a new type 
of processor chain that can flatten/denormalize documents seems like it could 
easily be implemented using the existing UpdateProcessorChain code -- assuming 
we let SolrInputDocuments have other SolrInputDocuments as children.  Couldn't 
you just write a new "FlattenDocumentUpdateProcessor" such that anytime it gets 
a SolrInputDocument with children, it creates new AddDocCommands containing 
those children (adding whatever flattened fields from the parent that it wants) 
and executes them?
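
A sketch of just the flattening step such a processor might perform, written 
against a hypothetical stand-in class instead of the real Solr types.  
ParentDoc, FlattenSketch and the "parent_" field prefix are all assumptions 
for illustration, and only one level of children is handled:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a SolrInputDocument with child documents.
class ParentDoc {
    final Map<String, Object> fields = new LinkedHashMap<>();
    final List<ParentDoc> children = new ArrayList<>();

    ParentDoc set(String name, Object value) {
        fields.put(name, value);
        return this;
    }

    ParentDoc addChild(ParentDoc child) {
        children.add(child);
        return this;
    }
}

class FlattenSketch {
    // Returns one flat field map per child, each carrying a copy of the
    // parent's fields under a "parent_" prefix (the prefix is an assumption).
    // A document without children is passed through as a single flat map.
    static List<Map<String, Object>> flatten(ParentDoc doc) {
        List<Map<String, Object>> out = new ArrayList<>();
        if (doc.children.isEmpty()) {
            out.add(new LinkedHashMap<>(doc.fields));
            return out;
        }
        for (ParentDoc child : doc.children) {
            Map<String, Object> flat = new LinkedHashMap<>(child.fields);
            for (Map.Entry<String, Object> e : doc.fields.entrySet()) {
                flat.put("parent_" + e.getKey(), e.getValue());
            }
            out.add(flat);
        }
        return out;
    }

    static ParentDoc sample() {
        return new ParentDoc().set("id", "product-1").set("brand", "acme")
            .addChild(new ParentDoc().set("color", "red"))
            .addChild(new ParentDoc().set("color", "blue"));
    }
}
```

In a real processor each returned map would presumably become its own add 
command passed further down the chain; that wiring is left out here.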

bq. for distributed processor AddBlockCommand should have PK - it's 
preprocessors' duty

but that doesn't address the issues yonik and i raised about all of the 
distributed update & transaction log code that already exists revolving around 
forwarding *documents* and recording their unique key.  What is the advantage 
of introducing a new AddBlockCommand that also has to have a unique key, and 
would need to be forwarded around atomically when we could just use the top 
level parent document with all of the existing distributed update code as is?
                
> Add block support for XMLLoader
> -------------------------------
>
>                 Key: SOLR-3535
>                 URL: https://issues.apache.org/jira/browse/SOLR-3535
>             Project: Solr
>          Issue Type: Sub-task
>          Components: update
>    Affects Versions: 4.1, 5.0
>            Reporter: Mikhail Khludnev
>            Priority: Minor
>         Attachments: SOLR-3535.patch
>
>
> I'd like to add the following update xml message:
> <add-block>
>     <doc>....</doc>
>     <doc>....</doc>
> </add-block>
> out of scope for now: 
> * other update formats
> * update log support (NRT), should not be a big deal
> * overwrite feature support for block updates - it's more complicated, I'll 
> tell you why
> Alt
> * wdyt about adding attribute to the current tag {pre}<add block="true">{pre} 
> * or we can establish RunBlockUpdateProcessor which treat every <add> 
> ....</add> as a block.
> *Test is included!!*
> How you'd suggest to improve the patch?

