[ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294678#comment-13294678 ]
Hoss Man commented on SOLR-3535:
--------------------------------

bq. I don't feel that this rich model is covered with single level parent-child well.

who said anything about a "single level"? ... if SolrInputDocument can have a List<SolrInputDocument> of children, then those children can have other children, etc.

bq. PK field is a blocker for transparent handling scoped docs by the current processors. i.e. I don't think it's mandatory to provide PK field for every child document (most time it's useless and redundant info)

Agreed, but i don't see how it's a blocker - if the children hang off of the topmost parent, then as long as that parent has a uniqueKey, all of the distributed stuff (and any update processors that care about uniqueKey) should be fine ... processors that want to be aware of sub-documents might have to worry about it, and we have to think through how deletes by id should work (so that children are automatically removed and not inherited by the adjacent parent doc), but those are going to be issues that need to be thought through/solved regardless of how we model the nested docs in the processor chain API.

bq. field update processors can work wrong if the same field name is present in several scopes - name clash between different relations/scopes

a) that seems like an argument in favor of continuing to give the processors a single top level SolrInputDocument with all of its children hanging off of it in a hierarchy, instead of adding a new AddBlockCommand that contains a flattened list of documents -- because with a flattened list the processors won't have any way of knowing if/when to treat some docs differently.

b) like other things i mentioned earlier, that really seems like a secondary concern -- for many use cases either the field names will be distinct, or can be made distinct for the purposes of using this feature.
Update processors can (eventually) be made smarter to know to only operate on certain documents by "type", but any solution like that which would work on a sequential list of documents, as in your "AddBlockCommand" suggestion, could also work on a true hierarchy of SolrInputDocuments (where it would have the actual hierarchy to help inform its behavior).

bq. why new api/property is necessary? is solrInputDoc.addField("skus", new Object[]{sku1, sku2, sku3}) not enough?

Are you suggesting we model child documents as objects (SolrInputDocuments i guess?) in a special field? ... what if i put child documents in multiple fields? would that signify the different types of child? how would solr model that in the (lucene) Documents when giving them to the IndexWriter? How would solr know how to order the children from multiple fields/lists when creating the block? Wouldn't the "type of child" information be better living in the child documents themselves? (particularly since that "type" information needs to be in the child documents anyway so that the filter query for a BJQ can be specified.)

It also seems like it would require code that wants to know what children exist in a document to do a lot of work to find that out (it would need to iterate over every field in the SolrInputDocument and do reflection to see if the values are child documents or not).

Another concern off the top of my head is that a lot of existing code (including any custom update processors people might have) would assume those child documents are multivalued field values and would probably break -- hence a new method on SolrInputDocument seems wiser (code that doesn't know about it may not do what you want, but at least it won't break).

bq. there is a *pre*processors chain which deal with scoped documents and flatten them - there should be two of them: block-join (bjq counterpart); denormalizer (grouping counterpart); fk-copier for query-time join;

i don't really understand the need for this.
i'm at a complete loss as to what you mean by "fk-copier for query-time join", but your suggestion for a new type of processor chain that can flatten/denormalize documents seems like it could easily be implemented using the existing UpdateProcessorChain code -- assuming we let SolrInputDocuments have other SolrInputDocuments as children. Couldn't you just write a new "FlattenDocumentUpdateProcessor" such that anytime it gets a SolrInputDocument with children, it creates new AddDocCommands containing those children (adding whatever flattened fields from the parent that it wants) and executes them?

bq. for distributed processor AddBlockCommand should have PK - it's preprocessors' duty

but that doesn't address the issues yonik and i raised about all of the distributed update & transaction log code that already exists revolving around forwarding *documents* and recording their unique key. What is the advantage of introducing a new AddBlockCommand that also has to have a unique key, and would need to be forwarded around atomically, when we could just use the top level parent document with all of the existing distributed update code as is?

> Add block support for XMLLoader
> -------------------------------
>
>                 Key: SOLR-3535
>                 URL: https://issues.apache.org/jira/browse/SOLR-3535
>             Project: Solr
>          Issue Type: Sub-task
>          Components: update
>    Affects Versions: 4.1, 5.0
>            Reporter: Mikhail Khludnev
>            Priority: Minor
>         Attachments: SOLR-3535.patch
>
>
> I'd like to add the following update xml message:
> <add-block>
> <doc>....</doc>
> <doc>....</doc>
> </add-block>
> out of scope for now:
> * other update formats
> * update log support (NRT), should not be a big deal
> * overwrite feature support for block updates - it's more complicated, I'll tell you why
> Alt
> * wdyt about adding attribute to the current tag {pre}<add block="true">{pre}
> * or we can establish RunBlockUpdateProcessor which treat every <add> ....</add> as a block.
> *Test is included!!*
> How you'd suggest to improve the patch?
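
For readers following the thread, here is a rough sketch of the flattening idea from the comment above (a "FlattenDocumentUpdateProcessor" that turns the children of a nested document into standalone documents carrying copied parent fields). It deliberately avoids the Solr API: NestedDoc is a hypothetical stand-in for a SolrInputDocument that holds a list of child documents, FlattenSketch and the "parent_" field prefix are illustrative assumptions, and plain maps stand in for the AddDocCommands a real processor would issue. It only shows that a true hierarchy keeps enough information to denormalize at any depth; it is not how Solr implements anything.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a document with child documents; NOT the real SolrInputDocument. */
class NestedDoc {
    final Map<String, Object> fields = new LinkedHashMap<>();
    final List<NestedDoc> children = new ArrayList<>();

    NestedDoc set(String name, Object value) {
        fields.put(name, value);
        return this;
    }

    NestedDoc addChild(NestedDoc child) {
        children.add(child);
        return this;
    }
}

/** Illustrative flattener; the name and the "parent_" prefix are assumptions, not Solr conventions. */
class FlattenSketch {

    /**
     * Walks the hierarchy depth-first and returns one flat field map per
     * descendant of root. Each map holds the descendant's own fields plus its
     * ancestors' fields, prefixed with "parent_" once per level of distance.
     * The root itself is not emitted and the input documents are not modified.
     */
    static List<Map<String, Object>> flatten(NestedDoc root) {
        List<Map<String, Object>> out = new ArrayList<>();
        collectChildren(root, prefixed(root.fields), out);
        return out;
    }

    private static void collectChildren(NestedDoc parent,
                                        Map<String, Object> inherited,
                                        List<Map<String, Object>> out) {
        for (NestedDoc child : parent.children) {
            // Start from the inherited (prefixed) ancestor fields, then add the child's own.
            Map<String, Object> flat = new LinkedHashMap<>(inherited);
            flat.putAll(child.fields);
            out.add(flat);
            // Grandchildren inherit everything the child carries, one prefix deeper.
            collectChildren(child, prefixed(flat), out);
        }
    }

    private static Map<String, Object> prefixed(Map<String, Object> source) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : source.entrySet()) {
            result.put("parent_" + e.getKey(), e.getValue());
        }
        return result;
    }
}
```

For example, a product document with id "p1" and two sku children would yield two flat maps, each with its own sku fields plus parent_id set to "p1". Because the prefix is applied per level, a same-named field in parent and child does not collide in the flattened output, which is one way the name-clash concern raised in the thread could be handled.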