[ 
https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186263#comment-13186263
 ] 

Mikhail Khludnev commented on SOLR-2947:
----------------------------------------

bq. ...why is it sufficient to get the children from only the first element of 
that iterator? 

* Yes, it is, but only because the DIH code currently works only with 
threads=1, so one can assert entityProcessorWrapper.size()==1. If we set 
threads to 2 or more, every TEPW will have its own collection of child entity 
runners (and every ER will have its own EP). I address that in SOLR-3011; see 
my explanation above under "DocBuilder.java/object instantiating". 

bq. are there any situations where the iterator may not return any elements?

* EntityRunner.<init> sets the size of this list to 1 by default. If the user 
configures it to 0, the EntityRunner.run() method will fail on 
entityProcessorWrapper.get(0).  
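
The failure mode described above can be reproduced in isolation. The following is a minimal stand-alone sketch, not DIH code: WrapperListSketch, buildWrappers and firstWrapper are hypothetical names, and plain strings stand in for the wrappers. It only shows that get(0) on an empty list throws, and how an explicit check could give a clearer error.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the wrapper list held by an entity runner; not actual DIH code.
final class WrapperListSketch {

    // Builds a list with one placeholder wrapper per configured thread.
    static List<String> buildWrappers(int threads) {
        List<String> wrappers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            wrappers.add("wrapper-" + i);
        }
        return wrappers;
    }

    // Returns the first wrapper, failing with an explicit message when the list is empty.
    static String firstWrapper(List<String> wrappers) {
        if (wrappers.isEmpty()) {
            throw new IllegalStateException("no wrappers configured; threads must be >= 1");
        }
        return wrappers.get(0);
    }

    public static void main(String[] args) {
        System.out.println(firstWrapper(buildWrappers(1)));
        try {
            buildWrappers(0).get(0); // unguarded access on an empty list
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unguarded get(0) failed: " + e.getClass().getSimpleName());
        }
    }
}
```

Whether such a guard belongs in EntityRunner itself or in configuration validation is a separate design question that this sketch does not settle.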

bq.  at first i thought this was because all of the 
ThreadedEntityProcessorWrapper instances were initialized with identical 
EntityProcessors

That is done in the SOLR-3011 patch. 
{code}
for (int i = 0; i < threads; i++) {
    thepw = new ThreadedEntityProcessorWrapper(       ...
            childrenRunners, i);   // <-- identical EntityProcessors
    entityProcessorWrapper.add(thepw);
}
{code}
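
To make the sharing problem concrete, here is a small stand-alone sketch. All names in it (SharedChildrenSketch, Wrapper, Child) are hypothetical and it is not the SOLR-3011 patch; it only contrasts handing the same child objects to every per-thread wrapper with giving each wrapper freshly created children.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of shared vs. per-thread children; not actual DIH code.
final class SharedChildrenSketch {

    // Placeholder for a child runner with some mutable state.
    static final class Child {
        final String name;
        boolean destroyed;

        Child(String name) {
            this.name = name;
        }
    }

    // Placeholder for a per-thread wrapper that keeps a reference to its children.
    static final class Wrapper {
        final List<Child> children;
        final int threadIndex;

        Wrapper(List<Child> children, int threadIndex) {
            this.children = children;
            this.threadIndex = threadIndex;
        }
    }

    // Every wrapper receives the same list instance, and therefore the same Child objects.
    static List<Wrapper> withSharedChildren(List<Child> children, int threads) {
        List<Wrapper> wrappers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            wrappers.add(new Wrapper(children, i));
        }
        return wrappers;
    }

    // Every wrapper receives its own list containing freshly created Child objects.
    static List<Wrapper> withOwnChildren(List<String> childNames, int threads) {
        List<Wrapper> wrappers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            List<Child> own = new ArrayList<>();
            for (String name : childNames) {
                own.add(new Child(name));
            }
            wrappers.add(new Wrapper(own, i));
        }
        return wrappers;
    }
}
```

With shared children, marking a child as destroyed through one wrapper is visible through every other wrapper; with per-wrapper children it is not.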

bq. Is this something that's just a hack that only works with 1 thread 
until/unless SOLR-3011 is resolved?

As the subject says, it's a fix for a bug, one introduced by SOLR-2384. 
threads=2 didn't work before, and now threads=1 doesn't work either. I removed 
the incorrect EP destroy call and implemented destruction that works now for 
threads=1 and will work for threads=10 after SOLR-3011.

bq. will this break things for people using multithreaded DIH without nested 
entities?

The tests pass, you know. It will even destroy the root EP properly, because 
collecting the ERs in doFullDump() starts by placing the root ER into the result. 
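
The collect-then-destroy idea described above can be sketched as follows. This is a hedged, stand-alone illustration with hypothetical names (RunnerCollectSketch, Runner, collect, destroyAll), not the patch itself: the root runner goes into the result first, so it is destroyed even when there are no nested entities.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of collecting runners before destroying them; not actual DIH code.
final class RunnerCollectSketch {

    // Placeholder for an entity runner that owns a processor and may have child runners.
    static final class Runner {
        final String name;
        final List<Runner> children = new ArrayList<>();
        boolean destroyed;

        Runner(String name) {
            this.name = name;
        }

        // Adds a child runner and returns this runner to allow chaining.
        Runner child(Runner c) {
            children.add(c);
            return this;
        }
    }

    // Places the root into the result first, then appends descendants breadth-first.
    static List<Runner> collect(Runner root) {
        List<Runner> result = new ArrayList<>();
        result.add(root);
        // Indexed loop on purpose: the list grows while it is being walked.
        for (int i = 0; i < result.size(); i++) {
            result.addAll(result.get(i).children);
        }
        return result;
    }

    // Destroys every collected runner once, after the import has finished.
    static void destroyAll(List<Runner> runners) {
        for (Runner r : runners) {
            r.destroyed = true;
        }
    }
}
```

Because destruction happens over the collected list rather than inside a parent runner, a parent finishing its work does not tear down a child that is still in use.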

bq.  Either way: a comment clarifying that monstrosity would be a good idea.

I can't tell which monstrosity you are referring to. 



                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, 
> SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, 
> dih-cache-threads-enabling-bug.patch
>
>
> My intention is to fix multithreaded import with the SQL cache. This is the 
> 2nd stage. If I enable the DocBuilder.EntityRunner flow even for a single 
> thread, it breaks pretty basic functionality: the parent-child join.
> The reason is that [line 473 
> entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup]
>  breaks the children's entityProcessor.
> See the attachment comments for more details. 


        
