[ https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mikhail Khludnev updated SOLR-2947:
-----------------------------------

    Attachment: SOLR-2947.patch

OK, here is the patch, which fixes the issue with destroy() and the problem with multiple threads and CachedSqlEntityProcessor.

h3. Code

h4. Context.java, ContextImpl.java
* Removed the SCOPE_DOC constant; I can't find any usages. The old implementation isn't thread-safe. We can implement a thread-safe version if you want; let me know if it's necessary.
* Note that ContextImpl.putVal() *ignores the scope provided*. This should be tracked separately; let me know if you'd like me to raise it.

h4. DataImporter.java
I added DocBuilder.destroy() to stop the thread pool after all work is done. The test case's warnings about "thread leaks" bothered me.

h4. DIHCacheSupport.java
It just introduces a getter. Note that I generated the diff against the uncommitted SOLR-2961 changes, so line numbers may be wrong; let me know and I'll re-diff it.

h4. DocBuilder.java
* EntityRunner no longer creates EntityProcessors; it obtains them from its constructor arguments.
* EntityProcessors are now destroyed properly.
* EntityRunner.docWrapper has been removed because it isn't thread-safe; it is passed explicitly as a method argument instead.
* EntityRunner.entityEnded wasn't thread-safe either; it has been moved into ThreadedEntityProcessorWrapper.
* Object instantiation was drastically amended to be thread-safe:
** a single EntityRunner per entity
** a single EntityProcessor per EntityRunner
** N ThreadedEntityProcessorWrappers per EntityRunner, each using the runner's EntityProcessor as its delegate
** where N is the number of threads specified at the root entity (the threads attribute is prohibited for child entities)
** ThreadedEntityProcessorWrappers are numbered by their positions in the EntityRunner's tepw list
** a parent entity's ThreadedEntityProcessorWrapper always calls the child's tepw with the same number as its own
* A parent entity's ThreadedEntityProcessorWrapper always calls its children's tepw by a plain synchronous Java call (without the thread pool).

h4. EntityProcessor.java, EntityProcessorBase.java
An isPaged() property has been introduced.

h4. EntityProcessorWrapper.java
A protected transformRow() method has been extracted from applyTransformer(). I need to reuse the transformer logic for the paged flow, but applyTransformer() has a side effect on the rowcache field.

h4. ThreadedEntityProcessorWrapper.java
It has received all the refactorings above (instantiation and the entityEnded field move).
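The instantiation scheme listed under DocBuilder.java can be pictured with the minimal sketch below. It is not the patch code: Runner, Wrapper and Processor are hypothetical stand-ins for EntityRunner, ThreadedEntityProcessorWrapper and EntityProcessor, and process() is a hypothetical method that only records which wrapper ran, to show the "same number, synchronous call" rule.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the layout: one runner per entity, one shared processor per runner,
 * N numbered wrappers per runner. Runner, Wrapper and Processor are hypothetical
 * stand-ins for EntityRunner, ThreadedEntityProcessorWrapper and EntityProcessor.
 */
final class RunnerLayoutSketch {

  /** Thread-unaware processor; exactly one instance per entity. */
  static final class Processor {
    final String entityName;

    Processor(String entityName) {
      this.entityName = entityName;
    }
  }

  /** Numbered per-thread wrapper that delegates to its runner's processor. */
  static final class Wrapper {
    final int number;                          // index in the runner's wrapper list
    final Processor delegate;                  // shared with the runner's other wrappers
    final List<Wrapper> children = new ArrayList<>();

    Wrapper(int number, Processor delegate) {
      this.number = number;
      this.delegate = delegate;
    }

    /** Runs on the worker thread that owns this wrapper; children run on the same thread. */
    void process(List<String> trace) {
      trace.add(delegate.entityName + "#" + number);
      for (Wrapper child : children) {
        child.process(trace);                  // plain synchronous call, no thread pool hop
      }
    }
  }

  /** One runner per entity. */
  static final class Runner {
    final Processor processor;
    final List<Wrapper> wrappers = new ArrayList<>();

    Runner(String entityName, int threads) {
      processor = new Processor(entityName);
      for (int i = 0; i < threads; i++) {
        wrappers.add(new Wrapper(i, processor));
      }
    }

    /** Child entities take N from the root, since they may not declare threads themselves. */
    Runner addChild(String entityName) {
      Runner child = new Runner(entityName, wrappers.size());
      for (int i = 0; i < wrappers.size(); i++) {
        wrappers.get(i).children.add(child.wrappers.get(i)); // wrapper i -> child wrapper i
      }
      return child;
    }
  }
}
```

For example, `new Runner("parent", 3).addChild("child")` links parent wrapper 2 to child wrapper 2, so calling `process()` on parent wrapper 2 records `parent#2` followed by `child#2`, all on the calling thread.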
ThreadedEntityProcessorWrapper also contains the core idea of the multithreaded cached entity processor:
* after a tepw obtains access to the thread-unaware delegate EntityProcessor, it needs to pull the whole page, i.e. all child records belonging to the current parent;
* the whole page is transformed and put into tepw.rowcache, from which the records are later pulled by the parent entity's tepw.

h3. Tests

h4. TestThreaded.java
Added a full-space test for CachedSqlEntityProcessor with no threads and with 1, 2 and 10 threads (keep in mind that 1 thread is not the same as no threads).

h4. TestEphemeralCache.java
Added a double destroy() check for EntityProcessors.

h4. dataimport-cache-ephemeral.xml
Specifies 10 threads and adds the double destroy() check for EntityProcessors.

> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
> Key: SOLR-2947
> URL: https://issues.apache.org/jira/browse/SOLR-2947
> Project: Solr
> Issue Type: Sub-task
> Components: contrib - DataImportHandler
> Affects Versions: 4.0
> Reporter: Mikhail Khludnev
> Labels: noob
> Fix For: 4.0
>
> Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is to fix multithreaded import with the SQL cache. Here is the 2nd stage.
> If I enable the DocBuilder.EntityRunner flow even for a single thread, it breaks pretty basic functionality: the parent-child join.
> The reason is that [line 473 entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup] breaks the children's entityProcessor.
> See the attachment comments for more details.
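The paged flow described under ThreadedEntityProcessorWrapper.java can be sketched as follows. This is an illustration under assumptions, not the patch: CachedChildProcessor and Wrapper are hypothetical stand-ins, "obtains access" is modelled as holding the shared delegate's monitor, and the `transformRow` function stands in for the side-effect-free EntityProcessorWrapper.transformRow().

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Sketch of the paged flow: a per-thread wrapper locks the shared, thread-unaware
 * delegate, drains the whole page of child rows for one parent, transforms it and
 * keeps the result in its own row cache. All class names here are hypothetical.
 */
final class PagedWrapperSketch {

  /** Stand-in for a cached child processor: child rows grouped by parent key. */
  static final class CachedChildProcessor {
    private final Map<String, List<Map<String, Object>>> cache;
    private Deque<Map<String, Object>> page = new ArrayDeque<>();

    CachedChildProcessor(Map<String, List<Map<String, Object>>> cache) {
      this.cache = cache;
    }

    /** Not thread-safe: positions the processor on the children of one parent. */
    void init(String parentKey) {
      page = new ArrayDeque<>(cache.getOrDefault(parentKey, List.of()));
    }

    /** Next child row of the current page, or null when the page is exhausted. */
    Map<String, Object> nextRow() {
      return page.poll();
    }
  }

  /** Stand-in for one ThreadedEntityProcessorWrapper; used by a single thread only. */
  static final class Wrapper {
    private final CachedChildProcessor delegate;                    // shared by all wrappers
    private final UnaryOperator<Map<String, Object>> transformRow;  // side-effect-free transform
    private final Deque<Map<String, Object>> rowcache = new ArrayDeque<>();

    Wrapper(CachedChildProcessor delegate, UnaryOperator<Map<String, Object>> transformRow) {
      this.delegate = delegate;
      this.transformRow = transformRow;
    }

    /** Pulls and transforms the whole page while holding the delegate's monitor. */
    void startParent(String parentKey) {
      rowcache.clear();
      synchronized (delegate) {
        delegate.init(parentKey);
        for (Map<String, Object> row = delegate.nextRow(); row != null; row = delegate.nextRow()) {
          rowcache.add(transformRow.apply(row));
        }
      }
    }

    /** Served from the private cache, without touching the shared delegate. */
    Map<String, Object> nextRow() {
      return rowcache.poll();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Map<String, List<Map<String, Object>>> data = new HashMap<>();
    data.put("p1", List.of(Map.of("id", "c1"), Map.of("id", "c2")));
    data.put("p2", List.of(Map.of("id", "c3")));
    CachedChildProcessor shared = new CachedChildProcessor(data);

    Thread[] workers = new Thread[2];
    for (int i = 0; i < workers.length; i++) {
      String parentKey = "p" + (i + 1);
      Wrapper wrapper = new Wrapper(shared, UnaryOperator.identity());
      workers[i] = new Thread(() -> {
        wrapper.startParent(parentKey);
        for (Map<String, Object> row = wrapper.nextRow(); row != null; row = wrapper.nextRow()) {
          System.out.println(parentKey + " -> " + row.get("id"));
        }
      });
      workers[i].start();
    }
    for (Thread worker : workers) {
      worker.join();
    }
  }
}
```

Holding the lock for the whole page means another thread cannot reposition the shared delegate halfway through a parent's children. The sketch keeps the transformation inside the lock as well; moving it outside would shorten the critical section, but only if the transformers are known to be thread-safe.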
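The DataImporter.java note (DocBuilder.destroy() stops the thread pool after all work is done) and the "proper destroying EntityProcessors" item can be illustrated together with a small sketch. Everything in it is an assumption for illustration: ImportCleanupSketch, register() and submit() are hypothetical, the 30-second timeout is arbitrary, and destroy() is made idempotent with an AtomicBoolean so that a double destroy() call is harmless. In this sketch processors are destroyed only after the pool has drained, so a child processor is not released while a parent still uses it, which is the failure mode described in the quoted issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch of end-of-import cleanup: stop the pool first, then destroy every
 * processor exactly once. All names here are hypothetical.
 */
final class ImportCleanupSketch {

  /** Stand-in for an EntityProcessor whose destroy() must tolerate repeated calls. */
  static final class Processor {
    private final AtomicBoolean destroyed = new AtomicBoolean();
    private final AtomicInteger releases = new AtomicInteger();

    void destroy() {
      // compareAndSet turns every call after the first into a no-op.
      if (destroyed.compareAndSet(false, true)) {
        releases.incrementAndGet(); // a real processor would release caches, statements, ...
      }
    }

    int releaseCount() {
      return releases.get();
    }
  }

  private final ExecutorService pool;
  private final List<Processor> processors = new ArrayList<>();

  ImportCleanupSketch(int threads) {
    pool = Executors.newFixedThreadPool(threads);
  }

  Processor register() {
    Processor p = new Processor();
    processors.add(p);
    return p;
  }

  void submit(Runnable task) {
    pool.execute(task);
  }

  /** Lets queued work finish, stops the pool threads, then destroys processors. */
  void destroy() {
    pool.shutdown(); // no new tasks; already submitted tasks still run
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        pool.shutdownNow(); // interrupt stragglers so no threads leak
      }
    } catch (InterruptedException e) {
      pool.shutdownNow();
      Thread.currentThread().interrupt();
    }
    for (Processor p : processors) {
      p.destroy(); // safe even if this processor was already destroyed
    }
  }
}
```

Calling destroy() twice is also harmless here: the second shutdown() is a no-op, awaitTermination() returns immediately, and each processor ignores the repeated destroy().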