[ https://issues.apache.org/jira/browse/SOLR-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mikhail Khludnev updated SOLR-3011:
-----------------------------------
    Attachment: SOLR-3011.patch

OK. I'm attaching a refreshed patch for the core multithreading DIH issue: SOLR-3011.patch.

h3. Code

h4. DataImporter.java
I added DocBuilder.destroy() to stop the thread pool after all work is done. I was bothered by the test case's warnings about "thread leaks".

h4. DocBuilder.java
* EntityRunner no longer creates EntityProcessors; it obtains them from constructor args
* EntityProcessors are now destroyed properly
* EntityRunner.docWrapper is removed as not thread-safe; it's passed explicitly via method arguments
* EntityRunner.entityEnded wasn't thread-safe either; it's moved into ThreadedEntityProcessorWrapper
* object instantiation was drastically amended to be thread-safe
** single EntityRunner per Entity
** single EntityProcessor per EntityRunner
** N ThreadedEntityProcessorWrappers per EntityRunner use its EntityProcessor as a delegate
** where N is the number of threads specified at the root entity (the threads attr is prohibited for child entities)
** ThreadedEntityProcessorWrappers are numbered by their positions in the EntityRunner's tepw list
** parent entity's ThreadedEntityProcessorWrapper always hits the children's tepw with the same number as its own
* parent entity's ThreadedEntityProcessorWrapper always hits the children's tepw by a plain Java synchronous call (w/o thread pool)

h4. EntityProcessorWrapper.java
protected transformRow() has been extracted from applyTransformer(). I need to reuse the transformer logic for the paged flow, but applyTransformer() has a side effect on the rowcache field.

h4. ThreadedEntityProcessorWrapper.java
This class is affected by all the refactorings above (instantiation and the field move).
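To illustrate the instantiation scheme described above, here is a minimal sketch. It is not the code from the patch: Processor, Wrapper and Runner are simplified, hypothetical stand-ins for EntityProcessor, ThreadedEntityProcessorWrapper and EntityRunner, and the visit() method only records which wrappers get touched. It assumes the child runner is built with the same thread count as the root.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical stand-ins for the DIH classes; not the actual Solr API. */
public class WiringSketch {

    /** Stand-in for a thread-unaware EntityProcessor. */
    static class Processor {
        final String entityName;

        Processor(String entityName) {
            this.entityName = entityName;
        }
    }

    /** Stand-in for ThreadedEntityProcessorWrapper: one per thread, all sharing one delegate. */
    static class Wrapper {
        final int number;          // position in the owning runner's wrapper list
        final Processor delegate;  // shared by every wrapper of the same runner
        final List<Wrapper> children = new ArrayList<>();

        Wrapper(int number, Processor delegate) {
            this.number = number;
            this.delegate = delegate;
        }

        /** Records this wrapper, then calls its children synchronously on the caller's thread. */
        void visit(List<String> trace) {
            trace.add(delegate.entityName + "#" + number);
            for (Wrapper child : children) {
                child.visit(trace); // plain method call, no thread pool involved
            }
        }
    }

    /** Stand-in for EntityRunner: one per entity, one processor, N wrappers. */
    static class Runner {
        final Processor processor;
        final List<Wrapper> wrappers = new ArrayList<>();

        Runner(String entityName, int threads) {
            this.processor = new Processor(entityName);
            for (int i = 0; i < threads; i++) {
                wrappers.add(new Wrapper(i, processor));
            }
        }

        /** Links wrapper i of this runner to wrapper i of the child runner. */
        void addChild(Runner child) {
            if (child.wrappers.size() != wrappers.size()) {
                throw new IllegalArgumentException("child must use the root's thread count");
            }
            for (int i = 0; i < wrappers.size(); i++) {
                wrappers.get(i).children.add(child.wrappers.get(i));
            }
        }
    }

    public static void main(String[] args) {
        int threads = 3; // in DIH this would come from the root entity's threads attr
        Runner parent = new Runner("x", threads);
        Runner child = new Runner("y", threads);
        parent.addChild(child);

        List<String> trace = new ArrayList<>();
        parent.wrappers.get(2).visit(trace);
        System.out.println(trace); // prints [x#2, y#2]
    }
}
```

The point of matching numbers is that a worker thread only ever touches the wrappers carrying its own number, so per-thread state lives in the wrappers while the shared delegate is the only object that still needs guarded access.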
ThreadedEntityProcessorWrapper also contains the core idea of the multithreaded cached entity processor:
* after a tepw obtains access to the thread-unaware delegate entityProcessor, it needs to pull the whole page, i.e. all children rows that belong to the current parent row
* the whole page is transformed and put into tepw.rowcache, from where the rows will be pulled later by the parent entity's tepw
* an important point is the condition which enables the paged mode. I believe any child entity should be processed in paged mode. See TEPW.nextRow(), var retrieveWholePage

h3. Tests

h4. TestThreaded.java
I found that this test doesn't cover the cached entity processor (where="xid=x.id") and doesn't cover N+1 usage ("... where y.xid=${x.id}"). There was only a single child row per parent. I added both usages with all threads attribute cases.

h1. TBD
* I have some suspicions about Context.SCOPE_DOC.
* even after this patch, multithreaded DIH suffers from SOLR-2961 and SOLR-2804. I need this patch applied to unlock them.
* it's almost impossible to apply on 3.5. The whole of SOLR-2382 with its fixes should be ported first.

Thanks

> DIH MultiThreaded bug
> ---------------------
>
>                 Key: SOLR-3011
>                 URL: https://issues.apache.org/jira/browse/SOLR-3011
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 3.5, 4.0
>            Reporter: Mikhail Khludnev
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3011.patch, SOLR-3011.patch
>
>
> The current DIH design is not thread-safe. See the last comments at SOLR-2382 and
> SOLR-2947. I'm going to provide a patch that makes the DIH core thread-safe. Mostly
> it's the SOLR-2947 patch from 28th Dec.

--
This message is automatically generated by JIRA.