[ 
https://issues.apache.org/jira/browse/SOLR-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-3011:
-----------------------------------

    Attachment: SOLR-3011.patch

Ok. I'm attaching a refreshed patch for the core multithreaded DIH issue: 
SOLR-3011.patch.

h3.Code

h4.DataImporter.java 

I added DocBuilder.destroy() to stop the thread pool after all work is done. I 
was bothered by the test cases' warnings about "thread leaks".
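
As a rough illustration of that shutdown step, here is a minimal sketch. PoolOwner is a hypothetical stand-in and not the actual DocBuilder; the timeout parameter and the boolean return value are assumptions made for the example only.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a builder that owns a worker pool (not the real DocBuilder).
class PoolOwner {
    private final ExecutorService pool;

    PoolOwner(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    void submit(Runnable task) {
        pool.submit(task);
    }

    // Called once all work is done, so that no worker thread outlives the import.
    // Returns true if every worker has finished by the time the method returns.
    boolean destroy(long timeoutSeconds) {
        pool.shutdown(); // stop accepting new tasks, let queued ones finish
        try {
            if (!pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // interrupt stragglers
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return pool.isTerminated();
    }
}
```

Without a call like this the pool's non-daemon threads stay alive after the import, which is the kind of thing a test framework reports as a thread leak.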

h4.DocBuilder.java 

* EntityRunner no longer creates EntityProcessors; it obtains them from its 
constructor args
* EntityProcessors are now destroyed properly
* EntityRunner.docWrapper is removed because it was not thread-safe. It is now 
passed explicitly as a method argument
* EntityRunner.entityEnded wasn't thread-safe either. It is moved into 
ThreadedEntityProcessorWrapper
* object instantiation was drastically amended to be thread-safe 
** single EntityRunner per Entity
** single EntityProcessor per EntityRunner
** N ThreadedEntityProcessorWrappers per EntityRunner, each using its 
EntityProcessor as delegate
** where N is the number of threads specified at the root entity (the threads 
attr is prohibited for child entities)
** ThreadedEntityProcessorWrappers are numbered by their positions in 
EntityRunner's tepw list
** a parent entity's ThreadedEntityProcessorWrapper always hits the children's 
tepw with the same number as its own
* a parent entity's ThreadedEntityProcessorWrapper always hits the children's 
tepw by a plain Java synchronous call (w/o thread pool)
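
The object layout described in the list above can be sketched roughly as follows. Processor, Wrapper and Runner are simplified, hypothetical stand-ins for EntityProcessor, ThreadedEntityProcessorWrapper and EntityRunner; none of the names or signatures are taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the thread-unaware delegate: one per runner.
class Processor {
    final String entity;

    Processor(String entity) {
        this.entity = entity;
    }
}

// Hypothetical stand-in for a ThreadedEntityProcessorWrapper.
class Wrapper {
    final Processor delegate; // shared with the sibling wrappers of the same runner
    final int number;         // position in the runner's wrapper list

    Wrapper(Processor delegate, int number) {
        this.delegate = delegate;
        this.number = number;
    }
}

// Hypothetical stand-in for an EntityRunner: one per entity.
class Runner {
    final Processor processor;
    final List<Wrapper> wrappers = new ArrayList<>();
    final List<Runner> children = new ArrayList<>();

    Runner(String entity, int threads) {
        this.processor = new Processor(entity);
        for (int i = 0; i < threads; i++) {
            wrappers.add(new Wrapper(processor, i));
        }
    }

    // Child entities take N from the root; they cannot set their own thread count.
    Runner addChild(String entity) {
        Runner child = new Runner(entity, wrappers.size());
        children.add(child);
        return child;
    }

    // Parent wrapper number n always talks to child wrapper number n,
    // by a direct call on the parent's own thread (no pool hand-off).
    static Wrapper childWrapperFor(Wrapper parent, Runner child) {
        return child.wrappers.get(parent.number);
    }
}
```

Because each worker thread only ever touches wrappers with its own number, the per-wrapper state needs no locking; only the shared delegate does.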

h4.EntityProcessorWrapper.java
protected transformRow() has been extracted from applyTransformer(). I need to 
reuse the transformer logic for the paged flow, but applyTransformer() has a 
side effect on the rowcache field.
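
A minimal sketch of that kind of extraction, under stated assumptions: TransformingWrapper and its Function-based transformer type are hypothetical, and the rule that the first transformed row is returned while the rest go into rowcache is an assumption made for illustration, not a quote from EntityProcessorWrapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified wrapper; not the actual EntityProcessorWrapper.
class TransformingWrapper {
    // Each transformer may turn one row into several rows.
    private final List<Function<Map<String, Object>, List<Map<String, Object>>>> transformers;
    List<Map<String, Object>> rowcache; // rows left over from the last applyTransformer() call

    TransformingWrapper(List<Function<Map<String, Object>, List<Map<String, Object>>>> transformers) {
        this.transformers = transformers;
    }

    // Side-effect free: runs every transformer and returns all resulting rows.
    protected List<Map<String, Object>> transformRow(Map<String, Object> row) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row);
        for (Function<Map<String, Object>, List<Map<String, Object>>> t : transformers) {
            List<Map<String, Object>> next = new ArrayList<>();
            for (Map<String, Object> r : rows) {
                next.addAll(t.apply(r));
            }
            rows = next;
        }
        return rows;
    }

    // Same transformer logic, plus the side effect: extra rows are kept in rowcache.
    protected Map<String, Object> applyTransformer(Map<String, Object> row) {
        List<Map<String, Object>> rows = transformRow(row);
        if (rows.isEmpty()) {
            return null;
        }
        Map<String, Object> first = rows.remove(0);
        if (!rows.isEmpty()) {
            rowcache = rows;
        }
        return first;
    }
}
```

A paged caller can then use transformRow() on every row of a page and decide for itself where the results go.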

h4.ThreadedEntityProcessorWrapper.java 
In addition to all the refactorings above (instantiation and field move), it 
contains the core idea of the multithreaded cached entity processor:
* after a tepw obtains access to the thread-unaware delegate entityProcessor, it 
needs to pull the whole page, i.e. all child rows that belong to the current 
parent row 
* the whole page is transformed and put into tepw.rowcache, from where the rows 
will be pulled later by the parent entity's tepw
* an important point is the condition that enables the paged mode. I believe any 
child entity should be processed in paged mode. See the retrieveWholePage var 
in TEPW.nextRow() 
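
The paged flow can be sketched like this. RowSource, ListSource and PagedWrapper are hypothetical names, the lock on the delegate object is an assumption about how exclusive access might be obtained, and the condition that switches paged mode on is left out entirely.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical thread-unaware delegate.
interface RowSource {
    void init(Object parentKey);      // position on the child rows of one parent row
    Map<String, Object> nextRow();    // null when this parent's rows are exhausted
}

// Simple in-memory delegate, for illustration only.
class ListSource implements RowSource {
    private final Map<Object, List<Map<String, Object>>> childrenByParent;
    private Iterator<Map<String, Object>> current = Collections.emptyIterator();

    ListSource(Map<Object, List<Map<String, Object>>> childrenByParent) {
        this.childrenByParent = childrenByParent;
    }

    public void init(Object parentKey) {
        current = childrenByParent.getOrDefault(parentKey, Collections.emptyList()).iterator();
    }

    public Map<String, Object> nextRow() {
        return current.hasNext() ? current.next() : null;
    }
}

// Hypothetical per-thread wrapper with its own row cache.
class PagedWrapper {
    private final RowSource delegate; // shared with sibling wrappers
    private final UnaryOperator<Map<String, Object>> transform;
    private final Deque<Map<String, Object>> rowcache = new ArrayDeque<>();

    PagedWrapper(RowSource delegate, UnaryOperator<Map<String, Object>> transform) {
        this.delegate = delegate;
        this.transform = transform;
    }

    // Pull the whole page for one parent row while holding exclusive access
    // to the delegate, then transform it and keep it in this wrapper's cache.
    void loadPage(Object parentKey) {
        List<Map<String, Object>> page = new ArrayList<>();
        synchronized (delegate) {
            delegate.init(parentKey);
            for (Map<String, Object> r = delegate.nextRow(); r != null; r = delegate.nextRow()) {
                page.add(r);
            }
        }
        for (Map<String, Object> r : page) {
            rowcache.add(transform.apply(r));
        }
    }

    // Later calls from the parent's wrapper are served from the cache only.
    Map<String, Object> nextRow() {
        return rowcache.poll();
    }
}
```

Pulling the page in one go keeps the critical section short and means a sibling wrapper can reposition the shared delegate without mixing rows of different parents.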

h3.Tests

h4.TestThreaded.java 
I found that this test doesn't cover the cached entity processor 
(where="xid=x.id") and doesn't cover N+1 usage ("... where y.xid=${x.id}"). 
There was only a single child row per parent. I added both usages with all 
threads attribute cases.  

h1. TBD
* I have some suspicions about Context.SCOPE_DOC. 

* even after this patch, multithreaded DIH suffers from SOLR-2961 and 
SOLR-2804. I need this patch applied to unlock them. 
* it's almost impossible to apply this to 3.5. The whole of SOLR-2382 with its 
fixes would have to be ported first.

Thanks

                
> DIH MultiThreaded bug
> ---------------------
>
>                 Key: SOLR-3011
>                 URL: https://issues.apache.org/jira/browse/SOLR-3011
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 3.5, 4.0
>            Reporter: Mikhail Khludnev
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3011.patch, SOLR-3011.patch
>
>
> The current DIH design is not thread-safe. See the last comments at SOLR-2382 
> and SOLR-2947. I'm going to provide a patch that makes the DIH core 
> thread-safe. Mostly it's the SOLR-2947 patch from 28th Dec. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
