[
https://issues.apache.org/jira/browse/SOLR-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mikhail Khludnev updated SOLR-3011:
-----------------------------------
Attachment: SOLR-3011.patch
Ok. I'm attaching a refreshed patch for the core DIH multithreading issue:
SOLR-3011.patch.
h3.Code
h4.DataImporter.java
I added DocBuilder.destroy() to stop the thread pool after all work is done. I
was bothered by the test case's warnings about "thread leaks".
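A minimal sketch of that kind of shutdown, assuming the pool is a plain java.util.concurrent.ExecutorService. PoolOwner, its fields and the 10-second timeout are hypothetical and are not taken from the patch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the class that owns the worker pool; not the patch's code.
class PoolOwner {
    private final ExecutorService pool;

    PoolOwner(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    ExecutorService pool() {
        return pool;
    }

    // Call once, after all import work has been submitted and is expected to finish.
    void destroy() {
        pool.shutdown();                          // stop accepting new tasks
        try {
            // Assumed timeout; pick whatever suits the import size.
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow();               // interrupt tasks that are still running
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();   // preserve the caller's interrupt status
        }
    }
}
```

Without such a call the worker threads outlive the import, which is what a test framework's thread-leak check would report.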
h4.DocBuilder.java
* EntityRunner no longer creates EntityProcessors; it obtains one from its
constructor args
* EntityProcessors are now destroyed properly
* EntityRunner.docWrapper is removed as not thread-safe. It's passed explicitly
via method arguments
* EntityRunner.entityEnded wasn't thread-safe either. It is moved into
ThreadedEntityProcessorWrapper
* object instantiation was drastically amended to be thread-safe
** single EntityRunner per Entity
** single EntityProcessor per EntityRunner
** N ThreadedEntityProcessorWrappers per EntityRunner use its EntityProcessor
as delegate
** where N is the number of threads specified at the root entity (the threads
attr is prohibited for child entities)
** ThreadedEntityProcessorWrappers are numbered by their positions in
EntityRunner's tepw list
** parent entity's ThreadedEntityProcessorWrapper always hits the children's
tepw with the same number as its own
* parent entity's ThreadedEntityProcessorWrapper always hits the children's tepw
by a plain synchronous Java call (w/o thread pool)
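The instantiation scheme above can be sketched roughly as follows. Processor, Wrapper and Runner are simplified, hypothetical stand-ins for EntityProcessor, ThreadedEntityProcessorWrapper and EntityRunner; their fields and the addChild() method are assumptions for illustration only:

```java
import java.util.ArrayList;
import java.util.List;

// Plays the role of EntityProcessor (hypothetical, heavily simplified).
class Processor {
    final String entity;

    Processor(String entity) {
        this.entity = entity;
    }
}

// Plays the role of ThreadedEntityProcessorWrapper.
class Wrapper {
    final int number;                  // position in the runner's wrapper list
    final Processor delegate;          // shared by all wrappers of one runner
    final List<Wrapper> children = new ArrayList<>();

    Wrapper(int number, Processor delegate) {
        this.number = number;
        this.delegate = delegate;
    }
}

// Plays the role of EntityRunner: one per entity, with exactly one processor.
class Runner {
    final Processor processor;         // passed in, not created here
    final List<Wrapper> wrappers = new ArrayList<>();

    Runner(Processor processor, int threads) {
        this.processor = processor;
        for (int i = 0; i < threads; i++) {
            wrappers.add(new Wrapper(i, processor));
        }
    }

    // Pair wrapper i of this runner with wrapper i of the child runner, so a
    // parent wrapper only ever calls the child wrapper with its own number.
    void addChild(Runner child) {
        if (child.wrappers.size() != wrappers.size()) {
            throw new IllegalArgumentException("child must use the root's thread count");
        }
        for (int i = 0; i < wrappers.size(); i++) {
            wrappers.get(i).children.add(child.wrappers.get(i));
        }
    }
}
```

In this layout each wrapper chain (parent i, child i, ...) is only touched by one worker thread, so the only shared object per entity is the delegate processor.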
h4.EntityProcessorWrapper.java
A protected transformRow() has been extracted from applyTransformer(). I need to
reuse the transformer logic for the paged flow, but applyTransformer() has a
side effect on the rowcache field.
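The shape of that extraction, as a hedged sketch: RowTransformerSketch is a hypothetical class, the transformers are modelled as plain UnaryOperators, and the cache update here only stands in for whatever applyTransformer() really does to rowcache:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch; this is not EntityProcessorWrapper itself.
class RowTransformerSketch {
    private final List<UnaryOperator<Map<String, Object>>> transformers;

    // Stand-in for the stateful rowcache field.
    final List<Map<String, Object>> rowCache = new ArrayList<>();

    RowTransformerSketch(List<UnaryOperator<Map<String, Object>>> transformers) {
        this.transformers = transformers;
    }

    // Extracted part: runs the transformer chain on a copy of the row and
    // touches no fields, so other flows can reuse it safely.
    protected Map<String, Object> transformRow(Map<String, Object> row) {
        Map<String, Object> current = new HashMap<>(row);
        for (UnaryOperator<Map<String, Object>> t : transformers) {
            current = t.apply(current);
        }
        return current;
    }

    // Original-style entry point: same logic plus the side effect on the cache.
    Map<String, Object> applyTransformer(Map<String, Object> row) {
        Map<String, Object> out = transformRow(row);
        rowCache.add(out);   // assumed side effect, for illustration only
        return out;
    }
}
```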
h4.ThreadedEntityProcessorWrapper.java
In addition to all the refactorings above (instantiation and the field move), it
contains the core idea of the multithreaded cached entity processor:
* after a tepw obtains access to the thread-unaware delegate entityProcessor, it
needs to pull the whole page, i.e. all child rows that belong to the current
parent row
* the whole page is transformed and put into tepw.rowcache, from where the rows
will be pulled later by the parent entity's tepw
* an important point is the condition that enables the paged mode. I believe any
child entity should be processed in paged mode. See the retrieveWholePage var in
TEPW.nextRow()
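A rough sketch of the paged flow described above. PagedWrapper and loadPage() are hypothetical, the delegate is modelled as a function from a parent key to an iterator of rows, and locking on the delegate object is an assumption about how exclusive access might be obtained:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical sketch of one wrapper working in paged mode.
class PagedWrapper {
    // Thread-unaware delegate, shared by all wrappers of the same entity.
    private final Function<Object, Iterator<Map<String, Object>>> delegate;
    private final UnaryOperator<Map<String, Object>> transform;

    // Per-wrapper cache; only the thread driving this wrapper touches it.
    private final Queue<Map<String, Object>> rowCache = new ArrayDeque<>();

    PagedWrapper(Function<Object, Iterator<Map<String, Object>>> delegate,
                 UnaryOperator<Map<String, Object>> transform) {
        this.delegate = delegate;
        this.transform = transform;
    }

    // Pull every child row of one parent row while holding the delegate, so
    // rows of different parents never interleave between threads.
    void loadPage(Object parentKey) {
        List<Map<String, Object>> page = new ArrayList<>();
        synchronized (delegate) {
            Iterator<Map<String, Object>> rows = delegate.apply(parentKey);
            while (rows.hasNext()) {
                page.add(rows.next());
            }
        }
        // Transforming outside the lock is a choice made for this sketch.
        for (Map<String, Object> row : page) {
            rowCache.add(transform.apply(row));
        }
    }

    // Called by the parent's wrapper; returns null once the page is drained.
    Map<String, Object> nextRow() {
        return rowCache.poll();
    }
}
```

The point of pulling the whole page at once is that the delegate is held only for the duration of one parent's rows, after which the wrapper serves rows from its own cache.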
h3.Tests
h4.TestThreaded.java
I found that this test doesn't cover the cached entity processor
(where="xid=x.id") and doesn't cover N+1 usage ("... where y.xid=${x.id}").
There was only a single child row per parent. I added both usages for all cases
of the threads attribute.
h3.TBD
* I have some suspicions about Context.SCOPE_DOC.
* even after this patch, multithreaded DIH suffers from SOLR-2961 and SOLR-2804.
I need this patch applied to unblock them.
* it's almost impossible to apply to 3.5. The whole of SOLR-2382, with its
fixes, would have to be ported first.
Thanks
> DIH MultiThreaded bug
> ---------------------
>
> Key: SOLR-3011
> URL: https://issues.apache.org/jira/browse/SOLR-3011
> Project: Solr
> Issue Type: Sub-task
> Components: contrib - DataImportHandler
> Affects Versions: 3.5, 4.0
> Reporter: Mikhail Khludnev
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3011.patch, SOLR-3011.patch
>
>
> The current DIH design is not thread-safe. See the last comments at SOLR-2382
> and SOLR-2947. I'm going to provide a patch that makes the DIH core
> thread-safe. Mostly it's the SOLR-2947 patch from 28th Dec.