[ 
https://issues.apache.org/jira/browse/SOLR-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-2947:
-----------------------------------

    Attachment: SOLR-2947.patch

OK, here is the patch. It fixes the issue with destroy() and the problem with 
multiple threads and CachedSqlEntityProcessor.

h3. Code

h4. Context.java, ContextImpl.java

* Removed the SCOPE_DOC constant. I can't find any usages, and the old 
implementation isn't thread-safe. We can implement it in a thread-safe way if 
you want; let me know if it's necessary.
* Note that ContextImpl.putVal() *ignores the scope provided*. This should be 
tracked separately; let me know if you'd like me to raise an issue for it.

h4. DataImporter.java

I added DocBuilder.destroy() to stop the thread pool after all work is done. I 
was bothered by the test cases' warnings about "thread leaks".
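
For illustration only, here is a minimal sketch of the kind of shutdown meant here. It is not code from the patch: it assumes the pool is a plain java.util.concurrent.ExecutorService, and the class PoolOwnerSketch and its methods are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for whatever owns the import thread pool.
class PoolOwnerSketch {
    private final ExecutorService pool;

    PoolOwnerSketch(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    void submit(Runnable task) {
        pool.execute(task);
    }

    // Stop accepting new work, wait for queued tasks to finish, and
    // force-stop anything still running after the timeout.
    // Returns true if the pool terminated cleanly within the timeout.
    boolean destroy(long timeoutMillis) {
        pool.shutdown();
        try {
            if (pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        pool.shutdownNow();
        return false;
    }

    boolean isStopped() {
        return pool.isTerminated();
    }
}
```

Calling destroy() once all work has been submitted leaves no worker threads behind, which is what a "thread leak" check in a test run looks for.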

h4. DIHCacheSupport.java

It just introduces a getter. However, I generated the diff against the 
uncommitted SOLR-2961, so line numbers may be off; let me know if I should 
re-diff it.

h4. DocBuilder.java

* EntityRunner no longer creates EntityProcessors; it obtains one from its 
constructor args
* EntityProcessors are now destroyed properly
* EntityRunner.docWrapper is removed as not thread-safe; it's passed explicitly 
as a method argument
* EntityRunner.entityEnded wasn't thread-safe either; moved into 
ThreadedEntityProcessorWrapper
* object instantiation was substantially reworked to be thread-safe 
** single EntityRunner per Entity
** single EntityProcessor per EntityRunner
** N ThreadedEntityProcessorWrappers per EntityRunner use its EntityProcessor 
as delegate
** where N is the number of threads specified at the root entity (the threads 
attr is prohibited for child entities)
** ThreadedEntityProcessorWrappers are numbered by their positions in 
EntityRunner's tepw list
** a parent entity's ThreadedEntityProcessorWrapper always hits the children's 
tepw with the same number as its own
* a parent entity's ThreadedEntityProcessorWrapper always hits the children's 
tepw by a plain synchronous Java call (w/o thread pool)
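
To illustrate the wiring listed above, here is a minimal, simplified sketch. It is not code from the patch: SketchProcessor, SketchWrapper and SketchRunner are hypothetical stand-ins for EntityProcessor, ThreadedEntityProcessorWrapper and EntityRunner, and the trace list exists only to make the call order visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the single EntityProcessor of an entity.
class SketchProcessor {
    final String entity;

    SketchProcessor(String entity) {
        this.entity = entity;
    }
}

// Hypothetical stand-in for ThreadedEntityProcessorWrapper (tepw).
class SketchWrapper {
    final int number;               // position in the runner's tepw list
    final SketchProcessor delegate; // shared by all wrappers of one runner
    final List<SketchRunner> children;

    SketchWrapper(int number, SketchProcessor delegate, List<SketchRunner> children) {
        this.number = number;
        this.delegate = delegate;
        this.children = children;
    }

    // Visit this entity, then each child's wrapper with the same number,
    // by a plain synchronous call (no thread pool involved).
    void run(List<String> trace) {
        trace.add(delegate.entity + "#" + number);
        for (SketchRunner child : children) {
            child.wrappers.get(number).run(trace);
        }
    }
}

// Hypothetical stand-in for EntityRunner: one per entity, owning one
// processor and N wrappers that all delegate to it.
class SketchRunner {
    final SketchProcessor processor;
    final List<SketchWrapper> wrappers = new ArrayList<>();

    SketchRunner(String entity, int threads, List<SketchRunner> children) {
        this.processor = new SketchProcessor(entity);
        for (int i = 0; i < threads; i++) {
            wrappers.add(new SketchWrapper(i, processor, children));
        }
    }
}
```

In this model, wrapper number 2 of a parent runner only ever calls wrapper number 2 of each child runner, so a child wrapper is never driven by two parent threads at once.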

h4. EntityProcessor.java, EntityProcessorBase.java
An isPaged() property has been introduced.

h4. EntityProcessorWrapper.java
A protected transformRow() has been extracted from applyTransformer(). I need 
to reuse the transformer logic for the paged flow, but applyTransformer() has a 
side effect on the rowcache field.
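
As a rough sketch of this kind of extraction (not the code from the patch): the class TransformSketch, its single-transformer field and the method bodies below are assumptions made for illustration. Only the method names transformRow() and applyTransformer() and the rowcache field come from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified model of a wrapper that applies a transformer.
class TransformSketch {
    // Assumption: a transformer may turn one row into several rows.
    private final Function<Map<String, Object>, List<Map<String, Object>>> transformer;

    List<Map<String, Object>> rowcache = new ArrayList<>();

    TransformSketch(Function<Map<String, Object>, List<Map<String, Object>>> transformer) {
        this.transformer = transformer;
    }

    // Side-effect free: returns the transformed rows and leaves rowcache
    // alone, so another flow can reuse the transformer logic.
    List<Map<String, Object>> transformRow(Map<String, Object> row) {
        return new ArrayList<>(transformer.apply(row));
    }

    // Returns the first transformed row and stores the rest in rowcache.
    Map<String, Object> applyTransformer(Map<String, Object> row) {
        List<Map<String, Object>> rows = transformRow(row);
        if (rows.isEmpty()) {
            return null;
        }
        Map<String, Object> first = rows.remove(0);
        rowcache = rows;
        return first;
    }
}
```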

h4. ThreadedEntityProcessorWrapper.java
In addition to all the refactorings above (instantiation and field move), it 
contains the core idea of the multithreaded cached entity processor:
* after a tepw obtains access to the thread-unaware delegate entityProcessor, 
it needs to pull the whole page, i.e. all child records belonging to the 
current parent
* the whole page is transformed and put into tepw.rowcache, from where the rows 
will be pulled later by the parent entity's tepw
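
The two steps above can be sketched roughly as follows. This is not code from the patch: PagedDelegateSketch and PagingWrapperSketch are hypothetical names, rows are modeled as plain strings, upper-casing stands in for the real transformers, and locking on the delegate is only one possible way for a wrapper to "obtain access" to it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical thread-unaware delegate: hands out the child rows of the
// current parent one by one, then null.
class PagedDelegateSketch {
    private Iterator<String> rows;

    void init(List<String> childRowsOfParent) {
        rows = childRowsOfParent.iterator();
    }

    String nextRow() {
        return rows.hasNext() ? rows.next() : null;
    }
}

// Hypothetical stand-in for one tepw sharing the delegate with its siblings.
class PagingWrapperSketch {
    private final PagedDelegateSketch delegate;
    private final List<String> rowcache = new ArrayList<>();

    PagingWrapperSketch(PagedDelegateSketch delegate) {
        this.delegate = delegate;
    }

    // Pull the whole page while holding exclusive access to the delegate,
    // then transform it into this wrapper's own rowcache.
    void pullPage(List<String> childRowsOfParent) {
        List<String> page = new ArrayList<>();
        synchronized (delegate) {
            delegate.init(childRowsOfParent);
            for (String r = delegate.nextRow(); r != null; r = delegate.nextRow()) {
                page.add(r);
            }
        }
        // Assumption: transforming outside the lock is this sketch's choice.
        for (String r : page) {
            rowcache.add(transformRow(r));
        }
    }

    String transformRow(String row) {
        return row.toUpperCase(Locale.ROOT);
    }

    // Later calls are served from the per-wrapper rowcache only.
    String nextRow() {
        return rowcache.isEmpty() ? null : rowcache.remove(0);
    }
}
```

Because each wrapper drains a complete page before releasing the delegate, rows belonging to different parents never interleave, even though the delegate itself keeps no per-thread state.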

h3. Tests

h4. TestThreaded.java
Added a full-space test for CachedSqlEP with no threads and with 1, 2, and 10 
threads (keep in mind that 1 thread is not the same as no threads).

h4. TestEphemeralCache.java
Added a check for double destroy() of EntityProcessors.

h4. dataimport-cache-ephemeral.xml
Specifies 10 threads and adds the EntityProcessors used for the double 
destroy() check.

 


                
> DIH caching bug - EntityRunner destroys child entity processor
> --------------------------------------------------------------
>
>                 Key: SOLR-2947
>                 URL: https://issues.apache.org/jira/browse/SOLR-2947
>             Project: Solr
>          Issue Type: Sub-task
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: Mikhail Khludnev
>              Labels: noob
>             Fix For: 4.0
>
>         Attachments: SOLR-2947.patch, SOLR-2947.patch, SOLR-2947.patch, 
> dih-cache-destroy-on-threads-fix.patch, dih-cache-threads-enabling-bug.patch
>
>
> My intention is to fix multithreaded import with the SQL cache. Here is the 
> 2nd stage. If I enable the DocBuilder.EntityRunner flow even for a single 
> thread, it breaks a pretty basic piece of functionality: the parent-child join.
> The reason is that [line 473 
> entityProcessor.destroy();|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DocBuilder.java?revision=1201659&view=markup]
>  breaks the children's entityProcessor.
> See the attachment comments for more details. 


        
