[
https://issues.apache.org/jira/browse/SOLR-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228649#comment-13228649
]
Mikhail Khludnev edited comment on SOLR-3011 at 3/13/12 8:03 PM:
-----------------------------------------------------------------
James,
bq. So it seems that for this to work, not only does the core (DocBuilder etc)
need to be thread-safe, but every component in a given DIH configuration needs
to be also.
To me that's a doubtful statement. I believe it's possible to have a bunch of
thread-unsafe classes synchronized by some smart orchestrator.
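One possible reading of the "smart orchestrator" idea, as a minimal sketch only: the thread-unsafe component is touched by exactly one owner thread, and every other thread goes through the orchestrator. All class and method names here (UnsafeCounter, ConfiningOrchestrator, runDemo) are hypothetical and are not part of DIH or of any SOLR-3011 patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical thread-unsafe component: plain field, no locking.
class UnsafeCounter {
    private int count = 0;
    int increment() { return ++count; }
    int get() { return count; }
}

// Hypothetical orchestrator: confines the unsafe component to one owner thread.
class ConfiningOrchestrator {
    private final UnsafeCounter counter = new UnsafeCounter();
    private final ExecutorService owner = Executors.newSingleThreadExecutor();

    Future<Integer> increment() {
        Callable<Integer> task = counter::increment;
        return owner.submit(task);   // runs on the owner thread only
    }

    Future<Integer> current() {
        Callable<Integer> task = counter::get;
        return owner.submit(task);
    }

    void shutdown() { owner.shutdown(); }

    // Many caller threads hammer the orchestrator; no increment is lost.
    static int runDemo(int callers, int callsEach) {
        ConfiningOrchestrator orch = new ConfiningOrchestrator();
        ExecutorService pool = Executors.newFixedThreadPool(callers);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < callers; i++) {
                Callable<Void> caller = () -> {
                    for (int j = 0; j < callsEach; j++) {
                        orch.increment().get();
                    }
                    return null;
                };
                pending.add(pool.submit(caller));
            }
            for (Future<?> f : pending) {
                f.get();             // wait for every caller to finish
            }
            return orch.current().get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
            orch.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 1000)); // prints 4000
    }
}
```

The component itself stays unsynchronized; correctness comes from thread confinement in the orchestrator. Whether this scales to a full DIH entity tree is exactly the open question in this thread.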
bq. There also is quite a bit of code duplication in DocBuilder and classes
Yep, agreed: ThrdEPWrapper is a FullImport-only duplicate of the DocBuilder code.
bq. Mikhail, you've just noticed that MockDataSource was not designed to test a
multi-threaded scenario in a valid fashion.
Not really, they are just odd mocks. With a real DS, every query gives you the
full result set from the beginning, but once you reach EOF in MockDS's result
set, re-querying gets you the same EOF.
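The difference can be sketched as follows. This is an illustration under assumptions, not the actual MockDataSource or DataSource code: FreshResultSource, SingleCursorMock and RequeryDemo are hypothetical names, and the `getData(String)` signature is simplified for the example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a real data source: each query starts from the first row.
class FreshResultSource {
    private final List<String> rows;
    FreshResultSource(List<String> rows) { this.rows = rows; }
    Iterator<String> getData(String query) {
        return rows.iterator();      // new cursor per query
    }
}

// Hypothetical stand-in for the mock behaviour described above: one cursor, reused.
class SingleCursorMock {
    private final Iterator<String> cursor;
    SingleCursorMock(List<String> rows) { this.cursor = rows.iterator(); }
    Iterator<String> getData(String query) {
        return cursor;               // same cursor, stays at EOF once drained
    }
}

class RequeryDemo {
    // Reads the iterator to EOF and returns how many rows it produced.
    static int drain(Iterator<String> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c");
        FreshResultSource real = new FreshResultSource(rows);
        SingleCursorMock mock = new SingleCursorMock(rows);
        // prints "3 then 3"
        System.out.println(drain(real.getData("q")) + " then " + drain(real.getData("q")));
        // prints "3 then 0"
        System.out.println(drain(mock.getData("q")) + " then " + drain(mock.getData("q")));
    }
}
```

A test that re-queries such a mock sees an empty result on the second pass, which a real source would not produce.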
bq. Take a look at TestDocBuilderThreaded.
I've never seen it actually.
bq. 1. Keep 3.x as-is, and make any quick fixes to threads for common use-cases
there, as possible.
No quick fix for any "common" use case is possible, I'm sure.
bq. 2. In 4.0 (or a separate branch), remove threading from DIH.
I suggest the opposite way:
* be honest with users and remove "threads" from 3.6. Zero impact here. Nobody
uses it. It just doesn't work.
* I have also already spent enormous effort on fixing it in 4.0, and I hope to
complete the fix anyway (it will live on GitHub at least). Btw, the reason I'm
fixing 4.0 is SOLR-2382; I actually waited some time for it to be committed.
bq. 4. Make DocBuilder, etc threadsafe. 5. Create a marker interface or
annotation
I don't see how that is possible or how it would really help.
bq. The SOLR-3011 patches work on 4.x .. But I can probably help with porting
(some of?) this patch back to 3.x.
Petr found a case where the patch doesn't work. Once (if) I get that done, all
commits around SOLR-2382 can be cherry-picked to 3.x. Porting the fix without
DIHCacheSupport will take more time.
In parallel with my proposals above, I think we really need to start designing
a new Ultimate DIH. I propose:
# to collect use cases (you are experienced in extreme caching, I did
throughput maximization via an async producer-consumer, Peter will give us his
cases, etc.)
# to sketch a design in PlantUML and check that it's bulletproof
# to cut it into pieces and scrum them as a crowd
Btw, isn't there something like DIH already? Maybe we can just reuse some other
OSS tool or library instead of writing it ourselves. Some time ago I heard about
something called Kettle, but I don't really know what it is.
> DIH MultiThreaded bug
> ---------------------
>
> Key: SOLR-3011
> URL: https://issues.apache.org/jira/browse/SOLR-3011
> Project: Solr
> Issue Type: Sub-task
> Components: contrib - DataImportHandler
> Affects Versions: 3.5, 4.0
> Reporter: Mikhail Khludnev
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-3011.patch, SOLR-3011.patch,
> patch-3011-EntityProcessorBase-iterator.patch,
> patch-3011-EntityProcessorBase-iterator.patch
>
>
> The current DIH design is not thread-safe; see the last comments at SOLR-2382
> and SOLR-2947. I'm going to provide a patch that makes the DIH core
> thread-safe. Mostly it's the SOLR-2947 patch from 28th Dec.