I think DIH is deprecated in Solr 8.x and will be removed.

You can use Apache ManifoldCF or custom software to introduce parallelism.
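
As a rough sketch of the custom-software route (this is not DIH code): split the documents into batches and hand each batch to a worker thread. The names ParallelImporter, importAll and batchSink are made up for illustration. In a real importer the batchSink callback would send the batch to Solr, for example through SolrJ, which is left out here so the sketch needs only the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of Solr or DIH.
final class ParallelImporter {

    // Splits docs into batches of batchSize and passes each batch to
    // batchSink on a fixed pool of worker threads.
    // Returns the number of documents handed to the sink.
    static <T> int importAll(List<T> docs, int batchSize, int threads,
                             Consumer<List<T>> batchSink) throws Exception {
        if (batchSize <= 0 || threads <= 0) {
            throw new IllegalArgumentException("batchSize and threads must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int start = 0; start < docs.size(); start += batchSize) {
                int end = Math.min(start + batchSize, docs.size());
                // Copy the slice so each worker owns its batch.
                List<T> batch = new ArrayList<>(docs.subList(start, end));
                pending.add(pool.submit(() -> {
                    batchSink.accept(batch); // assumption: sends the batch to Solr
                    return batch.size();
                }));
            }
            int total = 0;
            for (Future<Integer> f : pending) {
                // get() rethrows a worker failure as an ExecutionException.
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

The sink is called from several threads at once, so whatever it uses to talk to Solr has to be safe for concurrent use.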

> On 07.01.2020 at 11:50, aanno.trash <aanno.tr...@gmail.com> wrote:
> 
> Hello,
> 
> I looked a bit into the code of DIH (solr dataimporthandler and
> dataimporthandler-extra). I wonder what the state of this code is. It is
> in a 'contrib' folder and seems to work (and to be maintained). But is there
> ongoing development (e.g. additional features)?
> 
> The reason I'm asking is that I'm in a project where DIH is used.
> However, the import is very slow, especially into a solr cluster. I
> glanced over the code for my case and it looks like DIH is only
> single-threaded. I guess that changing DIH to support multi-threading on
> the 'root' (top level) entity should result in a dramatic performance boost.
> 
> Hence I hacked DIH a bit. To get started, I concentrated on the 'tika'
> example case with a bunch of private PDFs and only for a 'full-import'.
> From this (dirty) experiment, a multi-threaded DIH seems to be possible.
> However, some bigger code changes are needed. This is an incomplete list:
> 
> * Make VariableResolver immutable and change its interface/contract
> * All EntityProcessors seem to be written with only a single thread in
> mind. I circumvented the problem by (a) supporting a clone operation and
> (b) cloning the EntityProcessors for each thread.
> * To make the code easier to handle, I introduced several interfaces where
> only fully abstract classes existed before (Context, DataSource,
> DIHProperties, EntityProcessor, ...). Perhaps this is not absolutely
> needed but it has simplified the refactoring substantially.
> 
> So this is my question: Would you consider the contribution of a BIG DIH
> change for merging into the project? Or is DIH just dead and should go
> away soon? And if you would consider the contribution, would it be best
> with several small changes or with a 'big-bang' pull request? Would you
> consider the contribution even if some features of DIH are dropped?
> (From my experiment, a very hot candidate to drop is the
> XPathEntityProcessor.)
> 
> Kind regards,
> 
> aanno2
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 

