I think DIH is deprecated in Solr 8.x and will disappear.
You can use Apache ManifoldCF or custom software to introduce parallelism.

> On 07.01.2020 at 11:50, aanno.trash <aanno.tr...@gmail.com> wrote:
>
> Hello,
>
> I looked a bit into the code of DIH (solr dataimporthandler and
> dataimporthandler-extra). I wonder what the state of this code is. It is
> in a 'contrib' folder and seems to work (and to be maintained). But is
> there ongoing development (e.g. additional features)?
>
> The reason I'm asking is that I'm in a project where DIH is used.
> However, the import is very slow, especially into a Solr cluster. I
> glanced over the code for my case and it looks like DIH is only
> single-threaded. I guess that changing DIH to support multi-threading on
> the 'root' (top level) entity should result in a dramatic performance
> boost.
>
> Hence I hacked DIH a bit. To get started, I concentrated on the 'tika'
> example case with a bunch of private PDFs and only for a 'full-import'.
> From this (dirty) experiment, a multi-threaded DIH seems to be possible.
> However, some bigger code changes are needed. This is an incomplete list:
>
> * Make VariableResolver immutable and change its interface/contract.
> * All EntityProcessors seem to be written with only a single thread in
>   mind. I circumvented the problem by (a) supporting a clone operation
>   and (b) cloning the EntityProcessors for each thread.
> * To make the code easier to handle, I introduced several interfaces
>   where only complete abstract classes have been around before (Context,
>   DataSource, DIHProperties, EntityProcessor, ...). Perhaps this is not
>   absolutely needed, but it has simplified the refactoring substantially.
>
> So this is my question: Would you consider the contribution of a BIG DIH
> change for merging into the project? Or is DIH just dead and should go
> away soon? And if you would consider the contribution, would it be best
> with several small changes or with a 'big-bang' pull request? Would you
> consider the contribution even if some features of DIH are dropped?
> (From my experiment, a very hot candidate to drop is the
> XPathEntityProcessor.)
>
> Kind regards,
>
> aanno2
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
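
For illustration only, the clone-per-thread idea from the quoted list might look roughly like the sketch below. It does not use the real DIH classes: `RowProcessor`, `UpperCaseProcessor`, and `ParallelImportSketch` are hypothetical names, and a fixed thread pool from `java.util.concurrent` is an assumption standing in for whatever scheduling an actual patch would use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelImportSketch {

    /** Hypothetical stand-in for a stateful, non-thread-safe entity processor. */
    interface RowProcessor {
        String process(String row);

        /** Returns a fresh, independent instance (the "clone operation"). */
        RowProcessor copy();
    }

    /** Toy processor with mutable state, so sharing one instance would be unsafe. */
    static final class UpperCaseProcessor implements RowProcessor {
        private int seen = 0; // per-instance counter, never shared between threads

        @Override
        public String process(String row) {
            seen++;
            return row.toUpperCase();
        }

        @Override
        public RowProcessor copy() {
            return new UpperCaseProcessor();
        }
    }

    /** Splits the top-level rows across workers; each worker gets its own processor copy. */
    static List<String> importAll(List<String> rows, RowProcessor prototype, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int offset = t;
                final RowProcessor own = prototype.copy(); // one copy per worker
                parts.add(pool.submit(() -> {
                    List<String> out = new ArrayList<>();
                    // Worker t handles rows t, t + threads, t + 2 * threads, ...
                    for (int i = offset; i < rows.size(); i += threads) {
                        out.add(own.process(rows.get(i)));
                    }
                    return out;
                }));
            }
            List<String> result = new ArrayList<>();
            for (Future<List<String>> part : parts) {
                result.addAll(part.get()); // waits for the worker to finish
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = List.of("a.pdf", "b.pdf", "c.pdf", "d.pdf");
        System.out.println(importAll(docs, new UpperCaseProcessor(), 2));
    }
}
```

Note that the results come back grouped by worker rather than in input order, which is usually acceptable for indexing but is a design choice a real change would have to make explicit.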