[ 
https://issues.apache.org/jira/browse/SOLR-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651923#comment-13651923
 ] 

James Dyer commented on SOLR-4799:
----------------------------------

Mikhail,

Let me clarify that DIH is not generally considered a "playground tool".  It 
performs very well and has a rich feature set.  We use it in production to 
import millions of documents each day, with each document consisting of fields 
from 50+ data sources.  For simpler imports, it is a quick and easy way to get 
your data into Solr.  Many installations use it in production, and it works 
well in many cases.

That said, the codebase has suffered from years of neglect.  Over time people 
have been more willing to add features than to refactor.  A lot of the code 
needs to be simplified and reworked, less important features removed, etc.  
The tests need further improvement as well.

Your idea has great merit.  I think this would be an awesome feature to have in 
DIH.  I've wished for it before.  But I personally tend to shy away from 
committing big features to DIH because the code is not stable enough in my 
opinion.  I even have features in JIRA that I've developed and use in 
production but feel uneasy about committing until more refactoring and test 
improvement work is done.
                
> SQLEntityProcessor for zipper join
> ----------------------------------
>
>                 Key: SOLR-4799
>                 URL: https://issues.apache.org/jira/browse/SOLR-4799
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>            Reporter: Mikhail Khludnev
>            Priority: Minor
>              Labels: dih
>
> DIH is mostly considered a playground tool, and real usages end up with 
> SolrJ. I want to contribute a few improvements targeting DIH performance.
> This one provides a performant approach for joining SQL entities with a 
> minimal memory footprint, in contrast to 
> http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor  
> The idea is:
> * the parent table is explicitly ordered by its PK in SQL
> * the children table is explicitly ordered by the parent_id FK in SQL
> * the children entity processor joins the ordered result sets using a 
>   ‘zipper’ algorithm.
> Do you think it’s worth contributing it to DIH?
> cc: [~goksron] [~jdyer]
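
For illustration, the 'zipper' join described in the issue is essentially a 
merge join over two inputs that are already sorted on the join key.  The 
following is a minimal Python sketch of that idea, not the DIH implementation 
or the proposed patch; the function name `zipper_join`, the dictionary-shaped 
rows, and the column names `id` and `parent_id` are all hypothetical.  It 
assumes both inputs arrive sorted in ascending key order, as the ORDER BY 
clauses in the proposal would guarantee.

```python
from itertools import groupby
from operator import itemgetter


def zipper_join(parents, children, pk="id", fk="parent_id"):
    """Yield (parent, [children]) pairs from two key-sorted row streams.

    Both iterables must be sorted ascending: parents by `pk`,
    children by `fk`. Only one group of children is held in memory
    at a time, unlike a cache of the whole child table.
    """
    # Group consecutive child rows that share the same foreign key.
    child_groups = groupby(children, key=itemgetter(fk))
    current = next(child_groups, None)

    for parent in parents:
        key = parent[pk]
        # Skip child groups whose key is behind the current parent
        # (children without a matching parent row).
        while current is not None and current[0] < key:
            current = next(child_groups, None)

        if current is not None and current[0] == key:
            # Materialize the group before advancing the iterator.
            matched = list(current[1])
            current = next(child_groups, None)
        else:
            matched = []
        yield parent, matched
```

Each input is read once, front to back, so memory use is bounded by the 
largest single group of children rather than by the size of the child table.  
The trade-off is that both queries must sort on the join key.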
