[ 
https://issues.apache.org/jira/browse/SOLR-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028206#comment-14028206
 ] 

Mikhail Khludnev commented on SOLR-4799:
----------------------------------------

Plenty of side topics have already been discussed here; let me add one more. I 
looked into Kettle ETL (Pentaho). The main problem with Kettle is its 
Eclipse-based IDE UI. Given the DIH replatforming, we expect some Web UI for 
DSL editing. I found a sibling project, 
[CDA|http://www.webdetails.pt/ctools/cda.html#cda_editor], which looks 
pretty much like that. Here is the summary: 
- the project itself seems modular enough (CBF), so we can slice out some 
pieces for use in DIH2.0
- CDA is only data access: whatever source to JSON via HTTP GET
- thus, it lacks the final indexing steps (via POST or xxxSolrServer);
- it also lacks a framework for long-running commands (that's a trivial thread 
with interruption and status flags; not a big deal, but nothing comes for free 
there)
- it shows a pretty neat usage of ETL primitives (and I still think that 
Kettle's guts are much more powerful than Morphlines'): it uses an XML DSL to 
configure Kettle steps and runs the data export as an ETL process. 

> SQLEntityProcessor for zipper join
> ----------------------------------
>
>                 Key: SOLR-4799
>                 URL: https://issues.apache.org/jira/browse/SOLR-4799
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>            Reporter: Mikhail Khludnev
>            Priority: Minor
>              Labels: dih
>         Attachments: SOLR-4799.patch
>
>
> DIH is mostly considered a playground tool, and real usages end up with 
> SolrJ. I want to contribute a few improvements targeting DIH performance.
> This one provides a performant approach for joining SQL entities with minimal 
> memory, in contrast to 
> http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor  
> The idea is:
> * the parent table is explicitly ordered by its PK in SQL
> * the children table is explicitly ordered by the parent_id FK in SQL
> * the children entity processor joins the ordered result sets with a ‘zipper’ 
> algorithm.
> Do you think it’s worth contributing to DIH?
> cc: [~goksron] [~jdyer]
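
For readers unfamiliar with the 'zipper' idea in the description above, here is a minimal sketch of a merge join over two inputs that are already sorted by key. It is not the code from SOLR-4799.patch: the names ZipperJoin, Child and join are hypothetical, and plain iterators stand in for the two ordered SQL result sets. It only illustrates why a single buffered child row is enough, instead of a cache of the whole child table.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical illustration of a zipper (merge) join; not the SOLR-4799 patch.
final class ZipperJoin {

    /** A child row: the parent's key (the FK) plus some payload. */
    static final class Child {
        final int parentId;
        final String value;

        Child(int parentId, String value) {
            this.parentId = parentId;
            this.value = value;
        }
    }

    /**
     * Joins parents and children in one forward pass.
     * Assumes parentIds is ascending by PK and children is ascending by parentId.
     * Each parent is handed to the sink together with its matching child values.
     */
    static void join(Iterator<Integer> parentIds,
                     Iterator<Child> children,
                     BiConsumer<Integer, List<String>> sink) {
        // The only child state kept between parents: one look-ahead row.
        Child pending = children.hasNext() ? children.next() : null;

        while (parentIds.hasNext()) {
            int pk = parentIds.next();

            // Skip orphan children whose parent_id sorts before the current parent.
            while (pending != null && pending.parentId < pk) {
                pending = children.hasNext() ? children.next() : null;
            }

            // Collect the children that belong to the current parent.
            List<String> matched = new ArrayList<>();
            while (pending != null && pending.parentId == pk) {
                matched.add(pending.value);
                pending = children.hasNext() ? children.next() : null;
            }

            sink.accept(pk, matched);
        }
    }
}
```

Memory use is bounded by the children of a single parent plus one look-ahead row, which is the contrast with a cached lookup of the entire child table. The sketch relies on both inputs being ordered consistently, hence the explicit ORDER BY on both queries.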



--
This message was sent by Atlassian JIRA
(v6.2#6252)
