[ 
https://issues.apache.org/jira/browse/SQOOP-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757877#comment-13757877
 ] 

Jarek Jarcec Cecho commented on SQOOP-1072:
-------------------------------------------

I've started investigating this one and I would like to share my thoughts with 
other developers to get additional feedback.

I'm thinking about introducing a new second-level citizen object called HIO
(Hadoop input/output). Such objects would be similar to small connectors:
they would have independent configurations, validations and upgraders. Each
HIO would cover one specific input (export) or output (import) on the Hadoop
side. For example, I would imagine HDFS, HCatalog, HBase or Hive HIO
implementations. I think of HIO implementations as second-level citizens
because I would not expect users or developers to create a new HIO often. Yet
I believe that a clear separation of each HIO implementation into a separate
Maven module encapsulating the functionality will help us achieve more
readable and maintainable code (e.g. unlike Sqoop 1.x). Unlike connectors, I
would expect HIO to be more tightly integrated with Sqoop internals and to
become more of an internal abstraction than something entirely exposed to the
end user.
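
To make the idea more concrete, here is a minimal Java sketch of what such an
HIO contract might look like. Everything in it is an assumption made for
illustration: the names Hio and HdfsHio, the form name "hdfsOutput" and the
configuration keys "outputDir" and "outputDirectory" are hypothetical and are
not existing Sqoop APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical contract for one Hadoop-side input/output; not an existing Sqoop API.
interface Hio {
    // Short identifier of this HIO, e.g. "hdfs" or "hbase".
    String name();

    // Names of the configuration forms owned by this HIO.
    List<String> formNames();

    // Returns validation error messages; an empty list means the config is valid.
    List<String> validate(Map<String, String> config);

    // Returns a copy of the config migrated from an older layout.
    Map<String, String> upgrade(Map<String, String> oldConfig);
}

// Illustrative HDFS implementation; form and key names are made up for this sketch.
final class HdfsHio implements Hio {
    @Override
    public String name() {
        return "hdfs";
    }

    @Override
    public List<String> formNames() {
        return Collections.singletonList("hdfsOutput");
    }

    @Override
    public List<String> validate(Map<String, String> config) {
        List<String> errors = new ArrayList<>();
        String dir = config.get("outputDirectory");
        if (dir == null || dir.isEmpty()) {
            errors.add("outputDirectory must be set");
        }
        return errors;
    }

    @Override
    public Map<String, String> upgrade(Map<String, String> oldConfig) {
        // Assumed migration: the legacy key "outputDir" becomes "outputDirectory".
        Map<String, String> upgraded = new HashMap<>(oldConfig);
        String legacy = upgraded.remove("outputDir");
        if (legacy != null && !upgraded.containsKey("outputDirectory")) {
            upgraded.put("outputDirectory", legacy);
        }
        return upgraded;
    }
}
```

The point of the sketch is only that configuration, validation and upgrade
logic travel together with each HIO, the same way they do for a connector.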

Having said all the nice words, I do not have a simple path in mind for
achieving that. Sqoop currently has only one framework entity encapsulating
all configuration, validations and upgrades. We could potentially load all HIO
modules on server startup and merge them into one structure that would then be
used everywhere else. However, I would assume that such a merge could be quite
tricky: we would have to ensure that form names are unique, and validations
together with upgrades could easily become a nightmare. On the bright side,
such a merge would require quite isolated changes, so the initial
implementation would most likely be quite simple. Another approach would be to
make HIO a real second-level citizen, promoting the structures everywhere:
e.g. represent them separately in the repository, let the user explicitly
choose which HIO should be used in a job (protocol + client change), etc. This
second approach would be very intrusive, as almost every aspect of Sqoop would
have to be altered. On the other hand, I would expect that we would end up
with a much cleaner design, as all top-level entities would be clearly
separated.
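
As a rough illustration of the first approach (merging everything at server
startup), the following Java sketch shows the form-name uniqueness check that
such a merge would need. The class HioFormMerger and its input shape (a map
from HIO name to the form names it declares) are hypothetical and exist only
for this example; validations and upgrades are deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: merges the form names of all HIO modules into a single
// framework-level structure and rejects clashing names.
final class HioFormMerger {
    private HioFormMerger() {
    }

    // Input: HIO name -> form names declared by that HIO.
    // Output: form name -> name of the HIO that owns it.
    static Map<String, String> merge(Map<String, List<String>> formsByHio) {
        Map<String, String> ownerByForm = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : formsByHio.entrySet()) {
            String hioName = entry.getKey();
            for (String form : entry.getValue()) {
                // putIfAbsent returns the existing owner if the form is already taken.
                String previousOwner = ownerByForm.putIfAbsent(form, hioName);
                if (previousOwner != null) {
                    throw new IllegalStateException("Form '" + form
                        + "' is declared by both '" + previousOwner
                        + "' and '" + hioName + "'");
                }
            }
        }
        return ownerByForm;
    }
}
```

Failing fast at startup keeps the change isolated, but it also shows the
limitation of this approach: independent modules have to agree on names
globally.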

I would be interested to hear the thoughts of other contributors to see which
path would be preferable. I'll be more than happy to put together a more
formal proposal for the aggressive path if necessary.
                
> Sqoop2: Abstract Input/Output interfaces
> ----------------------------------------
>
>                 Key: SQOOP-1072
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1072
>             Project: Sqoop
>          Issue Type: Improvement
>    Affects Versions: 1.99.2
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Jarek Jarcec Cecho
>             Fix For: 2.0.0
>
>
> The input/output interfaces like {{Text}} or {{SequenceFile}} are currently
> hardcoded and are present throughout the entire code base. It would be great
> to abstract the I/O module, similarly to what we are doing for connectors,
> and push the appropriate code into separate modules.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
