[
https://issues.apache.org/jira/browse/FALCON-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639551#comment-14639551
]
Venkat Ramachandran commented on FALCON-36:
-------------------------------------------
[~ajayyadava] thanks for the comments.
1. Re-using the types now.
2. fields is optional.
* If fields is missing, all columns are projected.
* If includes is specified, only the listed subset of columns is projected.
* If excludes is specified, all columns except those listed are projected.
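To illustrate the includes/excludes semantics, a hypothetical sketch of what the fields element might look like (element names here are illustrative, not the final XSD):

```xml
<!-- Project only the listed columns -->
<fields>
    <includes>
        <field>id</field>
        <field>name</field>
    </includes>
</fields>

<!-- Or: project all columns except those listed -->
<fields>
    <excludes>
        <field>ssn</field>
    </excludes>
</fields>
```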
3. Feed import will be tied to a cluster.
* The user can mark this cluster as the source and another as the target to leverage replication for the copy.
* Alternatively (though there is no reason to), a user can add the import policy to every cluster in a feed.
* That would run one import job per feed cluster against the specified database.
* That would overload the database and risk data inconsistency. It's better to import on the source cluster and replicate.
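The recommended source/target arrangement might look like this in a feed definition (a hedged sketch; the element and attribute names are assumptions based on this proposal, not the final XSD):

```xml
<feed name="customer-feed" xmlns="uri:falcon:feed:0.1">
    <clusters>
        <!-- the import policy is attached only to the source cluster -->
        <cluster name="primary-cluster" type="source">
            <import>
                <source name="customer-db" tableName="customer"/>
            </import>
        </cluster>
        <!-- Falcon replication copies the imported data to the target -->
        <cluster name="backup-cluster" type="target"/>
    </clusters>
</feed>
```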
4. We are going with database as the entity, since datasource is so generic that
we can't enforce validations or support specific capabilities. We will add
kafkabroker as another entity.
5. Just following the cluster convention: description is an attribute, and tags
is an element.
6. Fixed the documentation in the tags column.
7. Will add the driver to the database.xml example in the next patch.
8. The driver specified will be passed on to the underlying implementation (in
this case Sqoop) to load the jars.
9. Fixed the database.xml example.
10. Type identifies the type of database (mysql, oracle, etc.) in order to take
advantage of database-specific features and driver support.
11. Version is carried over from the cluster interface definition, but it is not
needed in this entity, since the driver will supply the correct version of the
jar to be used along with the class name. Removing it.
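Putting points 7-11 together, a minimal database.xml might look like the following (purely illustrative; the element names, URLs, and paths here are assumptions, since the actual example ships with the patch):

```xml
<database name="customer-db" xmlns="uri:falcon:database:0.1">
    <tags>owner=finance,env=prod</tags>
    <!-- type enables database-specific features and driver support -->
    <type>mysql</type>
    <connection url="jdbc:mysql://db-host:3306/customers">
        <!-- the driver is handed to Sqoop, which loads the jar;
             the jar itself supplies the correct version -->
        <driver>
            <clazz>com.mysql.jdbc.Driver</clazz>
            <jar>/apps/falcon/libs/mysql-connector-java.jar</jar>
        </driver>
    </connection>
</database>
```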
I'm redoing the patch to use this new XSD and need to add some test cases. I
will upload the patch in the next two days, with or without test cases, for
initial review.
> Ability to ingest data from databases
> -------------------------------------
>
> Key: FALCON-36
> URL: https://issues.apache.org/jira/browse/FALCON-36
> Project: Falcon
> Issue Type: Improvement
> Components: acquisition
> Affects Versions: 0.3
> Reporter: Venkatesh Seetharam
> Assignee: Venkat Ramachandran
> Attachments: FALCON-36.patch, FALCON-36.rebase.patch,
> FALCON-36.review.patch, Falcon Data Ingestion - Proposal.docx,
> falcon-36.xsd.patch.1
>
>
> Attempt to address data import from RDBMS into hadoop and export of data from
> Hadoop into RDBMS. The plan is to use sqoop 1.x to materialize data motion
> from/to RDBMS to/from HDFS. Hive will not be integrated in the first pass
> until Falcon has a first class integration with HCatalog.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)