[
https://issues.apache.org/jira/browse/FALCON-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639551#comment-14639551
]
Venkat Ramachandran commented on FALCON-36:
-------------------------------------------
[~ajayyadava] thanks for the comments.
1. Re-using the types now.
2. fields is optional.
* If fields is missing, all columns are projected.
* If includes is specified, only the listed subset of columns is projected.
* If excludes is specified, all columns except those listed are projected.
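To illustrate the includes/excludes semantics, a hypothetical sketch of what the fields element might look like (element names here are illustrative, not the final XSD):

```xml
<!-- Project only the listed columns -->
<fields>
    <includes>
        <field>id</field>
        <field>name</field>
    </includes>
</fields>

<!-- Or: project all columns except those listed -->
<fields>
    <excludes>
        <field>ssn</field>
    </excludes>
</fields>
```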
3. Feed import will be tied to a cluster.
* The user can mark this cluster as the source and another as the target to leverage replication for the copy.
* Alternatively (though there is no reason to), a user can add the import policy to every cluster in a feed.
* That would run one import job per feed cluster against the specified database.
* That would overload the database and risk data inconsistency. It's better to import on the source cluster and replicate.
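The recommended source/target arrangement might look like this in a feed definition (a hedged sketch; the element and attribute names are assumptions based on this proposal, not the final XSD):

```xml
<feed name="customer-feed" xmlns="uri:falcon:feed:0.1">
    <clusters>
        <!-- the import policy is attached only to the source cluster -->
        <cluster name="primary-cluster" type="source">
            <import>
                <source name="customer-db" tableName="customer"/>
            </import>
        </cluster>
        <!-- Falcon replication copies the imported data to the target -->
        <cluster name="backup-cluster" type="target"/>
    </clusters>
</feed>
```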
4. We are going with database as the entity, since datasource is so generic that
we can't enforce validations or support specific capabilities. We will add
kafkabroker as another entity.
5. Just following the cluster convention: description is an attribute, and tags
is an element.
6. Fixed the documentation in the tags column.
7. Will add the driver to the database.xml example in the next patch.
8. The driver specified will be passed on to the underlying implementation (in
this case Sqoop) to load the jars.
9. Fixed the database.xml example.
10. Type identifies the type of database (mysql, oracle, etc.) in order to take
advantage of database-specific features and driver support.
11. Version is carried over from the cluster interface definition, but it is not
needed in this entity, since the driver will supply the correct version of the
jar to be used along with the class name. Removing it.
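Putting points 7-11 together, a minimal database.xml might look like the following (purely illustrative; the element names, URLs, and paths here are assumptions, since the actual example ships with the patch):

```xml
<database name="customer-db" xmlns="uri:falcon:database:0.1">
    <tags>owner=finance,env=prod</tags>
    <!-- type enables database-specific features and driver support -->
    <type>mysql</type>
    <connection url="jdbc:mysql://db-host:3306/customers">
        <!-- the driver is handed to Sqoop, which loads the jar;
             the jar itself supplies the correct version -->
        <driver>
            <clazz>com.mysql.jdbc.Driver</clazz>
            <jar>/apps/falcon/libs/mysql-connector-java.jar</jar>
        </driver>
    </connection>
</database>
```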
I'm redoing the patch to use this new XSD and need to add some test cases. I
will upload the patch in the next two days, with or without test cases, for
initial review.
> Ability to ingest data from databases
> -------------------------------------
>
> Key: FALCON-36
> URL: https://issues.apache.org/jira/browse/FALCON-36
> Project: Falcon
> Issue Type: Improvement
> Components: acquisition
> Affects Versions: 0.3
> Reporter: Venkatesh Seetharam
> Assignee: Venkat Ramachandran
> Attachments: FALCON-36.patch, FALCON-36.rebase.patch,
> FALCON-36.review.patch, Falcon Data Ingestion - Proposal.docx,
> falcon-36.xsd.patch.1
>
>
> Attempt to address data import from RDBMS into hadoop and export of data from
> Hadoop into RDBMS. The plan is to use sqoop 1.x to materialize data motion
> from/to RDBMS to/from HDFS. Hive will not be integrated in the first pass
> until Falcon has a first class integration with HCatalog.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)