[ https://issues.apache.org/jira/browse/FALCON-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631744#comment-14631744 ]
Venkat Ramachandran commented on FALCON-1240:
---------------------------------------------
HCat-related discussion:
Srikanth,
While writing to HCat, you mentioned the concrete implementation (i.e. Sqoop)
should use Falcon-provided facilities to write to HCat.
But Sqoop extracts the data from the database and writes directly to HCat/Hive,
providing all the needed partition keys; we won't get a stream from Sqoop in
the first place.
Also, I think Sqoop examines each row and maps it to a partition in HCatalog
based on a column (dynamic partitioning, handled by HCatalog); Venkat, please
confirm.
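For reference, a Sqoop HCatalog import with dynamic partitioning would look
roughly like the sketch below (the JDBC URL, credentials, and table names are
placeholders, not from this proposal); with no static partition values
supplied, each row lands in the HCatalog partition matching its
partition-column value:

    # Sketch only: JDBC URL, credentials, and table names are placeholders.
    # No static partition keys are given, so the target partition is chosen
    # dynamically from the partition column's value in each row.
    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username sqoop -P \
      --table orders \
      --hcatalog-database default \
      --hcatalog-table orders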
With this assumption, how can we utilize Falcon facilities to write to HCat?
If we bypass them and have Sqoop do it, are there any issues? Will all the
aspects of the lifecycle work?
Thanks,
Venky
---------------------------------------
During the call we discussed figuring out the HCat target based on the feed
definition in Falcon. The suggestion wasn't to pull the data via Sqoop and
have Falcon perform additional work to push it into HCat. Since Falcon
supports the concept of catalog-based storage for feeds, you have all the
necessary information to complete the import into Hive directly via Sqoop,
without having to redundantly declare any info relating to the HCat table in
the feed definition or elsewhere.
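For example, a feed declared with catalog storage already names the target
database, table, and partition layout, so Sqoop can be pointed at it directly.
A minimal sketch of the relevant fragment (database, table, and partition spec
here are illustrative, not from the proposal):

    <!-- Inside the feed definition: catalog (HCatalog) storage in place of
         HDFS locations. URI format: catalog:$database:$table#partition-spec -->
    <table uri="catalog:sales_db:orders#ds=${YEAR}-${MONTH}-${DAY}"/>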
Regards
Srikanth Sundarrajan
-------------------------------------------------
When we have catalog storage, data ingestion would pick an HCatalog table as
the target, and the static partition keys can be deduced from the storage
description. That is good. Sqoop from 1.4.5 onwards allows multiple static
partition keys.
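For instance, something along these lines (a sketch; the database, table, and
partition values are placeholders, and the --hcatalog-partition-keys /
--hcatalog-partition-values options are the 1.4.5 additions):

    # Multiple static partition keys, supported from Sqoop 1.4.5 onwards.
    # Database, table, and partition values below are placeholders.
    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --table orders \
      --hcatalog-database sales_db \
      --hcatalog-table orders \
      --hcatalog-partition-keys country,state \
      --hcatalog-partition-values US,CA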
Where I have had conflicting thoughts since our conversation yesterday is the
filtering aspect (which may be what Venky had in mind with his question). In
my view, Falcon primarily moves data without making any change to the data it
receives. Modifying/transforming data would take Falcon into supporting new
paradigms, which would need more architectural thought on the infrastructure
to build and expose.
The SQL predicate usage is different in the sense that we let the SQL engine
provide the data, so it is not as if the Falcon runtime operates on it.
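To make the distinction concrete: a pushed-down predicate runs inside the
source database, and Falcon only ever sees the surviving rows, e.g. (a sketch;
connection and table names are placeholders):

    # The WHERE predicate is evaluated by the source SQL engine;
    # Falcon/Sqoop only move the rows it returns.
    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --table orders \
      --where "order_date >= '2015-01-01'" \
      --hcatalog-database sales_db \
      --hcatalog-table orders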
I think we should table the filtering support for now.
Thoughts?
Venkat
-------------------------------------------------------------
One other thing: these are important discussions for Falcon. It would be ideal
if we can allow other folks such as Venkatesh, Pallavi, Shwetha, Sowmya ... to
chime in if they have views on this. Does it make sense to move this
discussion to the public list?
Regards
Srikanth Sundarrajan
> Data Import and Export
> -----------------------
>
> Key: FALCON-1240
> URL: https://issues.apache.org/jira/browse/FALCON-1240
> Project: Falcon
> Issue Type: New Feature
> Components: acquisition
> Reporter: Venkat Ramachandran
> Assignee: Venkat Ramachandran
> Attachments: Falcon Data Ingestion - Proposal.docx
>
>
> JIRA to track Data Import and Export design and implementation discussions
> Attaching proposal to start with.