[ https://issues.apache.org/jira/browse/FALCON-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631744#comment-14631744 ]
Venkat Ramachandran commented on FALCON-1240:
---------------------------------------------
HCat-related discussion:
Srikanth,
While writing to HCat, you mentioned the concrete implementation (i.e. Sqoop)
should use Falcon-provided facilities to write to HCat.
But Sqoop extracts the data from the database and writes directly to HCat/Hive,
providing all the needed partition keys; we won't get a stream from Sqoop in
the first place.
Also, I think Sqoop examines each row and maps it to a partition in HCatalog
based on a column (dynamic partitioning, handled by HCatalog); Venkat, please
confirm.
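For reference, a Sqoop HCatalog import with dynamic partitioning would look
roughly like the sketch below (the JDBC URL, credentials, and table names are
placeholders, not from this proposal); with no static partition values
supplied, each row lands in the HCatalog partition matching its
partition-column value:

    # Sketch only: JDBC URL, credentials, and table names are placeholders.
    # No static partition keys are given, so the target partition is chosen
    # dynamically from the partition column's value in each row.
    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username sqoop -P \
      --table orders \
      --hcatalog-database default \
      --hcatalog-table orders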
With this assumption, how can we utilize Falcon facilities to write to HCat?
If we bypass them and have Sqoop do it, are there any issues? Will all the
aspects of the lifecycle work?
Thanks,
Venky
---------------------------------------
During the call we discussed figuring out the HCat target based on the feed
definition in Falcon. The suggestion wasn't to pull the data via Sqoop and
have Falcon perform additional work to push it into HCat. Since Falcon
supports the concept of catalog-based storage for feeds, you have all the
necessary information to complete the import into Hive directly via Sqoop,
without having to redundantly declare any info relating to the HCat table in
the feed definition or elsewhere.
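For example, a feed declared with catalog storage already names the target
database, table, and partition layout, so Sqoop can be pointed at it directly.
A minimal sketch of the relevant fragment (database, table, and partition spec
here are illustrative, not from the proposal):

    <!-- Inside the feed definition: catalog (HCatalog) storage in place of
         HDFS locations. URI format: catalog:$database:$table#partition-spec -->
    <table uri="catalog:sales_db:orders#ds=${YEAR}-${MONTH}-${DAY}"/>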
Regards
Srikanth Sundarrajan
-------------------------------------------------
When we have catalog storage, data ingestion would pick an HCatalog table as
the target, and the static partition keys can be deduced from the storage
description. That is good. Sqoop from 1.4.5 onwards allows multiple static
partition keys.
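For instance, something along these lines (a sketch; the database, table, and
partition values are placeholders, and the --hcatalog-partition-keys /
--hcatalog-partition-values options are the 1.4.5 additions):

    # Multiple static partition keys, supported from Sqoop 1.4.5 onwards.
    # Database, table, and partition values below are placeholders.
    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --table orders \
      --hcatalog-database sales_db \
      --hcatalog-table orders \
      --hcatalog-partition-keys country,state \
      --hcatalog-partition-values US,CA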
Where I have had conflicting thoughts since our conversation yesterday is the
filtering aspect (which may be what Venky had in mind with his question). In
my view, Falcon primarily moves data without making any change to the data it
receives. Modifying/transforming data would take Falcon into supporting new
paradigms, which would need more architectural thought on the infrastructure
to build and expose.
The SQL predicate usage is different in the sense that we let the SQL engine
provide the data, so it is not as if the Falcon runtime operates on it.
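To make the distinction concrete: a pushed-down predicate runs inside the
source database, and Falcon only ever sees the surviving rows, e.g. (a sketch;
connection and table names are placeholders):

    # The WHERE predicate is evaluated by the source SQL engine;
    # Falcon/Sqoop only move the rows it returns.
    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --table orders \
      --where "order_date >= '2015-01-01'" \
      --hcatalog-database sales_db \
      --hcatalog-table orders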
I think we should table the filtering support for now.
Thoughts?
Venkat
-------------------------------------------------------------
One other thing: these are important discussions for Falcon. It would be ideal
if we can allow other folks such as Venkatesh, Pallavi, Shwetha, Sowmya ... to
chime in if they have views on this. Does it make sense to move this
discussion to the public list?
Regards
Srikanth Sundarrajan
> Data Import and Export
> -----------------------
>
> Key: FALCON-1240
> URL: https://issues.apache.org/jira/browse/FALCON-1240
> Project: Falcon
> Issue Type: New Feature
> Components: acquisition
> Reporter: Venkat Ramachandran
> Assignee: Venkat Ramachandran
> Attachments: Falcon Data Ingestion - Proposal.docx
>
>
> JIRA to track Data Import and Export design and implementation discussions
> Attaching proposal to start with.