[ 
https://issues.apache.org/jira/browse/FLUME-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359206#comment-15359206
 ] 

Lior Zeno commented on FLUME-2938:
----------------------------------

I'll start by stating that I'm not a Sqoop expert.

It's true that this source will have to poll for new events, unlike other 
sources that receive new events by push. However, I'm still not sure how Sqoop 
can provide the functionality that Flume provides. Flume offers many more target 
options, simple transformations that do not require a MapReduce job, and so on.

I believe that Flume should provide this functionality. This ticket does not 
compete with Sqoop: it is not intended for batch computations (transformations) 
on data from relational databases, but rather offers a simple mechanism to 
transfer data from JDBC to any other destination in small batches.



> JDBC Source
> -----------
>
>                 Key: FLUME-2938
>                 URL: https://issues.apache.org/jira/browse/FLUME-2938
>             Project: Flume
>          Issue Type: New Feature
>          Components: Sinks+Sources
>    Affects Versions: v1.8.0
>            Reporter: Lior Zeno
>             Fix For: v1.8.0
>
>
> The idea is to allow migrating data from SQL stores to NoSQL stores or HDFS 
> for archiving purposes.
> This source will be given a statement to execute and a scheduling policy. It 
> will be able to fetch timestamped data by performing range queries on a 
> configurable field (this can fetch data with an incremental id as well). For 
> fault tolerance, the last fetched value can be checkpointed to a file.
> Dealing with large datasets can be done via the fetch_size parameter. (Ref: 
> https://docs.oracle.com/cd/A87860_01/doc/java.817/a83724/resltse5.htm)
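The polling design described in the ticket (range query on a configurable field, file-based checkpoint, fetch_size for streaming) could be sketched roughly as below. This is a minimal illustration, not Flume code: the class and method names (JdbcPollSketch, pollOnce) are made up for this example, and it assumes the checkpoint column is a numeric, monotonically increasing value.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Hypothetical sketch of the incremental-fetch cycle described in FLUME-2938.
public class JdbcPollSketch {

    // Build the range query for one poll: rows strictly after the checkpoint.
    static String rangeQuery(String table, String column) {
        return "SELECT * FROM " + table + " WHERE " + column + " > ? ORDER BY " + column;
    }

    // Read the last checkpointed value; start from the beginning if none exists.
    static long readCheckpoint(Path file) throws IOException {
        if (!Files.exists(file)) return Long.MIN_VALUE;
        return Long.parseLong(Files.readString(file).trim());
    }

    static void writeCheckpoint(Path file, long value) throws IOException {
        Files.writeString(file, Long.toString(value));
    }

    // One scheduled polling cycle: fetch rows newer than the checkpoint,
    // streaming them in small batches via the JDBC fetch size.
    static void pollOnce(Connection conn, String table, String column,
                         int fetchSize, Path checkpoint) throws Exception {
        long last = readCheckpoint(checkpoint);
        try (PreparedStatement ps = conn.prepareStatement(rangeQuery(table, column))) {
            ps.setFetchSize(fetchSize);   // the fetch_size parameter from the ticket
            ps.setLong(1, last);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // ... hand the row to the Flume channel here ...
                    last = rs.getLong(column);
                }
            }
        }
        // Checkpoint only after the whole batch succeeds, so a crash re-reads
        // at most one batch rather than losing data.
        writeCheckpoint(checkpoint, last);
    }
}
```

A scheduler would invoke pollOnce at the configured interval; on restart, the file checkpoint lets the source resume from the last delivered value.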



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
