Hi,
how about:

1. Have a process that reads the data from your SQL Server and dumps it as a
file into a directory on your hard disk.
2. Use Spark Streaming to read the data from that directory and store it into
HDFS (see the sketch after this list).
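
For step 2, a minimal sketch of what that could look like, assuming a Spark
1.x-style StreamingContext and hypothetical paths for the local dump directory
and the HDFS landing area:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DirToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DirToHdfs")
    val ssc = new StreamingContext(conf, Seconds(60))

    // Watch the dump directory for new files written by the extract process
    // (files should be moved in atomically so each one is picked up once)
    val lines = ssc.textFileStream("file:///data/sqlserver-dumps")

    // Persist each micro-batch under an HDFS prefix (a per-batch suffix is appended)
    lines.saveAsTextFiles("hdfs:///landing/sqlserver/part")

    ssc.start()
    ssc.awaitTermination()
  }
}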

Perhaps there is some sort of Spark 'connector' that allows you to read
data from a DB directly, so you don't need to go via Spark Streaming?
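
Spark's built-in JDBC data source can do exactly that. A minimal sketch,
assuming a Spark 1.6-style SQLContext, a hypothetical SQL Server
connection string and table name, and the Microsoft JDBC driver jar on the
classpath:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object JdbcToHdfs {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JdbcToHdfs"))
    val sqlContext = new SQLContext(sc)

    // Read a table straight from SQL Server, no intermediate file dump
    val df = sqlContext.read.format("jdbc")
      .option("url", "jdbc:sqlserver://dbhost:1433;databaseName=mydb")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .option("dbtable", "dbo.my_table")
      .option("user", "etl_user")
      .option("password", "secret")
      .load()

    // Write to HDFS (or use saveAsTable to land it as a Hive table directly)
    df.write.parquet("hdfs:///landing/sqlserver/my_table")
  }
}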


hth

On Tue, Jun 7, 2016 at 3:09 PM, Ajay Chander <itsche...@gmail.com> wrote:

> Hi Spark users,
>
> Right now we are using Spark for everything (loading the data from
> SQL Server, applying transformations, saving it as permanent tables in Hive)
> in our environment. Everything is being done in one Spark application.
>
> The only thing we do before we launch our Spark application through
> Oozie is to load the data from the edge node to HDFS (it is triggered
> through an ssh action from Oozie to run a shell script on the edge node).
>
> My question is, is there any way we can accomplish the edge-to-HDFS copy
> through Spark, so that everything is done in one Spark DAG and lineage
> graph?
>
> Any pointers are highly appreciated. Thanks
>
> Regards,
> Aj
>
