Hi Spark users,

Right now we use Spark for everything in our environment (loading the
data from SQL Server, applying transformations, and saving the results as
permanent tables in Hive). Everything is done in one Spark application.

The only thing we do before launching our Spark application through
Oozie is load the data from the edge node to HDFS (this is triggered by
an SSH action in Oozie that runs a shell script on the edge node).

My question is: is there any way we can accomplish the edge-to-HDFS copy
through Spark, so that everything is done in one Spark DAG and lineage
graph?

Any pointers are highly appreciated. Thanks.

Regards,
Aj
