Re: importing data into hdfs/spark using Informatica ETL tool

2016-11-10 Thread Mich Talebzadeh

Re: importing data into hdfs/spark using Informatica ETL tool

2016-11-09 Thread Michael Segel
Oozie, a product only a mad Russian would love. ;-) Just say no to hive. Go from Flat to Parquet. (This sounds easy, but there’s some work that has to occur…) Sorry for being cryptic, Mich’s question is pretty much generic for anyone building a data lake so it ends up overlapping with some work

Re: importing data into hdfs/spark using Informatica ETL tool

2016-11-09 Thread Mich Talebzadeh
Thanks guys. Sounds like we let Informatica get the data out of the RDBMS and create a mapping to flat files that will be delivered to a directory visible to the HDFS host, then push the csv files into HDFS. Then there are a number of options to work on: 1. run cron or oozie to get data out of HDFS (or
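Before the delivered flat files are pushed into HDFS (e.g. with `hdfs dfs -put`), it is worth sanity-checking them. A minimal stdlib sketch of such a check — the header and the sample data are hypothetical, not from the thread:

```python
import csv
import io

def validate_extract(fh, expected_header):
    """Check a delivered CSV extract before pushing it into HDFS:
    the header must match and every row must have the same width.
    Returns the data row count on success."""
    reader = csv.reader(fh)
    header = next(reader)
    if header != expected_header:
        raise ValueError(f"unexpected header: {header}")
    rows = 0
    for row in reader:
        if len(row) != len(expected_header):
            raise ValueError(f"ragged row at line {rows + 2}: {row}")
        rows += 1
    return rows

# Simulate a flat file delivered by the Informatica mapping.
extract = io.StringIO("id,name,amount\n1,alice,10.5\n2,bob,7.25\n")
print(validate_extract(extract, ["id", "name", "amount"]))  # → 2
```

Running a check like this in the cron/oozie step catches a broken mapping before bad files land in the lake.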

Re: importing data into hdfs/spark using Informatica ETL tool

2016-11-09 Thread Jörn Franke
Basically you mention the options. However, there are several ways Informatica can extract from (or store to) an RDBMS. If the native option is not available then you need to go via JDBC as you have described. Alternatively (but only if it is worth it) you can schedule fetching of the files
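The JDBC-style extract-to-flat-file step can be sketched with the stdlib `sqlite3` module standing in for the real JDBC source — the `orders` table and its columns are invented for illustration:

```python
import csv
import io
import sqlite3

# In-memory SQLite stands in for the source RDBMS; a real job would
# connect over JDBC as described in the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, item TEXT, qty INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "widget", 3), (2, "gadget", 5)])

def dump_table_to_csv(conn, table, fh):
    """Stream a table out of the source database into a flat CSV file,
    the hand-off format discussed before the HDFS push."""
    cur = conn.execute(f"SELECT * FROM {table}")
    writer = csv.writer(fh)
    writer.writerow([col[0] for col in cur.description])
    for row in cur:
        writer.writerow(row)

buf = io.StringIO()
dump_table_to_csv(conn, "orders", buf)
print(buf.getvalue())
```

The same cursor-stream pattern keeps memory flat regardless of table size, which matters when fetching large tables on a schedule.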

Re: importing data into hdfs/spark using Informatica ETL tool

2016-11-09 Thread ayan guha
Yes, it can be done and is a standard practice. I would suggest a mixed approach: use Informatica to create files in HDFS and have Hive staging tables as external tables on those directories. From that point onwards use Spark. Hth Ayan On 10 Nov 2016 04:00, "Mich Talebzadeh"
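The external-table overlay Ayan suggests is a plain Hive DDL statement; a sketch with hypothetical table, column, and path names (none of these appear in the thread):

```sql
-- The Informatica mapping drops CSV files under /data/staging/orders
-- (hypothetical path); the external table just overlays them, so
-- dropping the table never deletes the files.
CREATE EXTERNAL TABLE staging_orders (
  id     INT,
  item   STRING,
  qty    INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/staging/orders';
```

From that point Spark can read the staging table (e.g. `spark.table("staging_orders")`) and write it out in Parquet, which is the Hive-light path Michael hints at.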

Re: importing data into hdfs/spark using Informatica ETL tool

2016-11-09 Thread Mich Talebzadeh
Thanks Mike for the insight. This is a request that landed on us and is rather unusual. As I understand it, Informatica is an ETL tool. Most of these are glorified Sqoop with a GUI where you define your source and target. On a normal day Informatica takes data out of an RDBMS like an Oracle table and lands it

importing data into hdfs/spark using Informatica ETL tool

2016-11-09 Thread Mich Talebzadeh
Hi, I am exploring the idea of flexibility in importing the multiple RDBMS tables that a customer has into HDFS using Informatica. I don't want to use connectivity tools from Informatica to Hive etc. So this is what I have in mind: 1. If possible, get the tables' data out using Informatica and