Oozie, a product only a mad Russian would love. ;-)
Just say no to Hive. Go from flat files to Parquet.
(This sounds easy, but there’s some work that has to occur…)
Sorry for being cryptic; Mich's question is pretty much generic for anyone
building a data lake, so it ends up overlapping with some work.
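For what it's worth, the flat-to-Parquet step is close to a one-liner in Spark. A minimal sketch (the paths and schema-inference options are invented for illustration, not from anyone's actual setup):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("FlatToParquet")
    .getOrCreate()

  // Read the delivered flat files (hypothetical landing path),
  // letting Spark infer the schema from the header row.
  val df = spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("hdfs:///landing/customers/*.csv")

  // Write out as Parquet for efficient downstream queries.
  df.write.mode("overwrite").parquet("hdfs:///data/customers_parquet")

The "work that has to occur" is mostly around schemas: inference is convenient for a sketch, but for production you would pin an explicit schema so bad rows fail loudly.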
Thanks guys,
Sounds like we let Informatica get the data out of the RDBMS and create
mappings to flat files that will be delivered to a directory visible to the
HDFS host, then push the CSV files into HDFS (see the sketch below). From
there, there are a number of options to work on:
1. run cron or Oozie to get data out of HDFS (or
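The "push the CSV files into HDFS" step is typically just hdfs dfs -put; done programmatically through the Hadoop FileSystem API it would look roughly like this (a sketch; both directories are made up):

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}

  val conf = new Configuration()  // picks up core-site.xml / hdfs-site.xml
  val fs = FileSystem.get(conf)

  // Copy the files Informatica delivered locally into HDFS
  // (both paths are hypothetical).
  fs.copyFromLocalFile(
    new Path("file:///informatica/out/customers.csv"),
    new Path("hdfs:///landing/customers/"))

  fs.close()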
Basically, you've mentioned the options. However, there are several ways
Informatica can extract from (or store to) an RDBMS. If the native option is
not available, then you need to go via JDBC, as you have described.
Alternatively (but only if it is worth it), you can schedule fetching of the
files.
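If the JDBC route ends up being done outside Informatica, the equivalent pull in Spark would look roughly like this (a sketch only; the URL, credentials, and table name are placeholders, and the Oracle JDBC driver must be on the classpath):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("JdbcPull").getOrCreate()

  // Read a source table over JDBC (all connection details are placeholders).
  val df = spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
    .option("dbtable", "SCOTT.CUSTOMERS")
    .option("user", "etl_user")
    .option("password", "change_me")
    .option("fetchsize", "10000")  // larger fetch size helps bulk extracts
    .load()

  df.write.mode("overwrite").parquet("hdfs:///landing/customers_parquet")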
Yes, it can be done, and it is a standard practice. I would suggest a mixed
approach: use Informatica to create files in HDFS and define Hive staging
tables as external tables over those directories. From that point onwards,
use Spark (a sketch follows below).
Hth
Ayan
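A minimal sketch of the mixed approach described above (the database, table, and directory names are invented for illustration):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("StagingViaExternalTable")
    .enableHiveSupport()
    .getOrCreate()

  spark.sql("CREATE DATABASE IF NOT EXISTS staging")

  // External table over the directory Informatica writes to;
  // dropping the table later leaves the files in place.
  spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS staging.customers (
      id INT,
      name STRING,
      created STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 'hdfs:///landing/customers'
  """)

  // From this point onwards, work in Spark.
  val df = spark.table("staging.customers")
  df.write.mode("overwrite").parquet("hdfs:///data/customers_parquet")

The external table is the key design choice here: the Hive metastore only describes the files, so Informatica can keep dropping new data into the directory without any load step.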
On 10 Nov 2016 04:00, "Mich Talebzadeh" wrote:
Thanks Mike for the insight.
This is a request that landed on us and is rather unusual.
As I understand it, Informatica is an ETL tool. Most of these are glorified
Sqoop with a GUI, where you define your source and target.
On a normal day, Informatica takes data out of an RDBMS table, like an Oracle
table, and lands it
Hi,
I am exploring how flexibly we can import the multiple RDBMS tables that the
customer has into HDFS using Informatica.
I don't want to use connectivity tools from Informatica to Hive etc.
So this is what I have in mind:
1. If possible, get the table data out using Informatica and