Hi Stefan,

You can use Flink to load data into HDFS.
The CSV reader is suited for reading delimiter-separated text files into
the system, but you can also read data from a lot of other sources (Avro,
JDBC, MongoDB, HCatalog).
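For the CSV part, a minimal sketch with Flink's batch API could look like the
following (paths, field types, and the delimiter are placeholders you'd adapt
to your files):

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class CsvToHdfs {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // read a delimiter-separated file; field types are declared up front
        DataSet<Tuple2<String, Integer>> input = env
                .readCsvFile("hdfs:///path/to/input.csv")
                .fieldDelimiter(',')
                .types(String.class, Integer.class);

        // transformations on the DataSet go here; writeAsCsv stores the
        // result back to HDFS
        input.writeAsCsv("hdfs:///path/to/output");

        // sinks are lazy, so the job has to be triggered explicitly
        env.execute("CSV to HDFS");
    }
}
```

This needs a running HDFS (or you can point the paths at the local file
system for testing), so treat it as a starting point rather than something
that runs as-is.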

We don't have any utilities that make writing to HCatalog particularly easy,
but you can certainly write to HCatalog with Flink's Hadoop OutputFormat
wrappers:
http://ci.apache.org/projects/flink/flink-docs-master/hadoop_compatibility.html#using-hadoop-outputformats

Here is some documentation on how to use the HCatalog output format:
https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput

You probably have to do something like:

// configure the target database and table (null = no partition values)
HCatOutputFormat.setOutput(job,
    OutputJobInfo.create(dbName, outputTableName, null));
// reuse the table's existing schema for the records being written
HCatSchema s = HCatOutputFormat.getTableSchema(job);
HCatOutputFormat.setSchema(job, s);
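Once the HCatOutputFormat is configured on the Job like that, you can wrap it
in Flink's Hadoop OutputFormat wrapper and use it as a sink. A rough sketch
(class names come from flink-hadoop-compatibility and HCatalog; buildRecords
is a hypothetical method standing in for however you produce your records):

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.hadoopcompatibility.mapreduce.HadoopOutputFormat;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.mapreduce.HCatOutputFormat;

public class WriteToHCatalog {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        Job job = Job.getInstance();
        // ... HCatOutputFormat.setOutput(...) / setSchema(...) on this job ...

        // wrap the Hadoop OutputFormat so Flink can use it as a sink;
        // HCatOutputFormat emits (WritableComparable, HCatRecord) pairs
        HadoopOutputFormat<NullWritable, DefaultHCatRecord> hcatSink =
                new HadoopOutputFormat<NullWritable, DefaultHCatRecord>(
                        new HCatOutputFormat(), job);

        // buildRecords(env) is a placeholder for your CSV-reading and
        // transformation logic producing the key/value pairs
        DataSet<Tuple2<NullWritable, DefaultHCatRecord>> records =
                buildRecords(env);

        records.output(hcatSink);
        env.execute("Write to HCatalog");
    }

    private static DataSet<Tuple2<NullWritable, DefaultHCatRecord>>
            buildRecords(ExecutionEnvironment env) {
        throw new UnsupportedOperationException("fill in your pipeline here");
    }
}
```

Since this needs a Hive metastore and HDFS to talk to, it's only a sketch of
the wiring, not something tested end to end.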



Let me know if you need more help writing to HCatalog.




On Mon, Apr 20, 2015 at 1:29 PM, Papp, Stefan <stefan.p...@teradata.com>
wrote:

> Hi,
>
>
> I want  load CSV files into a Hadoop cluster. How could I do that with
> Flink?
>
> I know, I can load data into a CsvReader and then iterate over rows and
> transform them. Is there an easy way to store the results into
> HDFS+HCatalog within Flink?
>
> Thank you!
>
> Stefan Papp
> Lead Hadoop Consultant
>
> Teradata GmbH
> Mobile: +43 664 22 08 616
> stefan.p...@teradata.com<mailto:stefan.p...@teradata.com>
> teradata.com<http://www.teradata.com/>
>
> This e-mail is from Teradata Corporation and may contain information that
> is confidential or proprietary. If you are not the intended recipient, do
> not read, copy or distribute the e-mail or any attachments. Instead, please
> notify the sender and delete the e-mail and any attachments. Thank you.
> Please consider the environment before printing.
>
>
