+ Flink Users
From: Yun Tang
Sent: Monday, January 28, 2019 19:46
To: Soheil Pourbafrani
Subject: Re: How to infer table schema from Avro file
Hi Soheil
You should pass your generated Avro record class to AvroInputFormat as its
type, not Avro's GenericRecord class. For example, if your generated record
class is named 'Nation', the input format should be created like this:
AvroInputFormat<Nation> test = new AvroInputFormat<>(
new Path("PathToAvroFile"),
Nation.class);
This way, Flink recognizes the input type as a 'PojoType' rather than a
'GenericType', which has only one field, and the column names are inferred
automatically.
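To illustrate why the concrete class matters, here is a rough sketch in plain Java reflection. It is not Flink's actual type extraction, only an approximation of the idea: a class that declares its columns as fields can be inspected for names, while an opaque record type cannot. Nation, OpaqueRecord, and columnNames are hypothetical stand-ins, and the field types are assumed rather than taken from your schema.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SchemaInferenceSketch {

    // Hypothetical stand-in for an Avro-generated class: one public field per column.
    // The field types here are assumptions.
    static class Nation {
        public long N_NATIONKEY;
        public CharSequence N_NAME;
        public long N_REGIONKEY;
        public CharSequence N_COMMENT;
    }

    // Hypothetical stand-in for a generic record: the column names live only in
    // the data at runtime, so the class itself declares no public fields.
    static class OpaqueRecord {
        private final Map<String, Object> values = new HashMap<>();
    }

    // Collects the public instance field names of a class, sorted by name.
    // If there are none, the type is treated as a single opaque column "f0".
    static List<String> columnNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            if (Modifier.isPublic(mods) && !Modifier.isStatic(mods)) {
                names.add(field.getName());
            }
        }
        if (names.isEmpty()) {
            names.add("f0");
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(columnNames(Nation.class));       // four named columns
        System.out.println(columnNames(OpaqueRecord.class)); // single column "f0"
    }
}
```

With the concrete class the sketch finds four named columns. With the opaque type it falls back to a single "f0", which resembles the schema you got from GenericRecord.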
Best
Yun Tang
From: Soheil Pourbafrani
Sent: Monday, January 28, 2019 5:54
To: user
Subject: How to infer table schema from Avro file
Hi, I load an Avro file into a Flink DataSet:
AvroInputFormat<GenericRecord> test = new AvroInputFormat<>(
new Path("PathToAvroFile"),
GenericRecord.class);
DataSet<GenericRecord> DS = env.createInput(test);
DS.print();
and here are the results of printing DS:
{"N_NATIONKEY": 14, "N_NAME": "KENYA", "N_REGIONKEY": 0, "N_COMMENT": " pending
excuses haggle furiously deposits. pending, express pinto beans wake fluffily
past t"}
{"N_NATIONKEY": 15, "N_NAME": "MOROCCO", "N_REGIONKEY": 0, "N_COMMENT": "rns.
blithely bold courts among the closely regular packages use furiously bold
platelets?"}
{"N_NATIONKEY": 16, "N_NAME": "MOZAMBIQUE", "N_REGIONKEY": 0, "N_COMMENT": "s.
ironic, unusual asymptotes wake blithely r"}
{"N_NATIONKEY": 17, "N_NAME": "PERU", "N_REGIONKEY": 1, "N_COMMENT":
"platelets. blithely pending dependencies use fluffily across the even pinto
beans. carefully silent accoun"}
{"N_NATIONKEY": 18, "N_NAME": "CHINA", "N_REGIONKEY": 2, "N_COMMENT": "c
dependencies. furiously express notornis sleep slyly regular accounts. ideas
sleep. depos"}
{"N_NATIONKEY": 19, "N_NAME": "ROMANIA", "N_REGIONKEY": 3, "N_COMMENT": "ular
asymptotes are about the furious multipliers. express dependencies nag above
the ironically ironic account"}
{"N_NATIONKEY": 20, "N_NAME": "SAUDI ARABIA", "N_REGIONKEY": 4, "N_COMMENT":
"ts. silent requests haggle. closely express packages sleep across the
blithely"}
Now I want to create a table from the DS DataSet with exactly the same schema
as the Avro file, i.e. the columns should be N_NATIONKEY, N_NAME, N_REGIONKEY,
and N_COMMENT.
I know that with the line:
tableEnv.registerDataSet("tbTest", DS, "field1, field2, ...");
I can create a table and set the columns explicitly, but I want the columns to
be inferred automatically from the data. Is that possible?
I tried
tableEnv.registerDataSet("tbTest", DS);
but it creates a table with the schema:
root
|-- f0: GenericType