Re: How to infer table schema from Avro file

2019-01-28 Thread Yun Tang
+ Flink Users

From: Yun Tang 
Sent: Monday, January 28, 2019 19:46
To: Soheil Pourbafrani
Subject: Re: How to infer table schema from Avro file

Hi Soheil

You should pass your generated Avro record class as the type of the 
AvroInputFormat, not Avro's GenericRecord class. For example, if your 
generated record class is named 'Nation', the input should be created like this:


AvroInputFormat<Nation> test = new AvroInputFormat<>(
        new Path("PathToAvroFile"),
        Nation.class);

By doing this, Flink will recognize the type of your input as a 'PojoType' 
rather than a 'GenericType', which has only one field. The columns will then 
be inferred automatically from the fields of the record class.
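
To illustrate why a concrete record class makes this possible, here is a 
minimal sketch in plain Java with no Flink or Avro dependency. It is not how 
Flink's type extraction is implemented. It only shows that a class with 
declared fields exposes column names that can be read by reflection, while an 
opaque type exposes none. The nested 'Nation' class and the 'columnsOf' helper 
are hypothetical stand-ins, and a real Avro-generated class looks different.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.stream.Collectors;

public class PojoColumns {

    // Hypothetical stand-in for the generated 'Nation' record. A real
    // Avro-generated class has more members (schema, accessors, builder).
    public static class Nation {
        public int N_NATIONKEY;
        public String N_NAME;
        public int N_REGIONKEY;
        public String N_COMMENT;
    }

    // Collects the names of the public, non-static fields of a class,
    // sorted by name so that the result is deterministic.
    public static String columnsOf(Class<?> type) {
        return Arrays.stream(type.getFields())
                .filter(f -> !Modifier.isStatic(f.getModifiers()))
                .map(Field::getName)
                .sorted()
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        // A class with declared fields yields a usable column list:
        // prints "N_COMMENT, N_NAME, N_NATIONKEY, N_REGIONKEY"
        System.out.println(columnsOf(Nation.class));

        // An opaque type (Object is used here as a stand-in) yields no
        // columns, so nothing can be inferred from it: prints "true"
        System.out.println(columnsOf(Object.class).isEmpty());
    }
}
```

The sketch sorts the names only to get a stable order. The order of columns 
in an actual Flink table depends on Flink, not on this helper.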

Best
Yun Tang

From: Soheil Pourbafrani 
Sent: Monday, January 28, 2019 5:54
To: user
Subject: How to infer table schema from Avro file

Hi, I load an Avro file in a Flink Dataset:


AvroInputFormat<GenericRecord> test = new AvroInputFormat<>(
        new Path("PathToAvroFile"),
        GenericRecord.class);
DataSet<GenericRecord> DS = env.createInput(test);

DS.print();

and here are the results of printing DS:
{"N_NATIONKEY": 14, "N_NAME": "KENYA", "N_REGIONKEY": 0, "N_COMMENT": " pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t"}
{"N_NATIONKEY": 15, "N_NAME": "MOROCCO", "N_REGIONKEY": 0, "N_COMMENT": "rns. blithely bold courts among the closely regular packages use furiously bold platelets?"}
{"N_NATIONKEY": 16, "N_NAME": "MOZAMBIQUE", "N_REGIONKEY": 0, "N_COMMENT": "s. ironic, unusual asymptotes wake blithely r"}
{"N_NATIONKEY": 17, "N_NAME": "PERU", "N_REGIONKEY": 1, "N_COMMENT": "platelets. blithely pending dependencies use fluffily across the even pinto beans. carefully silent accoun"}
{"N_NATIONKEY": 18, "N_NAME": "CHINA", "N_REGIONKEY": 2, "N_COMMENT": "c dependencies. furiously express notornis sleep slyly regular accounts. ideas sleep. depos"}
{"N_NATIONKEY": 19, "N_NAME": "ROMANIA", "N_REGIONKEY": 3, "N_COMMENT": "ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account"}
{"N_NATIONKEY": 20, "N_NAME": "SAUDI ARABIA", "N_REGIONKEY": 4, "N_COMMENT": "ts. silent requests haggle. closely express packages sleep across the blithely"}

Now I want to create a table from the DS DataSet with exactly the same schema 
as the Avro file, i.e. the columns should be N_NATIONKEY, N_NAME, N_REGIONKEY, 
and N_COMMENT.

I know using the line:


tableEnv.registerDataSet("tbTest", DS, "field1, field2, ...");

I can create a table and set the columns, but I want the columns to be inferred 
automatically from data. Is it possible?
I tried

tableEnv.registerDataSet("tbTest", DS);

but it creates a table with the schema:
root
 |-- f0: GenericType


How to infer table schema from Avro file

2019-01-27 Thread Soheil Pourbafrani
Hi, I load an Avro file in a Flink Dataset:

AvroInputFormat<GenericRecord> test = new AvroInputFormat<>(
        new Path("PathToAvroFile"),
        GenericRecord.class);
DataSet<GenericRecord> DS = env.createInput(test);

DS.print();

and here are the results of printing DS:
{"N_NATIONKEY": 14, "N_NAME": "KENYA", "N_REGIONKEY": 0, "N_COMMENT": " pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t"}
{"N_NATIONKEY": 15, "N_NAME": "MOROCCO", "N_REGIONKEY": 0, "N_COMMENT": "rns. blithely bold courts among the closely regular packages use furiously bold platelets?"}
{"N_NATIONKEY": 16, "N_NAME": "MOZAMBIQUE", "N_REGIONKEY": 0, "N_COMMENT": "s. ironic, unusual asymptotes wake blithely r"}
{"N_NATIONKEY": 17, "N_NAME": "PERU", "N_REGIONKEY": 1, "N_COMMENT": "platelets. blithely pending dependencies use fluffily across the even pinto beans. carefully silent accoun"}
{"N_NATIONKEY": 18, "N_NAME": "CHINA", "N_REGIONKEY": 2, "N_COMMENT": "c dependencies. furiously express notornis sleep slyly regular accounts. ideas sleep. depos"}
{"N_NATIONKEY": 19, "N_NAME": "ROMANIA", "N_REGIONKEY": 3, "N_COMMENT": "ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account"}
{"N_NATIONKEY": 20, "N_NAME": "SAUDI ARABIA", "N_REGIONKEY": 4, "N_COMMENT": "ts. silent requests haggle. closely express packages sleep across the blithely"}

Now I want to create a table from the DS DataSet with exactly the same
schema as the Avro file, i.e. the columns should be N_NATIONKEY, N_NAME,
N_REGIONKEY, and N_COMMENT.

I know using the line:

tableEnv.registerDataSet("tbTest", DS, "field1, field2, ...");

I can create a table and set the columns, but I want the columns to be
inferred automatically from data. Is it possible?
I tried

tableEnv.registerDataSet("tbTest", DS);

but it creates a table with the schema:
root
 |-- f0: GenericType