Hi,
I want to ask a question about 'avro.schema.url'. I have a partitioned table
with a huge number of partitions, like the following:
CREATE TABLE episodes_partitioned
PARTITIONED BY (doctor_pt INT)
ROW FORMAT
SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.url'='hdfs:///user/YOURUSER/examples/schema/twitter.avsc'
);
I found that several methods call
AvroSerdeUtils.determineSchemaOrThrowException. If 'avro.schema.url' is defined,
it calls getSchemaFromFS to get the schema, and because getSchemaFromFS is called
once for every partition, this causes a huge number of RPC calls. So my question
is: is there any better way to avoid this, other than defining
avro.schema.literal in the CREATE TABLE statement?
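To make the cost concrete, here is a small Java simulation. It is a hypothetical illustration, not Hive code: CountingFetcher stands in for getSchemaFromFS and only counts calls, and the per-URL cache built on ConcurrentHashMap.computeIfAbsent is an assumption about what a mitigation could look like, not an existing Hive option.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical illustration (not Hive code): counts how many remote schema
 * reads happen when every partition resolves the same schema URL, with and
 * without a per-URL memoizing cache.
 */
class SchemaFetchCount {

    /** Stand-in for a remote read such as getSchemaFromFS; it only counts calls. */
    static final class CountingFetcher implements Function<String, String> {
        final AtomicInteger calls = new AtomicInteger();

        @Override
        public String apply(String url) {
            calls.incrementAndGet();           // one simulated RPC per call
            return "{\"type\":\"record\"}";    // placeholder schema text
        }
    }

    /** Resolves the schema once per partition, as the stack trace suggests happens. */
    static int fetchesWithoutCache(String url, int partitions) {
        CountingFetcher fetcher = new CountingFetcher();
        for (int i = 0; i < partitions; i++) {
            fetcher.apply(url);
        }
        return fetcher.calls.get();
    }

    /** Same loop, but each distinct URL is read once and then served from memory. */
    static int fetchesWithCache(String url, int partitions) {
        CountingFetcher fetcher = new CountingFetcher();
        Map<String, String> cache = new ConcurrentHashMap<>();
        for (int i = 0; i < partitions; i++) {
            cache.computeIfAbsent(url, fetcher); // loads only on the first miss
        }
        return fetcher.calls.get();
    }

    public static void main(String[] args) {
        String url = "hdfs:///user/YOURUSER/examples/schema/twitter.avsc";
        System.out.println("without cache: " + fetchesWithoutCache(url, 10000)); // 10000
        System.out.println("with cache:    " + fetchesWithCache(url, 10000));    // 1
    }
}
```

With 10000 partitions sharing one schema URL, the uncached loop performs 10000 simulated reads while the cached loop performs one.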
Call stack leading to AvroSerdeUtils.determineSchemaOrThrowException:
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:109)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.determineSchemaOrReturnErrorSchema(AvroSerDe.java:191)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:110)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:83)
    at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:540)
    at org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:184)
    at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:295)
    at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:423)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:106)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    ...
AvroSerdeUtils#determineSchemaOrThrowException:
public static Schema determineSchemaOrThrowException(Configuration conf,
    Properties properties)
    throws IOException, AvroSerdeException {
  …..
  try {
    // If avro.schema.url is defined, the schema has to be fetched from HDFS
    Schema s = getSchemaFromFS(schemaString, conf);
    if (s == null) {
      // in case the schema is not on a file system
      return AvroSerdeUtils.getSchemaFor(new URL(schemaString));
    }
    return s;
  } catch (IOException ioe) {
    throw new AvroSerdeException("Unable to read schema from given path: " +
        schemaString, ioe);
  } catch (URISyntaxException urie) {
    throw new AvroSerdeException("Unable to read schema from given path: " +
        schemaString, urie);
  }
  …..
}
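The lookup order in the snippet above (filesystem first, then a plain URL when the filesystem lookup returns null) can be sketched as a small self-contained Java class. This is a hypothetical model, not Hive's implementation: SchemaReader, resolve, and the toy readers in main are illustrative names, and schemas are plain strings instead of Avro Schema objects.

```java
import java.io.IOException;
import java.util.Optional;

/**
 * Hypothetical sketch (not Hive code) of the lookup order in the quoted
 * determineSchemaOrThrowException: try a filesystem reader first, and fall
 * back to a URL reader only when the filesystem reader returns nothing.
 */
class SchemaResolutionOrder {

    /** Hypothetical reader: returns empty when it does not recognize the location. */
    interface SchemaReader {
        Optional<String> read(String location) throws IOException;
    }

    static String resolve(String location, SchemaReader fsReader, SchemaReader urlReader)
            throws IOException {
        // Analogous to getSchemaFromFS: attempted first on every call
        Optional<String> fromFs = fsReader.read(location);
        if (fromFs.isPresent()) {
            return fromFs.get();
        }
        // Analogous to the "schema is not on a file system" branch
        return urlReader.read(location)
                .orElseThrow(() -> new IOException("Unable to read schema from given path: " + location));
    }

    public static void main(String[] args) throws IOException {
        // Toy readers: the "filesystem" only knows hdfs:// locations, the "URL" reader only http://
        SchemaReader fs = loc -> loc.startsWith("hdfs://") ? Optional.of("fs-schema") : Optional.empty();
        SchemaReader url = loc -> loc.startsWith("http://") ? Optional.of("url-schema") : Optional.empty();
        System.out.println(resolve("hdfs:///schema/twitter.avsc", fs, url));      // fs-schema
        System.out.println(resolve("http://example.com/twitter.avsc", fs, url)); // url-schema
    }
}
```

Because nothing in this path remembers a previous result, each call repeats the filesystem attempt, which matches the per-partition behavior seen in the stack trace.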
Can anyone help take a look at this Avro schema problem? Thanks!
Best Regards
ZhangLiyun/Kelly Zhang