Hi, I have a question about 'avro.schema.url'. I have a partitioned table with a huge number of partitions, like the following:

CREATE TABLE episodes_partitioned
PARTITIONED BY (doctor_pt INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
  'avro.schema.url'='hdfs:///user/YOURUSER/examples/schema/twitter.avsc'
);

I found that several methods call AvroSerdeUtils.determineSchemaOrThrowException. If 'avro.schema.url' is defined, it calls getSchemaFromFS to read the schema, and because it does this once for every partition, it causes a huge number of RPC calls. So my question is: is there a better way to avoid this, other than defining 'avro.schema.literal' in the CREATE TABLE statement?

Call path to AvroSerdeUtils.determineSchemaOrThrowException:

    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:109)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.determineSchemaOrReturnErrorSchema(AvroSerDe.java:191)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:110)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:83)
    at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:540)
    at org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:184)
    at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:295)
    at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:423)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:106)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

AvroSerdeUtils#determineSchemaOrThrowException:

    public static Schema determineSchemaOrThrowException(Configuration conf, Properties properties)
        throws IOException, AvroSerdeException {
      …..
      try {
        // if 'avro.schema.url' is defined, the schema is read from the file system (HDFS)
        Schema s = getSchemaFromFS(schemaString, conf);
        if (s == null) {
          // in case the schema is not on a file system
          return AvroSerdeUtils.getSchemaFor(new URL(schemaString));
        }
        return s;
      } catch (IOException ioe) {
        throw new AvroSerdeException("Unable to read schema from given path: " + schemaString, ioe);
      } catch (URISyntaxException urie) {
        throw new AvroSerdeException("Unable to read schema from given path: " + schemaString, urie);
      }
      …..
    }

Can anyone help look into this Avro schema problem? Thanks!

Best regards,
ZhangLiyun/Kelly Zhang
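For illustration only, here is a minimal Java sketch of the kind of per-URL caching that would avoid the repeated reads described in the question: the schema for a given 'avro.schema.url' is fetched once and then reused for every partition. This is not Hive code and not how AvroSerdeUtils behaves today. SchemaCacheSketch and its injected loader are hypothetical. The loader stands in for getSchemaFromFS, and the schema is kept as a plain String (with a placeholder record) so the sketch runs without Avro or Hadoop on the classpath.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical sketch (not part of Hive): memoizes schema text per URL so that
 * initializing many partition SerDes reads the schema file only once per URL.
 */
public class SchemaCacheSketch {

    // Cache keyed by the value of 'avro.schema.url'.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Stand-in for the expensive read (e.g. getSchemaFromFS); injected so the sketch runs without HDFS.
    private final Function<String, String> loader;

    public SchemaCacheSketch(Function<String, String> loader) {
        this.loader = loader;
    }

    /** Returns the cached schema for the URL, invoking the loader only on the first request for that URL. */
    public String getSchema(String schemaUrl) {
        return cache.computeIfAbsent(schemaUrl, loader);
    }

    public static void main(String[] args) {
        AtomicInteger reads = new AtomicInteger();
        SchemaCacheSketch schemas = new SchemaCacheSketch(url -> {
            reads.incrementAndGet(); // count simulated file system reads
            // placeholder schema text, not the real twitter.avsc
            return "{\"type\":\"record\",\"name\":\"Placeholder\",\"fields\":[]}";
        });

        String url = "hdfs:///user/YOURUSER/examples/schema/twitter.avsc";
        int partitions = 10_000;
        for (int i = 0; i < partitions; i++) {
            // one lookup per partition, mirroring one SerDe initialization per partition
            schemas.getSchema(url);
        }
        System.out.println("partitions=" + partitions + ", schema reads=" + reads.get());
    }
}
```

Run as written, it prints partitions=10000, schema reads=1. Note that this cache keeps the first value it loads for as long as the instance lives, so a real implementation would also need some way to pick up changes to the .avsc file.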