Hello, Kelly Zhang. I have struggled with this as well. It should have been fixed as part of HIVE-14792; the fix is available in Hive 3.x and at the head of branch-2 (but not in the 2.3 release :/). What version are you seeing this problem on?
If 3.x, one should be able to enable the optimization via " set hive.optimize.update.table.properties.from.serde=true; ". If 2.x, one might need to port HIVE-14792 over. (This should be an easy port.)

Mithun

On Sun, Aug 19, 2018 at 2:51 PM Zhang, Liyun <lzhan...@ebay.com> wrote:

> Hi:
>
> I want to ask a question about 'avro.schema.url'. I have a partitioned
> table with a huge number of partitions, like the following:
>
> CREATE TABLE episodes_partitioned
> PARTITIONED BY (doctor_pt INT)
> ROW FORMAT
> SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> TBLPROPERTIES (
>   'avro.schema.url'='hdfs:///user/YOURUSER/examples/schema/twitter.avsc'
> );
>
> I found that several methods call
> AvroSerdeUtils.determineSchemaOrThrowException. If 'avro.schema.url' is
> defined, it calls getSchemaFromFS to fetch the schema, which causes a huge
> number of RPC calls, because getSchemaFromFS is called for every partition.
> So my question is: is there any better way to avoid this, other than
> defining avro.schema.literal in the CREATE TABLE SQL?
>
> Call stack reaching AvroSerdeUtils#determineSchemaOrThrowException:
>
>   at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:109)
>   at org.apache.hadoop.hive.serde2.avro.AvroSerDe.determineSchemaOrReturnErrorSchema(AvroSerDe.java:191)
>   at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:110)
>   at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:83)
>   at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:540)
>   at org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:184)
>   at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:295)
>   at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:423)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:106)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> AvroSerdeUtils#determineSchemaOrThrowException:
>
>   public static Schema determineSchemaOrThrowException(Configuration conf, Properties properties)
>       throws IOException, AvroSerdeException {
>     .....
>     try {
>       // if avro.schema.url is defined, the schema is read from HDFS
>       Schema s = getSchemaFromFS(schemaString, conf);
>       if (s == null) {
>         // in case schema is not on a file system
>         return AvroSerdeUtils.getSchemaFor(new URL(schemaString));
>       }
>       return s;
>     } catch (IOException ioe) {
>       throw new AvroSerdeException("Unable to read schema from given path: " + schemaString, ioe);
>     } catch (URISyntaxException urie) {
>       throw new AvroSerdeException("Unable to read schema from given path: " + schemaString, urie);
>     }
>     .....
>   }
>
> Can anyone help look into this Avro schema problem? Thanks!
>
> Best Regards
> ZhangLiyun/Kelly Zhang
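To make the two approaches from the reply concrete, here is a minimal sketch against the episodes_partitioned table from the original mail. The first approach assumes Hive 3.x or a branch-2 build that includes HIVE-14792; the COUNT(*) query is only an example, and the schema JSON in the second statement is an illustrative placeholder, not the real contents of twitter.avsc:

    -- Approach 1 (per the reply above; needs HIVE-14792): enable the optimization
    -- so the Avro schema is not re-read from HDFS for every partition at task time.
    SET hive.optimize.update.table.properties.from.serde=true;
    SELECT COUNT(*) FROM episodes_partitioned WHERE doctor_pt = 6;

    -- Approach 2 (works on 2.x as well): pin the schema on the table as a literal,
    -- so no filesystem read is needed at all. The record definition below is a placeholder.
    ALTER TABLE episodes_partitioned SET TBLPROPERTIES (
      'avro.schema.literal'='{"type":"record","name":"episodes","namespace":"example","fields":[{"name":"title","type":"string"},{"name":"air_date","type":"string"},{"name":"doctor","type":"int"}]}'
    );

The trade-off with Approach 2 is that the schema is copied into the metastore once, so the ALTER has to be re-run whenever the schema file changes.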