Hello, Kelly Zhang.

I have had to struggle with this as well. It should have been fixed as
part of HIVE-14792, which is available in Hive 3.x and at the head of
branch-2 (but not in the 2.3 release :/). What version are you seeing this
problem on?

If 3.x, one should be able to enable the optimization via
"set hive.optimize.update.table.properties.from.serde=true;".
If 2.x, one might need to port HIVE-14792 over. (This should be an easy
port.)
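
A minimal sketch of how this would look in practice (the table name is
illustrative, borrowed from your CREATE TABLE below):

SET hive.optimize.update.table.properties.from.serde=true;
-- With the flag enabled, the schema behind 'avro.schema.url' should be
-- resolved up front and carried in the table/partition properties, rather
-- than re-read from HDFS during each partition's SerDe initialization.
SELECT COUNT(*) FROM episodes_partitioned WHERE doctor_pt = 1;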

Mithun

On Sun, Aug 19, 2018 at 2:51 PM Zhang, Liyun <lzhan...@ebay.com> wrote:

> Hi:
>
> I want to ask a question about 'avro.schema.url'. I have a partitioned
> table with a huge number of partitions, like the following:
>
> CREATE TABLE episodes_partitioned
> PARTITIONED BY (doctor_pt INT)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED AS
>   INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> TBLPROPERTIES (
>   'avro.schema.url'='hdfs:///user/YOURUSER/examples/schema/twitter.avsc'
> );
>
> I found that several methods call
> AvroSerdeUtils.determineSchemaOrThrowException. If 'avro.schema.url' is
> defined, it calls getSchemaFromFS to fetch the schema, and this causes a
> huge number of RPC calls because getSchemaFromFS is invoked once for every
> partition. So my question is: is there any better way to avoid this, other
> than defining avro.schema.literal in the CREATE TABLE SQL (sketched below)?
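>
> For concreteness, the avro.schema.literal workaround I mention looks
> roughly like this (the schema literal here is shortened and illustrative,
> not my real schema):
>
> ALTER TABLE episodes_partitioned SET TBLPROPERTIES (
>   'avro.schema.literal'='{"type":"record","name":"episodes","fields":[{"name":"title","type":"string"}]}'
> );
> -- With the schema embedded in the table properties, initializing each
> -- partition's SerDe no longer needs to read the .avsc file from HDFS.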
>
>
> Call stack into AvroSerdeUtils.determineSchemaOrThrowException:
>
>   at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:109)
>   at org.apache.hadoop.hive.serde2.avro.AvroSerDe.determineSchemaOrReturnErrorSchema(AvroSerDe.java:191)
>   at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:110)
>   at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:83)
>   at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:540)
>   at org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:184)
>   at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:295)
>   at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:423)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:106)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   ...
>
> AvroSerdeUtils#determineSchemaOrThrowException:
>
>
> public static Schema determineSchemaOrThrowException(Configuration conf,
>     Properties properties) throws IOException, AvroSerdeException {
>   ...
>   try {
>     // If 'avro.schema.url' is defined, the schema is fetched from HDFS
>     // here; this runs once for every partition's SerDe initialization.
>     Schema s = getSchemaFromFS(schemaString, conf);
>     if (s == null) {
>       // in case the schema is not on a file system
>       return AvroSerdeUtils.getSchemaFor(new URL(schemaString));
>     }
>     return s;
>   } catch (IOException ioe) {
>     throw new AvroSerdeException("Unable to read schema from given path: "
>         + schemaString, ioe);
>   } catch (URISyntaxException urie) {
>     throw new AvroSerdeException("Unable to read schema from given path: "
>         + schemaString, urie);
>   }
>   ...
> }
>
>
>
> Can anyone help take a look at this Avro schema problem? Thanks!
>
> Best Regards
> ZhangLiyun/Kelly Zhang
>
>
