[ https://issues.apache.org/jira/browse/HIVE-21940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Laszlo Bodor updated HIVE-21940: -------------------------------- Attachment: HIVE-21940.repro.patch > Metastore: Postgres text <-> clob mismatch for PARTITION_PARAMS/PARAM_VALUE > --------------------------------------------------------------------------- > > Key: HIVE-21940 > URL: https://issues.apache.org/jira/browse/HIVE-21940 > Project: Hive > Issue Type: Bug > Affects Versions: 3.2.0 > Reporter: Laszlo Bodor > Assignee: Laszlo Bodor > Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-21940.repro.patch > > > The issue is reproducible on a cluster with postgres metastore db by the > following statements: > {code} > USE default; > drop table if exists my_table; > create external table my_table (col1 int, col3 int) partitioned by (col2 > string) STORED AS TEXTFILE; > insert into my_table VALUES(11,201,"F"); > SELECT pp.* FROM sys.partition_params pp join sys.partitions p on p.part_id = > pp.part_id join sys.tbls t on t.tbl_id = p.tbl_id where t.tbl_name = > "my_table"; > {code} > sys query results in: > {code} > +-------------+------------------------+-----------------+ > | pp.part_id | pp.param_key | pp.param_value | > +-------------+------------------------+-----------------+ > | 151 | rawDataSize | 28629 | > | 151 | numRows | 28628 | > | 151 | transient_lastDdlTime | 28627 | > | 151 | COLUMN_STATS_ACCURATE | 28626 | > | 151 | numFiles | 28625 | > | 151 | totalSize | 28622 | > +-------------+------------------------+-----------------+ > {code} > Seems like (propably) since HIVE-20833/HIVE-20221 there is an inconvenience > while using PARTITION_PARAMS/PARAM_VALUE, because in postgres there is no > such type as CLOB, and metastore simply saves large object ids into this > field. More interesting is that the large object can be resolved in some > codepaths. In case of a describe for partition it works correctly: > {code} > describe formatted my_table partition (col2='F'); > ... > | Partition Parameters: | NULL > | NULL | > | | COLUMN_STATS_ACCURATE > | > {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col3\":\"true\"}} > | > | | numFiles > | 1 | > | | numRows > | 1 | > | | rawDataSize > | 6 | > | | totalSize > | 7 | > | | transient_lastDdlTime > | 1561976024 | > | | NULL > | NULL | > {code} > But in case of a direct metastore query (from hive's sys schema, but the same > result for direct postgres), it shows the result above (see sys query > output). This is an issue when hive treats these ids as they were real > values, but they are obviously not correct, and this causes various failures > (e.g. using serde parameter serialization.format=28392) > param_value values above are large object ids, according to pg_dump > | 151 | COLUMN_STATS_ACCURATE | 28626 | > {code} > SELECT pg_catalog.lo_open('28626', 131072); > SELECT pg_catalog.lowrite(0, > '\x7b2242415349435f5354415453223a2274727565222c22434f4c554d4e5f5354415453223a7b22636f6c31223a2274727565222c22636f6c33223a2274727565227d7d'); > SELECT pg_catalog.lo_close(0); > {code} > decoded large object value: > {code} > {"BASIC_STATS":"true","COLUMN_STATS":{"col1":"true","col3":"true"}} > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)