Hi Sungwoo Park,

I'm sorry for the late reply to this old email.
We are trying to upgrade Hive MetaStore from Hive 1 to Hive 3, and
noticed that the Hive 3 MetaStore responds very slowly.
We suspect that HIVE-14187 might be causing this slowness.
Could you tell me if you have resolved this problem? Are there still any
problems when you enable CachedStore?

Regards,
- Takanobu

On Wed, Jun 13, 2018 at 0:37, Sungwoo Park <glap...@gmail.com> wrote:

> Hello Hive users,
>
> I am experiencing a problem with MetaStore in Hive 3.0.
>
> 1. Start MetaStore with
> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore.
>
> 2. Generate TPC-DS data.
>
> 3. TPC-DS queries run okay and produce correct results. E.g., from query 1:
> +-------------------+
> |   c_customer_id   |
> +-------------------+
> | AAAAAAAAAAAACHAA  |
> | AAAAAAAAAAAADCAA  |
> | AAAAAAAAAAAADDAA  |
> ...
> | AAAAAAAAAAAILIAA  |
> +-------------------+
> 100 rows selected (69.901 seconds)
>
> However, the query compilation takes a long time (
> https://issues.apache.org/jira/browse/HIVE-16520).
>
> 4. Now, restart MetaStore with
> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.cache.CachedStore.
>
> 5. TPC-DS queries run okay, but produce wrong results. E.g., from query 1:
> +----------------+
> | c_customer_id  |
> +----------------+
> +----------------+
> No rows selected (37.448 seconds)
>
> What I noticed is that with hive.metastore.rawstore.impl=CachedStore,
> HiveServer2 produces log messages such as the following:
>
> 2018-06-12T23:50:04,223  WARN [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
> 2018-06-12T23:50:04,223  INFO [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
> 2018-06-12T23:50:04,225  WARN [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
> 2018-06-12T23:50:04,225  INFO [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
> 2018-06-12T23:50:04,226  WARN [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
> c_customer_id
> 2018-06-12T23:50:04,226  INFO [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
> c_customer_id
>
> 2018-06-12T23:50:05,158 ERROR [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
> Invalid column stats: No of nulls > cardinality
> 2018-06-12T23:50:05,159 ERROR [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
> Invalid column stats: No of nulls > cardinality
> 2018-06-12T23:50:05,160 ERROR [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
> Invalid column stats: No of nulls > cardinality
>
> However, even after computing column stats, queries still return wrong
> results, despite the fact that the above log messages disappear.
>
> I guess I am missing some configuration parameters (because I imported
> hive-site.xml from Hive 2). Any suggestions would be appreciated.
>
> Thanks a lot,
>
> --- Sungwoo Park
>
>
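
For reference, the two setups compared in steps 1 and 4 of the quoted message differ in a single property. Below is a minimal sketch of how that might look in hive-site.xml. It assumes the property is set in that file; the quoted message only gives the key=value form and does not say where it was set. Only the property name and the two class names come from the message above.

```xml
<!-- Sketch only: placing this in hive-site.xml is an assumption. -->
<property>
  <name>hive.metastore.rawstore.impl</name>
  <!-- Step 4 in the quoted message: queries returned no rows with this value. -->
  <value>org.apache.hadoop.hive.metastore.cache.CachedStore</value>
  <!-- Step 1 in the quoted message: results were correct with this value instead:
       org.apache.hadoop.hive.metastore.ObjectStore -->
</property>
```

Switching between the two values requires restarting MetaStore, as in step 4 of the quoted message.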
