Thanks!

On Sun, Apr 1, 2018 at 11:58 PM, Jeszy <[email protected]> wrote:
> It's addressed in IMPALA-5615.
>
> On 2018. Apr 2., Mon at 7:56, Jim Apple <[email protected]> wrote:
>
> > I feel like I saw a similar JIRA and patch recently. Is this addressed in
> > another ticket?
> >
> > If not, it feels like a P2 to me: it’s not exactly incorrect, but I expect
> > it means that some calls to COMPUTE STATS would decrease query performance
> > in a very avoidable way.
> >
> > ---------- Forwarded message ---------
> > From: H Milyakov (JIRA) <[email protected]>
> > Date: Wed, Mar 7, 2018 at 4:57 AM
> > Subject: [jira] [Created] (IMPALA-6620) Compute incremental stats for
> > groups of partitions does not update stats correctly
> > To: <[email protected]>
> >
> >
> > H Milyakov created IMPALA-6620:
> > ----------------------------------
> >
> >              Summary: Compute incremental stats for groups of partitions
> >                       does not update stats correctly
> >                  Key: IMPALA-6620
> >                  URL: https://issues.apache.org/jira/browse/IMPALA-6620
> >              Project: IMPALA
> >           Issue Type: Bug
> >           Components: Catalog
> >     Affects Versions: Impala 2.8.0
> >          Environment: Impala - v2.8.0-cdh5.11.1
> >                       We are using Hive Metastore Database embedded (by cloudera)
> >                       It's postgres 8.4.20
> >                       OS: Centos
> >             Reporter: H Milyakov
> >
> >
> > Executing COMPUTE INCREMENTAL STATS `table` PARTITION (`partition clause`)
> > does not compute statistics correctly (computes 0) when `partition clause`
> > matches more than one partition.
> >
> > Executing the same command when `partition clause` matches just a single
> > partition results in statistics being computed correctly (non 0 and non -1).
> >
> > The issue was observed on our production cluster for a table with 40 000
> > partitions and 20 columns.
> > I have copied the table to a separate, isolated cluster and observed the
> > same behaviour.
> > We use Impala 2.8.0 in Cloudera CDH 5.11.
> >
> > The issue can be reproduced as follows:
> > 1. CREATE TABLE my_test_table (
> >      some_ints BIGINT
> >    )
> >    PARTITIONED BY (
> >      part_1 BIGINT,
> >      part_2 STRING
> >    )
> >    STORED AS PARQUET;
> >
> > 2. The only column, 'some_ints', is populated so that there are 10 000
> > different partitions (part_1, part_2).
> > The total number of records in the table does not matter and could be the
> > same as the number of different partitions.
> >
> > 3. Then running the compute incremental as described above reproduces the
> > issue.
> >
> >
> > Has anybody faced a similar issue, or does anyone have more info on the case?
> >
> >
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v7.6.3#76005)
> >
