It's addressed in IMPALA-5615.

On 2018. Apr 2., Mon at 7:56, Jim Apple <[email protected]> wrote:

> I feel like I saw a similar JIRA and patch recently. Is this addressed in
> another ticket?
>
> If not, it feels like a P2 to me: it’s not exactly incorrect, but I expect
> it means that some calls to COMPUTE STATS would decrease query performance
> in a very avoidable way.
>
> ---------- Forwarded message ---------
> From: H Milyakov (JIRA) <[email protected]>
> Date: Wed, Mar 7, 2018 at 4:57 AM
> Subject: [jira] [Created] (IMPALA-6620) Compute incremental stats for
> groups of partitions does not update stats correctly
> To: <[email protected]>
>
>
> H Milyakov created IMPALA-6620:
> ----------------------------------
>
>              Summary: Compute incremental stats for groups of partitions
> does not update stats correctly
>                  Key: IMPALA-6620
>                  URL: https://issues.apache.org/jira/browse/IMPALA-6620
>              Project: IMPALA
>           Issue Type: Bug
>           Components: Catalog
>     Affects Versions: Impala 2.8.0
>          Environment: Impala - v2.8.0-cdh5.11.1
> We are using the embedded Hive Metastore database (provided by Cloudera).
> It's PostgreSQL 8.4.20
> OS: CentOS
>             Reporter: H Milyakov
>
>
> Executing COMPUTE INCREMENTAL STATS `table` PARTITION (`partition clause`)
> does not compute statistics correctly (computes 0) when `partition clause`
> matches more than one partition.
>
> Executing the same command when `partition clause` matches just a single
> partition results in statistics being computed correctly (neither 0 nor -1).
>
> The issue was observed on our production cluster for a table with 40 000
> partitions and 20 columns.
> I have copied the table to a separate, isolated cluster and observed the
> same behaviour.
> We use Impala 2.8.0 in Cloudera CDH 5.11.
>
> The issue can be reproduced as follows:
>  1. CREATE TABLE my_test_table ( some_ints BIGINT )
>  PARTITIONED BY ( part_1 BIGINT, part_2 STRING )
>  STORED AS PARQUET;
>
>  2. The only column, 'some_ints', is populated so that there are 10 000
> distinct partitions (part_1, part_2).
>  The total number of records in the table does not matter and can be the
> same as the number of distinct partitions.
>
>  3. Then running COMPUTE INCREMENTAL STATS as described above reproduces
> the issue.
>
>
> Has anybody faced a similar issue, or does anyone have more info on this case?
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>
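
For anyone who wants to retry the reproduction quoted above, here is a minimal sketch that generates the statements. The CREATE TABLE text is taken from the report; everything else is an assumption made for illustration: the helper name build_repro_statements, the one-row-per-partition INSERT, the partition values (part_1 = 0..99, part_2 = 'p0'..'p99'), and the choice of PARTITION (part_1 = 0) as a clause that matches more than one partition. The report does not say how the table was populated or which clause was used, and a single 10 000-row INSERT may need to be split up on a real cluster.

```python
def build_repro_statements(num_part_1=100, num_part_2=100, table="my_test_table"):
    """Return the reproduction as a list of SQL strings (no trailing ';')."""
    # Step 1: table definition, as given in the JIRA report.
    create = (
        f"CREATE TABLE {table} ( some_ints BIGINT )\n"
        "PARTITIONED BY ( part_1 BIGINT, part_2 STRING )\n"
        "STORED AS PARQUET"
    )

    # Step 2 (assumed): one row per (part_1, part_2) pair, so the number of
    # rows equals the number of partitions. Partition columns go last.
    rows = [
        f"({i * num_part_2 + j}, {i}, 'p{j}')"
        for i in range(num_part_1)
        for j in range(num_part_2)
    ]
    insert = (
        f"INSERT INTO {table} PARTITION (part_1, part_2) VALUES\n"
        + ",\n".join(rows)
    )

    # Step 3 (assumed clauses): first a clause matching many partitions,
    # then one matching a single partition, checking the stats after each.
    multi = f"COMPUTE INCREMENTAL STATS {table} PARTITION (part_1 = 0)"
    single = (
        f"COMPUTE INCREMENTAL STATS {table} "
        "PARTITION (part_1 = 1, part_2 = 'p0')"
    )
    show = f"SHOW TABLE STATS {table}"

    return [create, insert, multi, show, single, show]


if __name__ == "__main__":
    # Print a script that can be fed to impala-shell.
    for statement in build_repro_statements():
        print(statement + ";\n")
```

With the defaults this yields 100 x 100 = 10 000 partitions, matching the count in the report. Comparing the two SHOW TABLE STATS outputs should show whether the multi-partition clause leaves row counts at 0 on the version under test.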
