Re: [jira] [Created] (IMPALA-6620) Compute incremental stats for groups of partitions does not update stats correctly

Jim Apple Mon, 02 Apr 2018 08:59:43 -0700

Thanks!

On Sun, Apr 1, 2018 at 11:58 PM, Jeszy <[email protected]> wrote:


> It's addressed in IMPALA-5615.
>
> On 2018. Apr 2., Mon at 7:56, Jim Apple <[email protected]> wrote:
>
> > I feel like I saw a similar JIRA and patch recently. Is this addressed in
> > another ticket?
> >
> > If not, it feels like a P2 to me: it’s not exactly incorrect, but I expect
> > it means that some calls to COMPUTE STATS would decrease query performance
> > in a very avoidable way.
> >
> > ---------- Forwarded message ---------
> > From: H Milyakov (JIRA) <[email protected]>
> > Date: Wed, Mar 7, 2018 at 4:57 AM
> > Subject: [jira] [Created] (IMPALA-6620) Compute incremental stats for
> > groups of partitions does not update stats correctly
> > To: <[email protected]>
> >
> >
> > H Milyakov created IMPALA-6620:
> > ----------------------------------
> >
> >              Summary: Compute incremental stats for groups of partitions
> > does not update stats correctly
> >                  Key: IMPALA-6620
> >                  URL: https://issues.apache.org/jira/browse/IMPALA-6620
> >              Project: IMPALA
> >           Issue Type: Bug
> >           Components: Catalog
> >     Affects Versions: Impala 2.8.0
> >          Environment: Impala - v2.8.0-cdh5.11.1
> > We are using Hive Metastore Database embedded (by cloudera)
> > It's postgres 8.4.20
> > OS: Centos
> >             Reporter: H Milyakov
> >
> >
> > Executing COMPUTE INCREMENTAL STATS `table` PARTITION (`partition clause`)
> > does not compute statistics correctly (computes 0) when `partition clause`
> > matches more than one partition.
> >
> > Executing the same command when `partition clause` matches just a single
> > partition
> > results in statistics being computed correctly (non 0 and non -1).
> >
> > The issue was observed on our production cluster for a table with 40 000
> > partitions and 20 columns.
> > I have copied the table to a separate isolated cluster and observed the
> > same behaviour.
> > We use Impala 2.8.0 in Cloudera CDH 5.11
> >
> > The issue could be simulated with the following:
> >  1. CREATE TABLE my_test_table ( some_ints BIGINT )
> >  PARTITIONED BY ( part_1 BIGINT, part_2 STRING )
> >  STORED AS PARQUET;
> >
> >  2. The only column 'some_ints' is populated so that there are 10 000
> > different partitions (part_1, part_2).
> >  The total number of records in the table does not matter and can be the
> >  same as the number of different partitions.
> >
> >  3. Then running the compute incremental as described above simulates the
> > issue.
> >
> >
> > Has anybody faced a similar issue, or does anyone have more info on the
> > case?
> >
> >
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v7.6.3#76005)
> >
>
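
The reproduction steps in the forwarded report leave step 2 (populating the partitions) and the exact partition clause unspecified. As a rough sketch only, the following Python helper generates one possible Impala SQL script for those steps. The table definition is taken from the report; the helper name `build_repro_sql`, the partition values (`part_1=i`, `part_2='p<i>'`), the one-row-per-partition layout, and both PARTITION predicates are assumptions made for illustration, not the reporter's actual data or commands.

```python
# Hypothetical helper: builds an Impala SQL script for the reproduction steps.
# Only the CREATE TABLE statement comes from the report. The partition values,
# the one-row-per-partition layout, and the PARTITION predicates are assumed.

def build_repro_sql(num_partitions=10000, table="my_test_table"):
    statements = [
        # Step 1: table definition as given in the report.
        f"CREATE TABLE {table} ( some_ints BIGINT )\n"
        "PARTITIONED BY ( part_1 BIGINT, part_2 STRING )\n"
        "STORED AS PARQUET",
    ]

    # Step 2: one row per partition, each (part_1, part_2) pair distinct.
    for i in range(num_partitions):
        statements.append(
            f"INSERT INTO {table} PARTITION (part_1={i}, part_2='p{i}') "
            f"VALUES ({i})"
        )

    # Step 3a: clause matching a single partition (reported to work).
    statements.append(
        f"COMPUTE INCREMENTAL STATS {table} PARTITION (part_1=0, part_2='p0')"
    )
    statements.append(f"SHOW TABLE STATS {table}")

    # Step 3b: clause matching more than one partition (reported to yield 0).
    statements.append(
        f"COMPUTE INCREMENTAL STATS {table} PARTITION (part_1 > 0)"
    )
    statements.append(f"SHOW TABLE STATS {table}")

    return ";\n".join(statements) + ";\n"


if __name__ == "__main__":
    # A small partition count keeps the generated script short.
    print(build_repro_sql(num_partitions=3))
```

The generated script could be saved to a file and run through impala-shell; comparing the #Rows column of the two SHOW TABLE STATS outputs should show whether the multi-partition case behaves as described. One INSERT per partition is slow at 10 000 partitions, so a smaller count may be enough for a first check.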
