[jira] [Updated] (HIVE-16290) Stats: StatsRulesProcFactory::evaluateComparator estimates are wrong when minValue == filterValue

2017-04-03 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-16290:

Fix Version/s: 3.0.0  (was: 2.3.0)

> Stats: StatsRulesProcFactory::evaluateComparator estimates are wrong when 
> minValue == filterValue
> -
>
> Key: HIVE-16290
> URL: https://issues.apache.org/jira/browse/HIVE-16290
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-16290.1.patch, HIVE-16290.2.patch
>
>
> Issue: 
> =
> In {{StatsRulesProcFactory::evaluateComparator}}, when {{minValue}} is >= the
> filter {{value}}, it should return all rows. Currently, it returns
> {{numRows/3}}. This causes fewer reducers to be spun up in
> queries, e.g. Q79 in TPC-DS.
> E.g. TPC-DS store table stats:
> =
> {noformat}
> hive --orcfiledump hdfs://nn:8020/apps/hive/warehouse/tpcds_bin_partitioned_orc_1000.db/store/00_0
> Stripe Statistics:
>   Stripe 1:
>     Column 0: count: 1002 hasNull: false
>     Column 1: count: 1002 hasNull: false min: 1 max: 1002 sum: 502503
>     Column 2: count: 1002 hasNull: false min: AABA max: PPBA sum: 16032
>     Column 3: count: 1002 hasNull: false min:  max: 2001-03-13 sum: 9950
>     Column 4: count: 1002 hasNull: false min:  max: 2001-03-12 sum: 5010
>     Column 5: count: 273 hasNull: true min: 2450820 max: 2451313 sum: 669141525
>     Column 6: count: 1002 hasNull: false min:  max: pri sum: 3916
>     Column 7: count: 994 hasNull: true min: 200 max: 300 sum: 249970
>     Column 8: count: 996 hasNull: true min: 5002549 max: 9997773 sum: 7382689071
>     Column 9: count: 1002 hasNull: false min:  max: 8AM-8AM sum: 7088
> select compute_stats(s_employee_count, 16) from store;
> {"columntype":"Long","min":200,"max":300,"countnulls":8,"numdistinctvalues":63,"ndvbitvector":"{0, 1, 2, 3, 4, 5, 11, 12}{0, 1, 2, 3, 6}{0, 1, 2, 3, 4, 5, 7, 11}{0, 1, 2, 3, 4, 5, 7}{0, 1, 2, 3, 4, 5, 6}{0, 1, 2, 3, 4, 5, 8}{0, 1, 2, 3, 4}{0, 1, 2, 3, 4, 5, 7, 9}{0, 1, 2, 3, 4}{0}{0, 1, 2, 3, 4, 5, 7}{0, 1, 2, 3, 4, 5, 6, 7}{0, 1, 2, 3, 4, 8, 9, 14}{0, 1, 2, 3, 5}{0, 1, 2, 3, 4, 5, 6, 7}{0, 1, 2, 3, 4, 5, 6, 8}"}
> {noformat}
> {noformat}
> explain select count(s_store_sk) from store where s_number_employees > 200 
> and s_number_employees < 295;
> {noformat}
> The above query would first estimate 1002/3 = 334 rows for {{s_number_employees > 200}}
> and then 334/3 = 111 rows for {{s_number_employees < 295}}. Ideally, it should
> return all 1002 rows for the filter {{s_number_employees > 200}}.
> In TPC-DS Q79, this causes too few reduce tasks to be spun up, causing
> runtime delays.
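
The arithmetic in the description can be reproduced with a small Python sketch. This is not Hive's Java code: the function names {{fallback_estimate}} and {{greater_than_estimate}} are hypothetical, floor division is assumed because it matches the 334 and 111 figures above, and leaving {{s_number_employees < 295}} on the one-third fallback in the second case is an assumption made only for illustration.

```python
def fallback_estimate(num_rows):
    # Flat heuristic from the issue description: assume a range
    # filter keeps one third of the input rows (floor division assumed).
    return num_rows // 3


def greater_than_estimate(num_rows, filter_value, col_min=None):
    # Behaviour the issue asks for: if the column minimum is already
    # >= the filter value, treat `col > filter_value` as keeping every row.
    # Without a usable minimum, fall back to the flat heuristic.
    if col_min is not None and col_min >= filter_value:
        return num_rows
    return fallback_estimate(num_rows)


num_rows = 1002  # row count from the ORC stripe statistics
col_min = 200    # min of column 7 in the stripe statistics

# Reported behaviour: both predicates fall back to numRows/3.
reported = fallback_estimate(fallback_estimate(num_rows))

# Requested behaviour: `> 200` keeps all rows; `< 295` is assumed
# to still use the fallback in this sketch.
requested = fallback_estimate(greater_than_estimate(num_rows, 200, col_min))

print(reported, requested)  # prints: 111 334
```

With the minimum check, the estimate for the combined filter rises from 111 to 334 rows, which is the direction of change the issue asks for.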



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16290) Stats: StatsRulesProcFactory::evaluateComparator estimates are wrong when minValue == filterValue

2017-04-02 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-16290:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.3.0
   Status: Resolved  (was: Patch Available)

Thanks [~gopalv]. Committed to master.

> Stats: StatsRulesProcFactory::evaluateComparator estimates are wrong when 
> minValue == filterValue
> -
>
> Key: HIVE-16290
> URL: https://issues.apache.org/jira/browse/HIVE-16290
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Priority: Minor
> Fix For: 2.3.0
>
> Attachments: HIVE-16290.1.patch, HIVE-16290.2.patch
>
>





[jira] [Updated] (HIVE-16290) Stats: StatsRulesProcFactory::evaluateComparator estimates are wrong when minValue == filterValue

2017-03-25 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-16290:
---
Attachment: HIVE-16290.2.patch

> Stats: StatsRulesProcFactory::evaluateComparator estimates are wrong when 
> minValue == filterValue
> -
>
> Key: HIVE-16290
> URL: https://issues.apache.org/jira/browse/HIVE-16290
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Reporter: Rajesh Balamohan
> Assignee: Gopal V
> Priority: Minor
> Attachments: HIVE-16290.1.patch, HIVE-16290.2.patch
>
>





[jira] [Updated] (HIVE-16290) Stats: StatsRulesProcFactory::evaluateComparator estimates are wrong when minValue == filterValue

2017-03-24 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-16290:

Status: Patch Available  (was: Open)

> Stats: StatsRulesProcFactory::evaluateComparator estimates are wrong when 
> minValue == filterValue
> -
>
> Key: HIVE-16290
> URL: https://issues.apache.org/jira/browse/HIVE-16290
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Priority: Minor
> Attachments: HIVE-16290.1.patch
>
>





[jira] [Updated] (HIVE-16290) Stats: StatsRulesProcFactory::evaluateComparator estimates are wrong when minValue == filterValue

2017-03-24 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-16290:

Description: 
Issue: 
=
In {{StatsRulesProcFactory::evaluateComparator}}, when {{minValue}} is >= the
filter {{value}}, it should return all rows. Currently, it returns
{{numRows/3}}. This causes fewer reducers to be spun up in queries,
e.g. Q79 in TPC-DS.


E.g: TPC-DS store table stats:
=
{noformat}
hive --orcfiledump hdfs://nn:8020/apps/hive/warehouse/tpcds_bin_partitioned_orc_1000.db/store/00_0
Stripe Statistics:
  Stripe 1:
    Column 0: count: 1002 hasNull: false
    Column 1: count: 1002 hasNull: false min: 1 max: 1002 sum: 502503
    Column 2: count: 1002 hasNull: false min: AABA max: PPBA sum: 16032
    Column 3: count: 1002 hasNull: false min:  max: 2001-03-13 sum: 9950
    Column 4: count: 1002 hasNull: false min:  max: 2001-03-12 sum: 5010
    Column 5: count: 273 hasNull: true min: 2450820 max: 2451313 sum: 669141525
    Column 6: count: 1002 hasNull: false min:  max: pri sum: 3916
    Column 7: count: 994 hasNull: true min: 200 max: 300 sum: 249970
    Column 8: count: 996 hasNull: true min: 5002549 max: 9997773 sum: 7382689071
    Column 9: count: 1002 hasNull: false min:  max: 8AM-8AM sum: 7088

select compute_stats(s_employee_count, 16) from store;

{"columntype":"Long","min":200,"max":300,"countnulls":8,"numdistinctvalues":63,"ndvbitvector":"{0, 1, 2, 3, 4, 5, 11, 12}{0, 1, 2, 3, 6}{0, 1, 2, 3, 4, 5, 7, 11}{0, 1, 2, 3, 4, 5, 7}{0, 1, 2, 3, 4, 5, 6}{0, 1, 2, 3, 4, 5, 8}{0, 1, 2, 3, 4}{0, 1, 2, 3, 4, 5, 7, 9}{0, 1, 2, 3, 4}{0}{0, 1, 2, 3, 4, 5, 7}{0, 1, 2, 3, 4, 5, 6, 7}{0, 1, 2, 3, 4, 8, 9, 14}{0, 1, 2, 3, 5}{0, 1, 2, 3, 4, 5, 6, 7}{0, 1, 2, 3, 4, 5, 6, 8}"}
{noformat}

{noformat}
explain select count(s_store_sk) from store where s_number_employees > 200 and 
s_number_employees < 295;
{noformat}

The above query would first estimate 1002/3 = 334 rows for {{s_number_employees > 200}} and
then 334/3 = 111 rows for {{s_number_employees < 295}}. Ideally, it should return
all 1002 rows for the filter {{s_number_employees > 200}}.


In TPC-DS Q79, this causes too few reduce tasks to be spun up, causing runtime
delays.

  was:
Issue: 
=
In {{StatsRulesProcFactory::evaluateCompator}}, when {{minValue}} is >= 
filtered {{value}}, it should return all rows. Currently, it returns 
{{numRows/3}}. This lesser number of reducers to be spun up in queries. E.g Q79 
in TPC-DS.


E.g: TPC-DS store table stats:
=
{noformat}
hive --orcfiledump hdfs://nn:8020/apps/hive/warehouse/tpcds_bin_partitioned_orc_1000.db/store/00_0
Stripe Statistics:
  Stripe 1:
    Column 0: count: 1002 hasNull: false
    Column 1: count: 1002 hasNull: false min: 1 max: 1002 sum: 502503
    Column 2: count: 1002 hasNull: false min: AABA max: PPBA sum: 16032
    Column 3: count: 1002 hasNull: false min:  max: 2001-03-13 sum: 9950
    Column 4: count: 1002 hasNull: false min:  max: 2001-03-12 sum: 5010
    Column 5: count: 273 hasNull: true min: 2450820 max: 2451313 sum: 669141525
    Column 6: count: 1002 hasNull: false min:  max: pri sum: 3916
    Column 7: count: 994 hasNull: true min: 200 max: 300 sum: 249970
    Column 8: count: 996 hasNull: true min: 5002549 max: 9997773 sum: 7382689071
    Column 9: count: 1002 hasNull: false min:  max: 8AM-8AM sum: 7088

select compute_stats(s_employee_count, 16) from store;

{"columntype":"Long","min":200,"max":300,"countnulls":8,"numdistinctvalues":63,"ndvbitvector":"{0, 1, 2, 3, 4, 5, 11, 12}{0, 1, 2, 3, 6}{0, 1, 2, 3, 4, 5, 7, 11}{0, 1, 2, 3, 4, 5, 7}{0, 1, 2, 3, 4, 5, 6}{0, 1, 2, 3, 4, 5, 8}{0, 1, 2, 3, 4}{0, 1, 2, 3, 4, 5, 7, 9}{0, 1, 2, 3, 4}{0}{0, 1, 2, 3, 4, 5, 7}{0, 1, 2, 3, 4, 5, 6, 7}{0, 1, 2, 3, 4, 8, 9, 14}{0, 1, 2, 3, 5}{0, 1, 2, 3, 4, 5, 6, 7}{0, 1, 2, 3, 4, 5, 6, 8}"}
{noformat}

{noformat}
explain select count(s_store_sk) from store where s_number_employees > 200 and 
s_number_employees < 295;
{noformat}

Above query would first apply 1002/3 = 334 for {{s_number_employees > 200}} and 
then 334 / 3 = 111 for {{s_number_employees < 295}}. Ideally it should return 
all 1002 rows for filter {{s_number_employees > 200}}.


In TPC-DS Q79, this causes too less reduce tasks to be spun up causing runtime 
delays.


> Stats: StatsRulesProcFactory::evaluateComparator estimates are wrong when 
> minValue == filterValue
> -
>
> Key: HIVE-16290
> URL: https://issues.apache.org/jira/browse/HIVE-16290
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Reporter: Rajesh Balamohan
> Priority: Minor
> Attachments: HIVE-16290.1.patch
>
>
> Issue: 
> =
> In 

[jira] [Updated] (HIVE-16290) Stats: StatsRulesProcFactory::evaluateComparator estimates are wrong when minValue == filterValue

2017-03-24 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-16290:

Attachment: HIVE-16290.1.patch

> Stats: StatsRulesProcFactory::evaluateComparator estimates are wrong when 
> minValue == filterValue
> -
>
> Key: HIVE-16290
> URL: https://issues.apache.org/jira/browse/HIVE-16290
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Reporter: Rajesh Balamohan
> Priority: Minor
> Attachments: HIVE-16290.1.patch
>
>


