[jira] [Commented] (IMPALA-8534) Enable data cache by default for end-to-end containerised tests

2019-07-29 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895752#comment-16895752
 ] 

ASF subversion and git services commented on IMPALA-8534:
-

Commit 88da6fd421a9449d372de77aae61a33197f4d3c2 in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=88da6fd ]

IMPALA-8534: data cache for dockerised tests

This adds support for the data cache in dockerised clusters in
start-impala-cluster.py. It is handled similarly to the
log directories - we ensure that a separate data cache
directory is created for each container, then mount
it at /opt/impala/cache inside the container.

This is then enabled by default for the dockerised tests.

Testing:
Did a dockerised test run.

Change-Id: I2c75d4a5c1eea7a540d051bb175537163dec0e29
Reviewed-on: http://gerrit.cloudera.org:8080/13934
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
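The per-container handling described in the commit can be sketched as a small shell snippet. The container names, base directory, and helper function below are illustrative assumptions, not code from start-impala-cluster.py; only the in-container path /opt/impala/cache comes from the commit message.

```shell
# Create one data cache directory per container, then mount it at
# /opt/impala/cache, mirroring how the log directories are handled.
cache_dir_for() {
  # $1 = container name, $2 = base directory on the host
  echo "$2/data-cache-$1"
}

base="${CACHE_BASE:-/tmp/impala-data-cache}"
for name in impalad-0 impalad-1 impalad-2; do
  dir="$(cache_dir_for "$name" "$base")"
  mkdir -p "$dir"
  # The eventual docker invocation would then include:
  #   docker run ... -v "$dir":/opt/impala/cache ...
  echo "$name -> $dir"
done
```

Giving each container its own directory keeps the daemons from sharing (and corrupting) one cache backing file.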


> Enable data cache by default for end-to-end containerised tests
> ---
>
> Key: IMPALA-8534
> URL: https://issues.apache.org/jira/browse/IMPALA-8534
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Tim Armstrong
>Priority: Major
>
> Following on from IMPALA-8121, I don't think we can enable the data cache by 
> default, since it depends on what volumes are available to the container at 
> runtime. But we should definitely enable it for tests.
> [~kwho] said 
> {quote}When I tested with the data cache enabled in a mini-cluster with 3 
> nodes using the default workload scale, I ran with 500 MB and 1 partition 
> by running
>  start-impala-cluster.py --data_cache_dir=/tmp --data_cache_size=500MB
> You can also pass a pre-existing directory as an Impala startup flag like
> --data_cache=/tmp/data-cache-0:500MB
> {quote}
> start-impala-cluster.py already mounts some host directories into the 
> container, so we could either do the same for the data cache, or just depend 
> on the container root filesystem (which is likely to be slow, unfortunately).



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8807) OPTIMIZE_PARTITION_KEY_SCANS works in more cases than documented

2019-07-29 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895751#comment-16895751
 ] 

ASF subversion and git services commented on IMPALA-8807:
-

Commit b6b45c06656276edc90928c0bbb95c93e4a04f6f in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b6b45c0 ]

IMPALA-8807: fix OPTIMIZE_PARTITION_KEY_SCANS docs

The docs were inaccurate about the cases in which the optimisation
applied. Happily, it actually works in a much wider set of cases.

Change-Id: I8909b23bfe2b90470fc559fbc01f1e3aa3caa85d
Reviewed-on: http://gerrit.cloudera.org:8080/13949
Reviewed-by: Alex Rodoni 
Tested-by: Impala Public Jenkins 


> OPTIMIZE_PARTITION_KEY_SCANS works in more cases than documented
> 
>
> Key: IMPALA-8807
> URL: https://issues.apache.org/jira/browse/IMPALA-8807
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: docs
> Fix For: Impala 3.3.0
>
>
> This came up here 
> https://community.cloudera.com/t5/Support-Questions/Avoiding-hdfs-scan-when-querying-only-partition-columns/m-p/93337#M57192%3Feid=1=1
> Our docs say
> {quote}
> This optimization does not apply if the queries contain any WHERE, GROUP BY, 
> or HAVING clause. The relevant queries should only compute the minimum, 
> maximum, or number of distinct values for the partition key columns across 
> the whole table.
> {quote}
> This is false. Here's a query illustrating it working with all three clauses:
> {noformat}
> [localhost:21000] default> set optimize_partition_key_scans=true; explain 
> select max(ss_sold_date_sk) from tpcds_parquet.store_sales where 
> ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk having max(ss_sold_date_sk) 
> > 1000;
> OPTIMIZE_PARTITION_KEY_SCANS set to true
> Query: explain select max(ss_sold_date_sk) from tpcds_parquet.store_sales 
> where ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk having 
> max(ss_sold_date_sk) > 1000
> +------------------------------------------------------------+
> | Explain String                                             |
> +------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=1.94MB Threads=1 |
> | Per-Host Resource Estimates: Memory=10MB                   |
> | Codegen disabled by planner                                |
> |                                                            |
> | PLAN-ROOT SINK                                             |
> | |                                                          |
> | 01:AGGREGATE [FINALIZE]                                    |
> | |  output: max(ss_sold_date_sk)                            |
> | |  group by: ss_sold_date_sk                               |
> | |  having: max(ss_sold_date_sk) > 1000                     |
> | |  row-size=8B cardinality=182                             |
> | |                                                          |
> | 00:UNION                                                   |
> |    constant-operands=182                                   |
> |    row-size=4B cardinality=182                             |
> +------------------------------------------------------------+
> Fetched 15 row(s) in 0.11s
> {noformat}
> We should reword this to be correct.






[jira] [Resolved] (IMPALA-8807) OPTIMIZE_PARTITION_KEY_SCANS works in more cases than documented

2019-07-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8807.
---
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> OPTIMIZE_PARTITION_KEY_SCANS works in more cases than documented
> 
>
> Key: IMPALA-8807
> URL: https://issues.apache.org/jira/browse/IMPALA-8807
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: docs
> Fix For: Impala 3.3.0
>
>
> This came up here 
> https://community.cloudera.com/t5/Support-Questions/Avoiding-hdfs-scan-when-querying-only-partition-columns/m-p/93337#M57192%3Feid=1=1
> Our docs say
> {quote}
> This optimization does not apply if the queries contain any WHERE, GROUP BY, 
> or HAVING clause. The relevant queries should only compute the minimum, 
> maximum, or number of distinct values for the partition key columns across 
> the whole table.
> {quote}
> This is false. Here's a query illustrating it working with all three clauses:
> {noformat}
> [localhost:21000] default> set optimize_partition_key_scans=true; explain 
> select max(ss_sold_date_sk) from tpcds_parquet.store_sales where 
> ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk having max(ss_sold_date_sk) 
> > 1000;
> OPTIMIZE_PARTITION_KEY_SCANS set to true
> Query: explain select max(ss_sold_date_sk) from tpcds_parquet.store_sales 
> where ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk having 
> max(ss_sold_date_sk) > 1000
> +------------------------------------------------------------+
> | Explain String                                             |
> +------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=1.94MB Threads=1 |
> | Per-Host Resource Estimates: Memory=10MB                   |
> | Codegen disabled by planner                                |
> |                                                            |
> | PLAN-ROOT SINK                                             |
> | |                                                          |
> | 01:AGGREGATE [FINALIZE]                                    |
> | |  output: max(ss_sold_date_sk)                            |
> | |  group by: ss_sold_date_sk                               |
> | |  having: max(ss_sold_date_sk) > 1000                     |
> | |  row-size=8B cardinality=182                             |
> | |                                                          |
> | 00:UNION                                                   |
> |    constant-operands=182                                   |
> |    row-size=4B cardinality=182                             |
> +------------------------------------------------------------+
> Fetched 15 row(s) in 0.11s
> {noformat}
> We should reword this to be correct.






[jira] [Created] (IMPALA-8807) OPTIMIZE_PARTITION_KEY_SCANS works in more cases than documented

2019-07-29 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-8807:
-

 Summary: OPTIMIZE_PARTITION_KEY_SCANS works in more cases than 
documented
 Key: IMPALA-8807
 URL: https://issues.apache.org/jira/browse/IMPALA-8807
 Project: IMPALA
  Issue Type: Bug
  Components: Docs
Reporter: Tim Armstrong
Assignee: Tim Armstrong


This came up here 
https://community.cloudera.com/t5/Support-Questions/Avoiding-hdfs-scan-when-querying-only-partition-columns/m-p/93337#M57192%3Feid=1=1

Our docs say

{quote}
This optimization does not apply if the queries contain any WHERE, GROUP BY, or 
HAVING clause. The relevant queries should only compute the minimum, maximum, 
or number of distinct values for the partition key columns across the whole 
table.
{quote}

This is false. Here's a query illustrating it working with all three clauses:
{noformat}
[localhost:21000] default> set optimize_partition_key_scans=true; explain 
select max(ss_sold_date_sk) from tpcds_parquet.store_sales where 
ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk having max(ss_sold_date_sk) > 
1000;
OPTIMIZE_PARTITION_KEY_SCANS set to true
Query: explain select max(ss_sold_date_sk) from tpcds_parquet.store_sales where 
ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk having max(ss_sold_date_sk) > 
1000
+------------------------------------------------------------+
| Explain String                                             |
+------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=1.94MB Threads=1 |
| Per-Host Resource Estimates: Memory=10MB                   |
| Codegen disabled by planner                                |
|                                                            |
| PLAN-ROOT SINK                                             |
| |                                                          |
| 01:AGGREGATE [FINALIZE]                                    |
| |  output: max(ss_sold_date_sk)                            |
| |  group by: ss_sold_date_sk                               |
| |  having: max(ss_sold_date_sk) > 1000                     |
| |  row-size=8B cardinality=182                             |
| |                                                          |
| 00:UNION                                                   |
|    constant-operands=182                                   |
|    row-size=4B cardinality=182                             |
+------------------------------------------------------------+
Fetched 15 row(s) in 0.11s
{noformat}

We should reword this to be correct.






[jira] [Commented] (IMPALA-8798) TestAutoScaling does not work on erasure-coded files

2019-07-29 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895661#comment-16895661
 ] 

ASF subversion and git services commented on IMPALA-8798:
-

Commit 8099911fd7886e02cdb727ddebc4ace19f2201bc in impala's branch 
refs/heads/master from Lars Volker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8099911 ]

IMPALA-8802: Switch to pgrep for graceful shutdown helper

Some places discourage the use of pidof and favor pgrep instead. This
change switches usage to the latter in the graceful shutdown helper
introduced in IMPALA-8798.

Change-Id: Iaa8cc7112002a98c42b4dcfbe30b99ae0cfadf83
Reviewed-on: http://gerrit.cloudera.org:8080/13945
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 


> TestAutoScaling does not work on erasure-coded files
> 
>
> Key: IMPALA-8798
> URL: https://issues.apache.org/jira/browse/IMPALA-8798
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.3.0
>Reporter: Lars Volker
>Assignee: Lars Volker
>Priority: Critical
>  Labels: scalability
>
> TestAutoScaling uses the ConcurrentWorkload class, which does not set the 
> required query option to support scanning erasure-coded files. We should 
> disable the test for those cases.






[jira] [Commented] (IMPALA-8802) Switch to pgrep for graceful_shutdown_backends.sh

2019-07-29 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895660#comment-16895660
 ] 

ASF subversion and git services commented on IMPALA-8802:
-

Commit 8099911fd7886e02cdb727ddebc4ace19f2201bc in impala's branch 
refs/heads/master from Lars Volker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8099911 ]

IMPALA-8802: Switch to pgrep for graceful shutdown helper

Some places discourage the use of pidof and favor pgrep instead. This
change switches usage to the latter in the graceful shutdown helper
introduced in IMPALA-8798.

Change-Id: Iaa8cc7112002a98c42b4dcfbe30b99ae0cfadf83
Reviewed-on: http://gerrit.cloudera.org:8080/13945
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 


> Switch to pgrep for graceful_shutdown_backends.sh
> -
>
> Key: IMPALA-8802
> URL: https://issues.apache.org/jira/browse/IMPALA-8802
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.3.0
>Reporter: Lars Volker
>Assignee: Lars Volker
>Priority: Major
>
> IMPALA-8798 added a script with a call to {{pidof}}. However, {{pgrep}} seems 
> generally preferred (https://mywiki.wooledge.org/BadUtils) and we should 
> switch to it.
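The substitution can be sketched as below; the wrapper function name is hypothetical, only the pgrep flags are standard.

```shell
# pidof matches an exact program name; pgrep -x does the same, and pgrep
# additionally offers -f for full-command-line matching, which pidof lacks.
find_pids() {
  # drop-in replacement for `pidof "$1"`; prints matching PIDs, one per line
  pgrep -x "$1" || true   # `|| true` keeps "no match" from failing the caller
}

find_pids impalad
```

Unlike pidof, pgrep is specified by POSIX-adjacent procps tooling on most distributions, which is why it tends to be recommended for scripts.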






[jira] [Updated] (IMPALA-8549) Add support for scanning DEFLATE text files

2019-07-29 Thread Ethan (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan updated IMPALA-8549:
--
Description: 
Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing 
text files stored using zlib / deflate (results in files such as 
{{00_0.deflate}}). Impala currently does not support reading {{.deflate}} 
text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not 
one of the enabled plugins: 'LZO'}}.

Moreover, the default compression codec in Hadoop is zlib / deflate (see 
{{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
will be written by default.

Impala does support zlib / deflate with other file formats though: Avro, 
RCFiles, SequenceFiles (see 
[https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]).

Currently, the frontend assigns a compression type to a file depending on its 
extension. For instance, the functional_text_def database is stored as a file 
with a .deflate extension and is assigned the compression type DEFLATE. The 
HdfsTextScanner class receives this value and uses it directly to create a 
decompressor. The functional_\{avro,seq,rc}_databases are stored as files 
without extensions, so the frontend interprets their compression type as NONE. 
However, in the backend, each of their corresponding scanners implement custom 
logic of their own to read file headers and override the existing NONE 
compression type assigned to files with new values, such as DEFAULT or DEFLATE, 
so that the appropriate decompressor can be instantiated.
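The extension-based mapping described above can be sketched as follows. This is a simplified illustration, not the actual frontend code; the exact set of recognised extensions is an assumption.

```shell
# Guess a compression type from the file name alone, the way the frontend
# does for text files; extension-less files fall through to NONE.
compression_for() {
  case "$1" in
    *.deflate) echo DEFLATE ;;
    *.gz)      echo GZIP    ;;
    *.bz2)     echo BZIP2   ;;
    *)         echo NONE    ;;  # e.g. the extension-less avro/seq/rc files
  esac
}

compression_for 00_0.deflate   # DEFLATE
compression_for part-0         # NONE
```

Supporting .deflate text files would mean wiring the DEFLATE result of this mapping into HdfsTextScanner's decompressor creation, rather than rejecting it as an unknown scanner plugin.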

  was:
Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing 
text files stored using zlib / deflate (results in files such as 
{{00_0.deflate}}). Impala currently does not support reading {{.deflate}} 
text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not 
one of the enabled plugins: 'LZO'}}.

Moreover, the default compression codec in Hadoop is zlib / deflate (see 
{{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
will be written by default.

Impala does support zlib / deflate with other file formats though: Avro, 
RCFiles, SequenceFiles (see 
[https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]).

Currently, the frontend assigns a compression type to a file depending on its 
extension. For instance, the functional_text_def database is stored as a file 
with a .deflate extension and is assigned the compression type DEFLATE. The 
HdfsTextScanner class receives this value and uses it directly to create a 
decompressor. The functional_\{avro,seq,rc}_databases are stored as files 
without extensions, so the frontend interprets their compression type as NONE. 
However, in the backend, each of their corresponding scanners implement custom 
logic of their own to read file headers and override the existing NONE 
compression type assigned to files with new values, such as DEFAULT or DEFLATE.


> Add support for scanning DEFLATE text files
> ---
>
> Key: IMPALA-8549
> URL: https://issues.apache.org/jira/browse/IMPALA-8549
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Ethan
>Priority: Minor
>  Labels: ramp-up
>
> Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing 
> text files stored using zlib / deflate (results in files such as 
> {{00_0.deflate}}). Impala currently does not support reading {{.deflate}} 
> text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is 
> not one of the enabled plugins: 'LZO'}}.
> Moreover, the default compression codec in Hadoop is zlib / deflate (see 
> {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
> if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
> will be written by default.
> Impala does support zlib / deflate with other file formats though: Avro, 
> RCFiles, SequenceFiles (see 
> [https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]).
> Currently, the frontend assigns a compression type to a file depending on its 
> extension. For instance, the functional_text_def database is stored as a file 
> with a .deflate extension and is assigned the compression type DEFLATE. The 
> HdfsTextScanner class receives this value and uses it directly to create a 
> decompressor. The functional_\{avro,seq,rc}_databases are stored as files 
> without extensions, so the frontend interprets their compression type as 
> NONE. However, in the backend, each of their corresponding scanners implement 
> custom logic of their own to read file headers and override the existing NONE 
> compression type assigned to files with new values, such as DEFAULT or 
> DEFLATE, so that the appropriate decompressor can be instantiated.

[jira] [Updated] (IMPALA-8549) Add support for scanning DEFLATE text files

2019-07-29 Thread Ethan (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan updated IMPALA-8549:
--
Description: 
Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing 
text files stored using zlib / deflate (results in files such as 
{{00_0.deflate}}). Impala currently does not support reading {{.deflate}} 
text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not 
one of the enabled plugins: 'LZO'}}.

Moreover, the default compression codec in Hadoop is zlib / deflate (see 
{{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
will be written by default.

Impala does support zlib / deflate with other file formats though: Avro, 
RCFiles, SequenceFiles (see 
[https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]).

Currently, the frontend assigns a compression type to a file depending on its 
extension. For instance, the functional_text_def database is stored as a file 
with a .deflate extension and is assigned the compression type DEFLATE. The 
HdfsTextScanner class receives this value and uses it directly to create a 
decompressor. The functional_\{avro,seq,rc}_databases are stored as files 
without extensions, so the frontend interprets their compression type as NONE. 
However, in the backend, each of their corresponding scanners implement custom 
logic of their own to read file headers and override the existing NONE 
compression type assigned to files with new values, such as DEFAULT or DEFLATE.

  was:
Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing 
text files stored using zlib / deflate (results in files such as 
{{00_0.deflate}}). Impala currently does not support reading {{.deflate}} 
text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not 
one of the enabled plugins: 'LZO'}}.

Moreover, the default compression codec in Hadoop is zlib / deflate (see 
{{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
will be written by default.

Impala does support zlib / deflate with other file formats though: Avro, 
RCFiles, SequenceFiles (see 
[https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]).

Currently, the frontend assigns a compression type to a file depending on its 
extension. For instance, the functional_text_def database is stored as a file 
with a .deflate extension and is assigned the compression type DEFLATE. The 
HdfsTextScanner class receives this value and uses it directly to create a 
decompressor. The functional_\{avro,seq,rc}_databases are stored as files 
without extensions, so the frontend interprets their compression type as NONE. 
However, in the backend, each of their corresponding scanners implement custom 
logic of their own to read file headers and override the existing NONE 
compression type of files to new values, such as DEFAULT or DEFLATE.


> Add support for scanning DEFLATE text files
> ---
>
> Key: IMPALA-8549
> URL: https://issues.apache.org/jira/browse/IMPALA-8549
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Ethan
>Priority: Minor
>  Labels: ramp-up
>
> Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing 
> text files stored using zlib / deflate (results in files such as 
> {{00_0.deflate}}). Impala currently does not support reading {{.deflate}} 
> text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is 
> not one of the enabled plugins: 'LZO'}}.
> Moreover, the default compression codec in Hadoop is zlib / deflate (see 
> {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
> if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
> will be written by default.
> Impala does support zlib / deflate with other file formats though: Avro, 
> RCFiles, SequenceFiles (see 
> [https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]).
> Currently, the frontend assigns a compression type to a file depending on its 
> extension. For instance, the functional_text_def database is stored as a file 
> with a .deflate extension and is assigned the compression type DEFLATE. The 
> HdfsTextScanner class receives this value and uses it directly to create a 
> decompressor. The functional_\{avro,seq,rc}_databases are stored as files 
> without extensions, so the frontend interprets their compression type as 
> NONE. However, in the backend, each of their corresponding scanners implement 
> custom logic of their own to read file headers and override the existing NONE 
> compression type assigned to files with new values, such as DEFAULT or 
> DEFLATE.

[jira] [Updated] (IMPALA-8549) Add support for scanning DEFLATE text files

2019-07-29 Thread Ethan (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan updated IMPALA-8549:
--
Description: 
Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing 
text files stored using zlib / deflate (results in files such as 
{{00_0.deflate}}). Impala currently does not support reading {{.deflate}} 
text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not 
one of the enabled plugins: 'LZO'}}.

Moreover, the default compression codec in Hadoop is zlib / deflate (see 
{{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
will be written by default.

Impala does support zlib / deflate with other file formats though: Avro, 
RCFiles, SequenceFiles (see 
[https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]).

Currently, the frontend assigns a compression type to a file depending on its 
extension. For instance, the functional_text_def database is stored as a file 
with a .deflate extension and is assigned the compression type DEFLATE. The 
HdfsTextScanner class receives this value and uses it directly to create a 
decompressor. The functional_\{avro,seq,rc}_databases are stored as files 
without extensions, so the frontend interprets their compression type as NONE. 
However, in the backend, each of their corresponding scanners implement custom 
logic of their own to read file headers and override the existing NONE 
compression type of files to new values, such as DEFAULT or DEFLATE.

  was:
Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing 
text files stored using zlib / deflate (results in files such as 
{{00_0.deflate}}). Impala currently does not support reading {{.deflate}} 
files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not one 
of the enabled plugins: 'LZO'}}.

Moreover, the default compression codec in Hadoop is zlib / deflate (see 
{{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
will be written by default.

Impala does support zlib / deflate with other file formats though: Avro, 
RCFiles, SequenceFiles (see 
https://impala.apache.org/docs/build/html/topics/impala_file_formats.html).


> Add support for scanning DEFLATE text files
> ---
>
> Key: IMPALA-8549
> URL: https://issues.apache.org/jira/browse/IMPALA-8549
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Sahil Takiar
>Assignee: Ethan
>Priority: Minor
>  Labels: ramp-up
>
> Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing 
> text files stored using zlib / deflate (results in files such as 
> {{00_0.deflate}}). Impala currently does not support reading {{.deflate}} 
> text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is 
> not one of the enabled plugins: 'LZO'}}.
> Moreover, the default compression codec in Hadoop is zlib / deflate (see 
> {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
> if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
> will be written by default.
> Impala does support zlib / deflate with other file formats though: Avro, 
> RCFiles, SequenceFiles (see 
> [https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]).
> Currently, the frontend assigns a compression type to a file depending on its 
> extension. For instance, the functional_text_def database is stored as a file 
> with a .deflate extension and is assigned the compression type DEFLATE. The 
> HdfsTextScanner class receives this value and uses it directly to create a 
> decompressor. The functional_\{avro,seq,rc}_databases are stored as files 
> without extensions, so the frontend interprets their compression type as 
> NONE. However, in the backend, each of their corresponding scanners implement 
> custom logic of their own to read file headers and override the existing NONE 
> compression type of files to new values, such as DEFAULT or DEFLATE.






[jira] [Closed] (IMPALA-8510) Impala Doc: Recursively list files within transactional tables

2019-07-29 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-8510.
---
Resolution: Not A Bug

An implementation detail and not user-facing.

> Impala Doc: Recursively list files within transactional tables
> --
>
> Key: IMPALA-8510
> URL: https://issues.apache.org/jira/browse/IMPALA-8510
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
>







[jira] [Closed] (IMPALA-8489) TestRecoverPartitions.test_post_invalidate fails with IllegalStateException when HMS polling is enabled

2019-07-29 Thread Anurag Mantripragada (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anurag Mantripragada closed IMPALA-8489.

Resolution: Fixed

Committed to master.

> TestRecoverPartitions.test_post_invalidate fails with IllegalStateException 
> when HMS polling is enabled
> ---
>
> Key: IMPALA-8489
> URL: https://issues.apache.org/jira/browse/IMPALA-8489
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Anurag Mantripragada
>Priority: Critical
>  Labels: catalog-v2
>
> {noformat}
> metadata/test_recover_partitions.py:279: in test_post_invalidate
> "INSERT INTO TABLE %s PARTITION(i=002, p='p2') VALUES(4)" % FQ_TBL_NAME)
> common/impala_test_suite.py:620: in wrapper
> return function(*args, **kwargs)
> common/impala_test_suite.py:628: in execute_query_expect_success
> result = cls.__execute_query(impalad_client, query, query_options, user)
> common/impala_test_suite.py:722: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:180: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:187: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:364: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:385: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:IllegalArgumentException: no such partition id 6244
> {noformat}
> The failure is reproducible for me locally with catalog v2.






[jira] [Commented] (IMPALA-8806) Add metrics to improve observability of executor groups

2019-07-29 Thread Alex Rodoni (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895608#comment-16895608
 ] 

Alex Rodoni commented on IMPALA-8806:
-

[~bikramjeet.vig] User-facing doc impact? For 3.3?

> Add metrics to improve observability of executor groups
> ---
>
> Key: IMPALA-8806
> URL: https://issues.apache.org/jira/browse/IMPALA-8806
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 3.3.0
>Reporter: Bikramjeet Vig
>Assignee: Bikramjeet Vig
>Priority: Major
>  Labels: observability
>
> As a follow-on to IMPALA-8484, it makes sense to add some metrics to provide 
> better observability into the state of executor groups.
> Some metrics can be:
> - number of executor groups with any impalas in them
> - number of healthy executor groups
> - number of backends. Currently we have a python helper that calculates this 
> - get_num_known_live_backends, but it really should be a metric (we could 
> replace the test code with a metric if we did this).






[jira] [Created] (IMPALA-8806) Add metrics to improve observability of executor groups

2019-07-29 Thread Bikramjeet Vig (JIRA)
Bikramjeet Vig created IMPALA-8806:
--

 Summary: Add metrics to improve observability of executor groups
 Key: IMPALA-8806
 URL: https://issues.apache.org/jira/browse/IMPALA-8806
 Project: IMPALA
  Issue Type: Improvement
Affects Versions: Impala 3.3.0
Reporter: Bikramjeet Vig
Assignee: Bikramjeet Vig


As a follow-on to IMPALA-8484, it makes sense to add some metrics to provide 
better observability into the state of executor groups.

Some metrics can be:

- number of executor groups with any impalas in them
- number of healthy executor groups
- number of backends. Currently we have a python helper that calculates this - 
get_num_known_live_backends, but it really should be a metric (we could replace 
the test code with a metric if we did this).
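If such a metric existed, test code could read it from the impalad debug web server instead of the get_num_known_live_backends helper. A minimal sketch of walking the metric-group tree, assuming the debug pages' ?json variant and a layout with "metrics" and "child_groups" keys (the key names and the metric name used below are assumptions, not the actual Impala API):

```python
import json
from urllib.request import urlopen

def find_metric(group, name):
    """Recursively search a metric-group dict for a metric by name."""
    for metric in group.get("metrics", []):
        if metric.get("name") == name:
            return metric.get("value")
    for child in group.get("child_groups", []):
        value = find_metric(child, name)
        if value is not None:
            return value
    return None

def get_num_live_backends(host="localhost", port=25000):
    """Fetch a hypothetical backend-count metric from an impalad debug page."""
    with urlopen("http://%s:%d/metrics?json" % (host, port)) as resp:
        doc = json.load(resp)
    # "cluster-membership.backends.total" is an illustrative metric name.
    return find_metric(doc.get("metric_group", {}),
                       "cluster-membership.backends.total")
```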






[jira] [Commented] (IMPALA-8799) Prometheus metrics should be prefixed with "impala_"

2019-07-29 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895490#comment-16895490
 ] 

Tim Armstrong commented on IMPALA-8799:
---

[~arodoni_cloudera] we didn't document the original change - IMPALA-8560. I 
think it could be worth documenting the prior change since it's an integration 
point. We have a debug page /metrics_prometheus that implements the prometheus 
exposition format - 
https://prometheus.io/docs/instrumenting/exposition_formats/#text-based-format
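The prefix convention from IMPALA-8799 is straightforward to check against that exposition format; a sketch (the parsing here is deliberately simplified and ignores escaping inside label values):

```python
def unprefixed_metrics(exposition_text, prefix="impala_"):
    """Return metric names in Prometheus text exposition format that lack
    the expected application prefix. Skips blank lines and # HELP / # TYPE
    comments, then takes the name before any labels or value."""
    bad = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name = line.split("{", 1)[0].split()[0]
        if not name.startswith(prefix):
            bad.append(name)
    return bad
```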

> Prometheus metrics should be prefixed with "impala_"
> 
>
> Key: IMPALA-8799
> URL: https://issues.apache.org/jira/browse/IMPALA-8799
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: observability
> Fix For: Impala 3.3.0
>
>
> This is recommended by the Prometheus docs - 
> https://prometheus.io/docs/practices/naming/
> {quote}
> A metric name...
> ...must comply with the data model for valid characters.
> ...should have a (single-word) application prefix relevant to the domain 
> the metric belongs to. The prefix is sometimes referred to as namespace by 
> client libraries. For metrics specific to an application, the prefix is 
> usually the application name itself. Sometimes, however, metrics are more 
> generic, like standardized metrics exported by client libraries. Examples:
> prometheus_notifications_total (specific to the Prometheus server)
> process_cpu_seconds_total (exported by many client libraries)
> http_request_duration_seconds (for all HTTP requests)
> {quote}
> It is also awkward in tools like grafana to find impala metrics with the 
> current naming scheme.






[jira] [Updated] (IMPALA-8456) Impala Doc: Document HTTP based HS2/beeswax endpoints on coordinators

2019-07-29 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni updated IMPALA-8456:

Description: 
Also document IMPALA-8717
https://issues.apache.org/jira/browse/IMPALA-8783

  was:Also document IMPALA-8717


> Impala Doc: Document HTTP based HS2/beeswax endpoints on coordinators
> -
>
> Key: IMPALA-8456
> URL: https://issues.apache.org/jira/browse/IMPALA-8456
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
>
> Also document IMPALA-8717
> https://issues.apache.org/jira/browse/IMPALA-8783






[jira] [Commented] (IMPALA-8799) Prometheus metrics should be prefixed with "impala_"

2019-07-29 Thread Alex Rodoni (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895481#comment-16895481
 ] 

Alex Rodoni commented on IMPALA-8799:
-

[~tarmstrong] Is there a user-facing doc impact regarding this?

> Prometheus metrics should be prefixed with "impala_"
> 
>
> Key: IMPALA-8799
> URL: https://issues.apache.org/jira/browse/IMPALA-8799
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: observability
> Fix For: Impala 3.3.0
>
>
> This is recommended by the Prometheus docs - 
> https://prometheus.io/docs/practices/naming/
> {quote}
> A metric name...
> ...must comply with the data model for valid characters.
> ...should have a (single-word) application prefix relevant to the domain 
> the metric belongs to. The prefix is sometimes referred to as namespace by 
> client libraries. For metrics specific to an application, the prefix is 
> usually the application name itself. Sometimes, however, metrics are more 
> generic, like standardized metrics exported by client libraries. Examples:
> prometheus_notifications_total (specific to the Prometheus server)
> process_cpu_seconds_total (exported by many client libraries)
> http_request_duration_seconds (for all HTTP requests)
> {quote}
> It is also awkward in tools like grafana to find impala metrics with the 
> current naming scheme.






[jira] [Issue Comment Deleted] (IMPALA-8789) Include script to trigger graceful shutdown in docker containers

2019-07-29 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated IMPALA-8789:

Comment: was deleted

(was: Hi [~lv] I am seeing a test failure which is possibly related to this 
patch. 
https://master-02.jenkins.cloudera.com/job/impala-asf-master-core/1372/testReport/junit/custom_cluster.test_restart_services/TestGracefulShutdown/test_graceful_shutdown_script/

Can you please confirm?)

> Include script to trigger graceful shutdown in docker containers
> 
>
> Key: IMPALA-8789
> URL: https://issues.apache.org/jira/browse/IMPALA-8789
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 3.3.0
>Reporter: Lars Volker
>Assignee: Lars Volker
>Priority: Major
>
> We should include a utility script in the docker containers to trigger a 
> graceful shutdown by sending SIGRTMIN to all impalads.






[jira] [Commented] (IMPALA-8789) Include script to trigger graceful shutdown in docker containers

2019-07-29 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895478#comment-16895478
 ] 

Vihang Karajgaonkar commented on IMPALA-8789:
-

Hi [~lv] I am seeing a test failure which is possibly related to this patch. 
https://master-02.jenkins.cloudera.com/job/impala-asf-master-core/1372/testReport/junit/custom_cluster.test_restart_services/TestGracefulShutdown/test_graceful_shutdown_script/

Can you please confirm?

> Include script to trigger graceful shutdown in docker containers
> 
>
> Key: IMPALA-8789
> URL: https://issues.apache.org/jira/browse/IMPALA-8789
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 3.3.0
>Reporter: Lars Volker
>Assignee: Lars Volker
>Priority: Major
>
> We should include a utility script in the docker containers to trigger a 
> graceful shutdown by sending SIGRTMIN to all impalads.






[jira] [Updated] (IMPALA-8456) Impala Doc: Document HTTP based HS2/beeswax endpoints on coordinators

2019-07-29 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni updated IMPALA-8456:

Description: Also document IMPALA-8717

> Impala Doc: Document HTTP based HS2/beeswax endpoints on coordinators
> -
>
> Key: IMPALA-8456
> URL: https://issues.apache.org/jira/browse/IMPALA-8456
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_33
>
> Also document IMPALA-8717






[jira] [Commented] (IMPALA-8803) Coordinator should release admitted memory per-backend rather than per-query

2019-07-29 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895456#comment-16895456
 ] 

Sahil Takiar commented on IMPALA-8803:
--

CC: [~tarmstr...@cloudera.com], [~kwho]

A few other points:
* Wondering if this is useful outside of the result spooling context; maybe 
there are queries where certain backends complete earlier than others? My first 
guess would be hash joins: the dimension table scan of a hash join should 
complete earlier than the rest of the query, assuming the dimension table scan 
fragment is running on its own backend.
* Considering the following approach for batching:
** Batching will be driven by the completion of {{BackendStates}}: the 
{{Coordinator}} will internally buffer a list of {{BackendStates}} that need to 
be released; when each backend completes, the {{Coordinator}} will check some 
number of conditions, and if those conditions are met, it will release the 
admitted memory for all buffered {{BackendStates}}.
*** This is in contrast to adding a blocking queue of backends that need to be 
released, and adding a scheduled thread to periodically read from the queue and 
release the resources of any buffered backends
** Considering the following two conditions that would trigger the coordinator 
to release all of its buffered {{BackendStates}} (both would need to be true to 
trigger a call to admission control):
*** If "x" number of backends are buffered, then this condition returns true; 
"x" is initially set to num backends / 2, and it is exponentially decaying; so 
for a query running on 1000 nodes, you have an upper limit of {{log(num 
backends)}} calls to the admission controller
*** If "y" milliseconds have passed since the last time we released the 
buffered backends, this condition returns true ("y" can initially be set to 
1000); this avoids additional overhead of queries that complete relatively 
quickly, and perhaps do not take advantage of the result spooling benefits
** If a query is cancelled or hits an error, all remaining running backends 
will be released at once
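The two-condition policy above can be sketched as follows (class and method names are illustrative, not the actual {{Coordinator}} API; per the proposal, both conditions must hold before a release, and cancellation flushes everything):

```python
import time

class ReleaseBatcher:
    """Sketch of the proposed policy: buffer completed backends and release
    them to admission control only when BOTH conditions hold:
    (1) at least `threshold` backends are buffered, where threshold starts
        at num_backends / 2 and halves after each release (giving roughly
        log2(num_backends) admission controller calls in total), and
    (2) at least y_ms milliseconds have passed since the last release."""

    def __init__(self, num_backends, y_ms=1000, clock=time.monotonic):
        self.threshold = max(1, num_backends // 2)
        self.y_ms = y_ms
        self.clock = clock
        self.buffered = []
        self.last_release = clock()

    def backend_completed(self, backend_state):
        """Returns the batch to release, or None if still buffering."""
        self.buffered.append(backend_state)
        elapsed_ms = (self.clock() - self.last_release) * 1000
        if len(self.buffered) >= self.threshold and elapsed_ms >= self.y_ms:
            return self._release()
        return None

    def flush(self):
        """On query completion, cancellation or error, release everything."""
        return self._release()

    def _release(self):
        batch, self.buffered = self.buffered, []
        self.threshold = max(1, self.threshold // 2)  # exponentially decaying "x"
        self.last_release = self.clock()
        return batch
```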

> Coordinator should release admitted memory per-backend rather than per-query
> 
>
> Key: IMPALA-8803
> URL: https://issues.apache.org/jira/browse/IMPALA-8803
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
>
> When {{SPOOL_QUERY_RESULTS}} is true, the coordinator backend may be long 
> lived, even though all other backends for the query have completed. 
> Currently, the Coordinator only releases admitted memory when the entire 
> query has completed (including the coordinator fragment) - 
> https://github.com/apache/impala/blob/72c9370856d7436885adbee3e8da7e7d9336df15/be/src/runtime/coordinator.cc#L562
> In order to more aggressively return admitted memory, the coordinator should 
> release memory when each backend for a query completes, rather than waiting 
> for the entire query to complete.
> Releasing memory per backend should be batched because releasing admitted 
> memory in the admission controller requires obtaining a global lock and 
> refreshing the internal stats of the admission controller. Batching will help 
> mitigate any additional overhead from releasing admitted memory per backend.






[jira] [Resolved] (IMPALA-8783) Add Kerberos SPNEGO support to the http hs2 server

2019-07-29 Thread Thomas Tauber-Marshall (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Tauber-Marshall resolved IMPALA-8783.

   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Add Kerberos SPNEGO support to the http hs2 server
> --
>
> Key: IMPALA-8783
> URL: https://issues.apache.org/jira/browse/IMPALA-8783
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 3.3.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>  Labels: security
> Fix For: Impala 3.3.0
>
>
> IMPALA-8538 added support for http connections to the hs2 server along with 
> LDAP auth support. We should also add support for Kerberos auth via SPNEGO






[jira] [Updated] (IMPALA-8804) Memory based admission control is always disabled if pool max mem is not set

2019-07-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8804:
--
Affects Version/s: Impala 3.3.0
   Impala 3.1.0
   Impala 3.2.0

> Memory based admission control is always disabled if pool max mem is not set
> 
>
> Key: IMPALA-8804
> URL: https://issues.apache.org/jira/browse/IMPALA-8804
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.1.0, Impala 3.2.0, Impala 3.3.0
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: admission-control
> Attachments: minicluster-fair-scheduler.xml, 
> minicluster-llama-site.xml
>
>
> Memory-based admission control doesn't kick in with the provided config files 
> where no max memory is configured for the pool. This is the documented 
> behaviour and not a bug - see 
> https://impala.apache.org/docs/build/html/topics/impala_admission.html. 
> However, it is inconvenient since you need to specify some max memory value 
> even if you don't want to limit the pool's share of the cluster's resources 
> (or the cluster is variable in size).
> This is unfriendly. It is also confusing since there is no explicit way to 
> enable memory-based admission control.
> You can work around this by setting the pool max memory to a very high value. 
> To reproduce, start a minicluster with the provided configs. If you submit 
> multiple memory-intensive queries in parallel, they will never be queued.
> {noformat}
> start-impala-cluster.py 
> --impalad_args="-fair_scheduler_allocation_path=minicluster-fair-scheduler.xml
>  -llama_site_path=minicluster-llama-site.xml" 
> --impalad_args=-vmodule=admission-controller
> {noformat}






[jira] [Updated] (IMPALA-8804) Memory based admission control is always disabled if pool max mem is not set

2019-07-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-8804:
--
Component/s: Backend

> Memory based admission control is always disabled if pool max mem is not set
> 
>
> Key: IMPALA-8804
> URL: https://issues.apache.org/jira/browse/IMPALA-8804
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: admission-control
> Attachments: minicluster-fair-scheduler.xml, 
> minicluster-llama-site.xml
>
>
> Memory-based admission control doesn't kick in with the provided config files 
> where no max memory is configured for the pool. This is the documented 
> behaviour and not a bug - see 
> https://impala.apache.org/docs/build/html/topics/impala_admission.html. 
> However, it is inconvenient since you need to specify some max memory value 
> even if you don't want to limit the pool's share of the cluster's resources 
> (or the cluster is variable in size).
> This is unfriendly. It is also confusing since there is no explicit way to 
> enable memory-based admission control.
> You can work around this by setting the pool max memory to a very high value. 
> To reproduce, start a minicluster with the provided configs. If you submit 
> multiple memory-intensive queries in parallel, they will never be queued.
> {noformat}
> start-impala-cluster.py 
> --impalad_args="-fair_scheduler_allocation_path=minicluster-fair-scheduler.xml
>  -llama_site_path=minicluster-llama-site.xml" 
> --impalad_args=-vmodule=admission-controller
> {noformat}






[jira] [Created] (IMPALA-8804) Memory based admission control is always disabled if pool max mem is not set

2019-07-29 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-8804:
-

 Summary: Memory based admission control is always disabled if pool 
max mem is not set
 Key: IMPALA-8804
 URL: https://issues.apache.org/jira/browse/IMPALA-8804
 Project: IMPALA
  Issue Type: Improvement
Reporter: Tim Armstrong
 Attachments: minicluster-fair-scheduler.xml, minicluster-llama-site.xml

Memory-based admission control doesn't kick in with the provided config files 
where no max memory is configured for the pool. This is the documented 
behaviour and not a bug - see 
https://impala.apache.org/docs/build/html/topics/impala_admission.html. 
However, it is inconvenient since you need to specify some max memory value 
even if you don't want to limit the pool's share of the cluster's resources (or 
the cluster is variable in size).

This is unfriendly. It is also confusing since there is no explicit way to 
enable memory-based admission control.

You can work around this by setting the pool max memory to a very high value. 

To reproduce, start a minicluster with the provided configs. If you submit 
multiple memory-intensive queries in parallel, they will never be queued.
{noformat}
start-impala-cluster.py 
--impalad_args="-fair_scheduler_allocation_path=minicluster-fair-scheduler.xml 
-llama_site_path=minicluster-llama-site.xml" 
--impalad_args=-vmodule=admission-controller
{noformat}






[jira] [Created] (IMPALA-8803) Coordinator should release admitted memory per-backend rather than per-query

2019-07-29 Thread Sahil Takiar (JIRA)
Sahil Takiar created IMPALA-8803:


 Summary: Coordinator should release admitted memory per-backend 
rather than per-query
 Key: IMPALA-8803
 URL: https://issues.apache.org/jira/browse/IMPALA-8803
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Sahil Takiar
Assignee: Sahil Takiar


When {{SPOOL_QUERY_RESULTS}} is true, the coordinator backend may be long 
lived, even though all other backends for the query have completed. Currently, 
the Coordinator only releases admitted memory when the entire query has 
completed (including the coordinator fragment) - 
https://github.com/apache/impala/blob/72c9370856d7436885adbee3e8da7e7d9336df15/be/src/runtime/coordinator.cc#L562

In order to more aggressively return admitted memory, the coordinator should 
release memory when each backend for a query completes, rather than waiting for 
the entire query to complete.

Releasing memory per backend should be batched because releasing admitted 
memory in the admission controller requires obtaining a global lock and 
refreshing the internal stats of the admission controller. Batching will help 
mitigate any additional overhead from releasing admitted memory per backend.






[jira] [Created] (IMPALA-8802) Switch to pgrep for graceful_shutdown_backends.sh

2019-07-29 Thread Lars Volker (JIRA)
Lars Volker created IMPALA-8802:
---

 Summary: Switch to pgrep for graceful_shutdown_backends.sh
 Key: IMPALA-8802
 URL: https://issues.apache.org/jira/browse/IMPALA-8802
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.3.0
Reporter: Lars Volker
Assignee: Lars Volker


IMPALA-8798 added a script with a call to {{pidof}}. However, {{pgrep}} seems 
generally preferred (https://mywiki.wooledge.org/BadUtils) and we should switch 
to it.
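In Python, the same pgrep-based approach might look like this (a sketch only; the function names are illustrative, and SIGRTMIN delivery assumes a Linux host):

```python
import os
import signal
import subprocess

def parse_pids(pgrep_output):
    """pgrep prints one PID per line; tolerate trailing whitespace."""
    return [int(p) for p in pgrep_output.split()]

def graceful_shutdown_impalads():
    """Send SIGRTMIN to every impalad, triggering graceful shutdown.
    Uses pgrep rather than pidof, per the wooledge BadUtils advice."""
    out = subprocess.run(["pgrep", "impalad"], capture_output=True,
                         text=True).stdout
    pids = parse_pids(out)
    for pid in pids:
        os.kill(pid, signal.SIGRTMIN)
    return pids
```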






[jira] [Created] (IMPALA-8801) Add DATE type support to ORC scanner/writer

2019-07-29 Thread Attila Jeges (JIRA)
Attila Jeges created IMPALA-8801:


 Summary: Add DATE type support to ORC scanner/writer
 Key: IMPALA-8801
 URL: https://issues.apache.org/jira/browse/IMPALA-8801
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Attila Jeges









[jira] [Updated] (IMPALA-8800) Add DATE type support to Kudu scanner

2019-07-29 Thread Attila Jeges (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Jeges updated IMPALA-8800:
-
Summary: Add DATE type support to Kudu scanner  (was: Add support for Kudu 
DATE type)

> Add DATE type support to Kudu scanner
> -
>
> Key: IMPALA-8800
> URL: https://issues.apache.org/jira/browse/IMPALA-8800
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Attila Jeges
>Priority: Major
>







[jira] [Assigned] (IMPALA-8800) Add DATE type support to Kudu scanner

2019-07-29 Thread Attila Jeges (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Jeges reassigned IMPALA-8800:


Assignee: Attila Jeges

> Add DATE type support to Kudu scanner
> -
>
> Key: IMPALA-8800
> URL: https://issues.apache.org/jira/browse/IMPALA-8800
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Attila Jeges
>Assignee: Attila Jeges
>Priority: Major
>







[jira] [Created] (IMPALA-8800) Add support for Kudu DATE type

2019-07-29 Thread Attila Jeges (JIRA)
Attila Jeges created IMPALA-8800:


 Summary: Add support for Kudu DATE type
 Key: IMPALA-8800
 URL: https://issues.apache.org/jira/browse/IMPALA-8800
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Attila Jeges









[jira] [Resolved] (IMPALA-8636) Implement INSERT for insert-only ACID tables

2019-07-29 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/IMPALA-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-8636.
---
   Resolution: Fixed
Fix Version/s: Impala 3.3.0
   Impala 4.0

> Implement INSERT for insert-only ACID tables
> 
>
> Key: IMPALA-8636
> URL: https://issues.apache.org/jira/browse/IMPALA-8636
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Critical
>  Labels: impala-acid
> Fix For: Impala 4.0, Impala 3.3.0
>
>
> Impala should support insertion for insert-only ACID tables.
> For this we need to allocate a write ID for the target table, and write the 
> data into the base/delta directories.
> INSERT operation should create a new delta directory with the allocated write 
> ID.
> INSERT OVERWRITE should create a new base directory with the allocated write 
> ID. This new base directory will only contain the data coming from this 
> operation.
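The directory names follow the Hive ACID convention; a sketch of how they could be formed from the allocated write ID (the zero-padding width follows the Hive convention and is stated here as an assumption):

```python
def delta_dir(write_id):
    """INSERT on an insert-only ACID table writes a
    delta_<minWriteId>_<maxWriteId> directory; for a single-statement
    insert both bounds are the allocated write ID."""
    return "delta_{0:07d}_{0:07d}".format(write_id)

def base_dir(write_id):
    """INSERT OVERWRITE writes a fresh base_<writeId> directory holding
    only the data from this operation."""
    return "base_{0:07d}".format(write_id)
```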






[jira] [Work started] (IMPALA-8198) Add DATE type support to Avro scanner/writer

2019-07-29 Thread Attila Jeges (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8198 started by Attila Jeges.

> Add DATE type support to Avro scanner/writer
> 
>
> Key: IMPALA-8198
> URL: https://issues.apache.org/jira/browse/IMPALA-8198
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Reporter: Attila Jeges
>Assignee: Attila Jeges
>Priority: Major
>
> Implement Avro DATE type support.






[jira] [Resolved] (IMPALA-8679) Make the query options set in the dynamic resource pool/admission control un-overridable in the user session

2019-07-29 Thread Adriano (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adriano resolved IMPALA-8679.
-
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

IMPALA-7349 is applicable here (specifically for the memory settings). With 
this feature, users cannot override the MEM_LIMIT set in the pool configuration 
(avoiding potential query failures from exceeding the memory limit).

[1] 
https://www.cloudera.com/documentation/enterprise/6/6.1/topics/impala_admission.html#admission_memory

> Make the query options set in the dynamic resource pool/admission control 
> un-overridable in the user session
> 
>
> Key: IMPALA-8679
> URL: https://issues.apache.org/jira/browse/IMPALA-8679
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Adriano
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> Issue description: Once admission control is configured (with the 
> MAX_MEM_ESTIMATE_FOR_ADMISSION, MEM_LIMIT, etc. query options), if a user 
> bypasses the defaults by setting the query options in the session, it can 
> cause query failures in the configured pools (e.g. decreasing 
> MAX_MEM_ESTIMATE_FOR_ADMISSION and increasing MEM_LIMIT).
> Improvement: It would be great to have further checkboxes (with those 
> default values) like "do not allow user to override this value". 
> The value could eventually be changed by the query optimizer, but we would 
> not allow users to change MAX_MEM_ESTIMATE_FOR_ADMISSION at all. 






[jira] [Resolved] (IMPALA-8799) Prometheus metrics should be prefixed with "impala_"

2019-07-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8799.
---
   Resolution: Fixed
Fix Version/s: Impala 3.3.0

> Prometheus metrics should be prefixed with "impala_"
> 
>
> Key: IMPALA-8799
> URL: https://issues.apache.org/jira/browse/IMPALA-8799
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: observability
> Fix For: Impala 3.3.0
>
>
> This is recommended by the Prometheus docs - 
> https://prometheus.io/docs/practices/naming/
> {quote}
> A metric name...
> ...must comply with the data model for valid characters.
> ...should have a (single-word) application prefix relevant to the domain 
> the metric belongs to. The prefix is sometimes referred to as namespace by 
> client libraries. For metrics specific to an application, the prefix is 
> usually the application name itself. Sometimes, however, metrics are more 
> generic, like standardized metrics exported by client libraries. Examples:
> prometheus_notifications_total (specific to the Prometheus server)
> process_cpu_seconds_total (exported by many client libraries)
> http_request_duration_seconds (for all HTTP requests)
> {quote}
> It is also awkward in tools like grafana to find impala metrics with the 
> current naming scheme.






[jira] [Assigned] (IMPALA-8627) Re-enable catalog v2 in containers

2019-07-29 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-8627:
-

Assignee: Vihang Karajgaonkar  (was: Tim Armstrong)

> Re-enable catalog v2 in containers
> --
>
> Key: IMPALA-8627
> URL: https://issues.apache.org/jira/browse/IMPALA-8627
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Affects Versions: Impala 3.3.0
>Reporter: Tim Armstrong
>Assignee: Vihang Karajgaonkar
>Priority: Major
>  Labels: catalog-v2
> Fix For: Impala 3.3.0
>
>
> We also need to set --invalidate_tables_on_memory_pressure on the impalads 
> for that to be fully effective - the impalads send table usage info to the 
> catalogd


