[jira] [Commented] (IMPALA-8534) Enable data cache by default for end-to-end containerised tests
[ https://issues.apache.org/jira/browse/IMPALA-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895752#comment-16895752 ] ASF subversion and git services commented on IMPALA-8534: - Commit 88da6fd421a9449d372de77aae61a33197f4d3c2 in impala's branch refs/heads/master from Tim Armstrong [ https://gitbox.apache.org/repos/asf?p=impala.git;h=88da6fd ] IMPALA-8534: data cache for dockerised tests This adds support for the data cache in dockerised clusters in start-impala-cluster.py. It is handled similarly to the log directories - we ensure that a separate data cache directory is created for each container, then mount it at /opt/impala/cache inside the container. This is then enabled by default for the dockerised tests. Testing: Did a dockerised test run. Change-Id: I2c75d4a5c1eea7a540d051bb175537163dec0e29 Reviewed-on: http://gerrit.cloudera.org:8080/13934 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins > Enable data cache by default for end-to-end containerised tests > --- > > Key: IMPALA-8534 > URL: https://issues.apache.org/jira/browse/IMPALA-8534 > Project: IMPALA > Issue Type: Sub-task >Reporter: Tim Armstrong >Priority: Major > > Following on from IMPALA-8121, I don't think we can enable the data cache by > default, since it depends on what volumes are available to the container at > runtime. But we should definitely enable it for tests. 
> [~kwho] said > {quote}When I tested with the data cache enabled in a mini-cluster with 3 > nodes using the default scale of workload, I ran with 500 MB with 1 partition > by running > start-impala-cluster.py --data_cache_dir=/tmp --data_cache_size=500MB > You can also pass a pre-existing directory as the startup flag of Impala like > --data_cache=/tmp/data-cache-0:500MB > {quote} > start-impala-cluster.py already mounts some host directories into the > container, so we could either do the same for the data cache, or just depend > on the container root filesystem (which is likely to be slow, unfortunately). -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
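The per-container wiring the commit message describes (one host cache directory per impalad, mounted at /opt/impala/cache) can be sketched as follows. This is a hypothetical helper, not the actual start-impala-cluster.py code; the `DATA_CACHE_ARGS` environment variable name is an assumption, while the mount point and the `--data_cache=<dir>:<capacity>` flag syntax come from the ticket:

```python
import os

def data_cache_mounts(num_impalads, host_cache_root, cache_size="500MB"):
    """Create one host data-cache directory per impalad container and
    build docker arguments that mount each directory at /opt/impala/cache,
    mirroring how the log directories are handled.

    Hypothetical sketch; the real logic lives in start-impala-cluster.py.
    """
    docker_args = []
    for i in range(num_impalads):
        host_dir = os.path.join(host_cache_root, "data-cache-%d" % i)
        os.makedirs(host_dir, exist_ok=True)  # separate cache dir per container
        # Mount the host dir, then point the daemon's --data_cache flag at
        # the in-container path (flag syntax: <dir>:<capacity>). The env var
        # name DATA_CACHE_ARGS is an assumption for illustration.
        docker_args.append([
            "-v", "%s:/opt/impala/cache" % host_dir,
            "-e", "DATA_CACHE_ARGS=--data_cache=/opt/impala/cache:%s" % cache_size,
        ])
    return docker_args
```

Each container gets its own host directory, so cached data from one impalad never collides with another's, just as with the per-container log directories.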
[jira] [Commented] (IMPALA-8807) OPTIMIZE_PARTITION_KEY_SCANS works in more cases than documented
[ https://issues.apache.org/jira/browse/IMPALA-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895751#comment-16895751 ] ASF subversion and git services commented on IMPALA-8807: - Commit b6b45c06656276edc90928c0bbb95c93e4a04f6f in impala's branch refs/heads/master from Tim Armstrong [ https://gitbox.apache.org/repos/asf?p=impala.git;h=b6b45c0 ] IMPALA-8807: fix OPTIMIZE_PARTITION_KEY_SCANS docs The docs were inaccurate about the cases in which the optimisation applied. Happily, it actually works in a much wider set of cases. Change-Id: I8909b23bfe2b90470fc559fbc01f1e3aa3caa85d Reviewed-on: http://gerrit.cloudera.org:8080/13949 Reviewed-by: Alex Rodoni Tested-by: Impala Public Jenkins > OPTIMIZE_PARTITION_KEY_SCANS works in more cases than documented > > > Key: IMPALA-8807 > URL: https://issues.apache.org/jira/browse/IMPALA-8807 > Project: IMPALA > Issue Type: Bug > Components: Docs >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Labels: docs > Fix For: Impala 3.3.0 > > > This came up here > https://community.cloudera.com/t5/Support-Questions/Avoiding-hdfs-scan-when-querying-only-partition-columns/m-p/93337#M57192%3Feid=1=1 > Our docs say > {quote} > This optimization does not apply if the queries contain any WHERE, GROUP BY, > or HAVING clause. The relevant queries should only compute the minimum, > maximum, or number of distinct values for the partition key columns across > the whole table. > {quote} > This is false. 
Here's a query illustrating it working with all three things: > {noformat} > [localhost:21000] default> set optimize_partition_key_scans=true; explain > select max(ss_sold_date_sk) from tpcds_parquet.store_sales where > ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk having max(ss_sold_date_sk) > > 1000; > OPTIMIZE_PARTITION_KEY_SCANS set to true > Query: explain select max(ss_sold_date_sk) from tpcds_parquet.store_sales > where ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk having > max(ss_sold_date_sk) > 1000 > ++ > | Explain String | > ++ > | Max Per-Host Resource Reservation: Memory=1.94MB Threads=1 | > | Per-Host Resource Estimates: Memory=10MB | > | Codegen disabled by planner| > || > | PLAN-ROOT SINK | > | | | > | 01:AGGREGATE [FINALIZE]| > | | output: max(ss_sold_date_sk)| > | | group by: ss_sold_date_sk | > | | having: max(ss_sold_date_sk) > 1000 | > | | row-size=8B cardinality=182 | > | | | > | 00:UNION | > |constant-operands=182 | > |row-size=4B cardinality=182 | > ++ > Fetched 15 row(s) in 0.11s > {noformat} > We should reword this to be correct.
[jira] [Resolved] (IMPALA-8807) OPTIMIZE_PARTITION_KEY_SCANS works in more cases than documented
[ https://issues.apache.org/jira/browse/IMPALA-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-8807. --- Resolution: Fixed Fix Version/s: Impala 3.3.0 > OPTIMIZE_PARTITION_KEY_SCANS works in more cases than documented > > > Key: IMPALA-8807 > URL: https://issues.apache.org/jira/browse/IMPALA-8807 > Project: IMPALA > Issue Type: Bug > Components: Docs >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Labels: docs > Fix For: Impala 3.3.0 > > > This came up here > https://community.cloudera.com/t5/Support-Questions/Avoiding-hdfs-scan-when-querying-only-partition-columns/m-p/93337#M57192%3Feid=1=1 > Our docs say > {quote} > This optimization does not apply if the queries contain any WHERE, GROUP BY, > or HAVING clause. The relevant queries should only compute the minimum, > maximum, or number of distinct values for the partition key columns across > the whole table. > {quote} > This is false. Here's a query illustrating it working with all three things: > {noformat} > [localhost:21000] default> set optimize_partition_key_scans=true; explain > select max(ss_sold_date_sk) from tpcds_parquet.store_sales where > ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk having max(ss_sold_date_sk) > > 1000; > OPTIMIZE_PARTITION_KEY_SCANS set to true > Query: explain select max(ss_sold_date_sk) from tpcds_parquet.store_sales > where ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk having > max(ss_sold_date_sk) > 1000 > ++ > | Explain String | > ++ > | Max Per-Host Resource Reservation: Memory=1.94MB Threads=1 | > | Per-Host Resource Estimates: Memory=10MB | > | Codegen disabled by planner| > || > | PLAN-ROOT SINK | > | | | > | 01:AGGREGATE [FINALIZE]| > | | output: max(ss_sold_date_sk)| > | | group by: ss_sold_date_sk | > | | having: max(ss_sold_date_sk) > 1000 | > | | row-size=8B cardinality=182 | > | | | > | 00:UNION | > |constant-operands=182 | > |row-size=4B cardinality=182 | > ++ > Fetched 15 row(s) in
0.11s > {noformat} > We should reword this to be correct.
[jira] [Created] (IMPALA-8807) OPTIMIZE_PARTITION_KEY_SCANS works in more cases than documented
Tim Armstrong created IMPALA-8807: - Summary: OPTIMIZE_PARTITION_KEY_SCANS works in more cases than documented Key: IMPALA-8807 URL: https://issues.apache.org/jira/browse/IMPALA-8807 Project: IMPALA Issue Type: Bug Components: Docs Reporter: Tim Armstrong Assignee: Tim Armstrong This came up here https://community.cloudera.com/t5/Support-Questions/Avoiding-hdfs-scan-when-querying-only-partition-columns/m-p/93337#M57192%3Feid=1=1 Our docs say {quote} This optimization does not apply if the queries contain any WHERE, GROUP BY, or HAVING clause. The relevant queries should only compute the minimum, maximum, or number of distinct values for the partition key columns across the whole table. {quote} This is false. Here's a query illustrating it working with all three things: {noformat} [localhost:21000] default> set optimize_partition_key_scans=true; explain select max(ss_sold_date_sk) from tpcds_parquet.store_sales where ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk having max(ss_sold_date_sk) > 1000; OPTIMIZE_PARTITION_KEY_SCANS set to true Query: explain select max(ss_sold_date_sk) from tpcds_parquet.store_sales where ss_sold_date_sk % 10 = 0 group by ss_sold_date_sk having max(ss_sold_date_sk) > 1000 ++ | Explain String | ++ | Max Per-Host Resource Reservation: Memory=1.94MB Threads=1 | | Per-Host Resource Estimates: Memory=10MB | | Codegen disabled by planner| || | PLAN-ROOT SINK | | | | | 01:AGGREGATE [FINALIZE]| | | output: max(ss_sold_date_sk)| | | group by: ss_sold_date_sk | | | having: max(ss_sold_date_sk) > 1000 | | | row-size=8B cardinality=182 | | | | | 00:UNION | |constant-operands=182 | |row-size=4B cardinality=182 | ++ Fetched 15 row(s) in 0.11s {noformat} We should reword this to be correct.
[jira] [Commented] (IMPALA-8798) TestAutoScaling does not work on erasure-coded files
[ https://issues.apache.org/jira/browse/IMPALA-8798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895661#comment-16895661 ] ASF subversion and git services commented on IMPALA-8798: - Commit 8099911fd7886e02cdb727ddebc4ace19f2201bc in impala's branch refs/heads/master from Lars Volker [ https://gitbox.apache.org/repos/asf?p=impala.git;h=8099911 ] IMPALA-8802: Switch to pgrep for graceful shutdown helper Some places discourage the use of pidof and favor pgrep instead. This change switches usage to the latter in the graceful shutdown helper introduced in IMPALA-8798. Change-Id: Iaa8cc7112002a98c42b4dcfbe30b99ae0cfadf83 Reviewed-on: http://gerrit.cloudera.org:8080/13945 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins > TestAutoScaling does not work on erasure-coded files > > > Key: IMPALA-8798 > URL: https://issues.apache.org/jira/browse/IMPALA-8798 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.3.0 >Reporter: Lars Volker >Assignee: Lars Volker >Priority: Critical > Labels: scalability > > TestAutoScaling uses the ConcurrentWorkload class, which does not set the > required query option to support scanning erasure-coded files. We should > disable the test for those cases.
[jira] [Commented] (IMPALA-8802) Switch to pgrep for graceful_shutdown_backends.sh
[ https://issues.apache.org/jira/browse/IMPALA-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895660#comment-16895660 ] ASF subversion and git services commented on IMPALA-8802: - Commit 8099911fd7886e02cdb727ddebc4ace19f2201bc in impala's branch refs/heads/master from Lars Volker [ https://gitbox.apache.org/repos/asf?p=impala.git;h=8099911 ] IMPALA-8802: Switch to pgrep for graceful shutdown helper Some places discourage the use of pidof and favor pgrep instead. This change switches usage to the latter in the graceful shutdown helper introduced in IMPALA-8798. Change-Id: Iaa8cc7112002a98c42b4dcfbe30b99ae0cfadf83 Reviewed-on: http://gerrit.cloudera.org:8080/13945 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins > Switch to pgrep for graceful_shutdown_backends.sh > - > > Key: IMPALA-8802 > URL: https://issues.apache.org/jira/browse/IMPALA-8802 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Affects Versions: Impala 3.3.0 >Reporter: Lars Volker >Assignee: Lars Volker >Priority: Major > > IMPALA-8798 added a script with a call to {{pidof}}. However, {{pgrep}} seems > generally preferred (https://mywiki.wooledge.org/BadUtils) and we should > switch to it.
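The pidof-to-pgrep switch above can be illustrated with a small sketch. This is not the actual graceful_shutdown_backends.sh code, just a hedged example of looking up daemon PIDs with pgrep; `-x` makes pgrep match the whole process name, which is the behaviour pidof provided:

```python
import subprocess

def find_pids(process_name):
    """Return the PIDs of all processes whose name exactly matches
    process_name, using pgrep rather than pidof.

    pgrep -x matches on the whole process name, like pidof, but pgrep
    is more widely recommended and also supports richer matching
    (e.g. -f for full command lines) if a helper script needs it.
    """
    result = subprocess.run(["pgrep", "-x", process_name],
                            capture_output=True, text=True)
    # pgrep exits non-zero when nothing matches; treat that as no PIDs.
    return [int(pid) for pid in result.stdout.split()]
```

A shutdown helper would then iterate over `find_pids("impalad")` and signal each PID in turn.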
[jira] [Updated] (IMPALA-8549) Add support for scanning DEFLATE text files
[ https://issues.apache.org/jira/browse/IMPALA-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan updated IMPALA-8549: -- Description: Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing text files stored using zlib / deflate (results in files such as {{00_0.deflate}}). Impala currently does not support reading {{.deflate}} text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not one of the enabled plugins: 'LZO'}}. Moreover, the default compression codec in Hadoop is zlib / deflate (see {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files will be written by default. Impala does support zlib / deflate with other file formats though: Avro, RCFiles, SequenceFiles (see [https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]). Currently, the frontend assigns a compression type to a file depending on its extension. For instance, the functional_text_def database is stored as a file with a .deflate extension and is assigned the compression type DEFLATE. The HdfsTextScanner class receives this value and uses it directly to create a decompressor. The functional_\{avro,seq,rc}_databases are stored as files without extensions, so the frontend interprets their compression type as NONE. However, in the backend, each of their corresponding scanners implement custom logic of their own to read file headers and override the existing NONE compression type assigned to files with new values, such as DEFAULT or DEFLATE, so that the appropriate decompressor can be instantiated. was: Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing text files stored using zlib / deflate (results in files such as {{00_0.deflate}}).
Impala currently does not support reading {{.deflate}} text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not one of the enabled plugins: 'LZO'}}. Moreover, the default compression codec in Hadoop is zlib / deflate (see {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files will be written by default. Impala does support zlib / deflate with other file formats though: Avro, RCFiles, SequenceFiles (see [https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]). Currently, the frontend assigns a compression type to a file depending on its extension. For instance, the functional_text_def database is stored as a file with a .deflate extension and is assigned the compression type DEFLATE. The HdfsTextScanner class receives this value and uses it directly to create a decompressor. The functional_\{avro,seq,rc}_databases are stored as files without extensions, so the frontend interprets their compression type as NONE. However, in the backend, each of their corresponding scanners implement custom logic of their own to read file headers and override the existing NONE compression type assigned to files with new values, such as DEFAULT or DEFLATE. > Add support for scanning DEFLATE text files > --- > > Key: IMPALA-8549 > URL: https://issues.apache.org/jira/browse/IMPALA-8549 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Sahil Takiar >Assignee: Ethan >Priority: Minor > Labels: ramp-up > > Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing > text files stored using zlib / deflate (results in files such as > {{00_0.deflate}}). Impala currently does not support reading {{.deflate}} > text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is > not one of the enabled plugins: 'LZO'}}. 
> Moreover, the default compression codec in Hadoop is zlib / deflate (see > {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, > if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files > will be written by default. > Impala does support zlib / deflate with other file formats though: Avro, > RCFiles, SequenceFiles (see > [https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]). > Currently, the frontend assigns a compression type to a file depending on its > extension. For instance, the functional_text_def database is stored as a file > with a .deflate extension and is assigned the compression type DEFLATE. The > HdfsTextScanner class receives this value and uses it directly to create a > decompressor. The functional_\{avro,seq,rc}_databases are stored as files > without extensions, so the frontend interprets their compression type as > NONE. However, in the backend, each of their corresponding scanners implement > custom logic of their own to read file headers and override the existing NONE > compression type assigned to files with new values, such as DEFAULT or > DEFLATE.
[jira] [Updated] (IMPALA-8549) Add support for scanning DEFLATE text files
[ https://issues.apache.org/jira/browse/IMPALA-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan updated IMPALA-8549: -- Description: Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing text files stored using zlib / deflate (results in files such as {{00_0.deflate}}). Impala currently does not support reading {{.deflate}} text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not one of the enabled plugins: 'LZO'}}. Moreover, the default compression codec in Hadoop is zlib / deflate (see {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files will be written by default. Impala does support zlib / deflate with other file formats though: Avro, RCFiles, SequenceFiles (see [https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]). Currently, the frontend assigns a compression type to a file depending on its extension. For instance, the functional_text_def database is stored as a file with a .deflate extension and is assigned the compression type DEFLATE. The HdfsTextScanner class receives this value and uses it directly to create a decompressor. The functional_\{avro,seq,rc}_databases are stored as files without extensions, so the frontend interprets their compression type as NONE. However, in the backend, each of their corresponding scanners implement custom logic of their own to read file headers and override the existing NONE compression type assigned to files with new values, such as DEFAULT or DEFLATE. was: Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing text files stored using zlib / deflate (results in files such as {{00_0.deflate}}). Impala currently does not support reading {{.deflate}} text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not one of the enabled plugins: 'LZO'}}. 
Moreover, the default compression codec in Hadoop is zlib / deflate (see {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files will be written by default. Impala does support zlib / deflate with other file formats though: Avro, RCFiles, SequenceFiles (see [https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]). Currently, the frontend assigns a compression type to a file depending on its extension. For instance, the functional_text_def database is stored as a file with a .deflate extension and is assigned the compression type DEFLATE. The HdfsTextScanner class receives this value and uses it directly to create a decompressor. The functional_\{avro,seq,rc}_databases are stored as files without extensions, so the frontend interprets their compression type as NONE. However, in the backend, each of their corresponding scanners implement custom logic of their own to read file headers and override the existing NONE compression type of files to new values, such as DEFAULT or DEFLATE. > Add support for scanning DEFLATE text files > --- > > Key: IMPALA-8549 > URL: https://issues.apache.org/jira/browse/IMPALA-8549 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Sahil Takiar >Assignee: Ethan >Priority: Minor > Labels: ramp-up > > Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing > text files stored using zlib / deflate (results in files such as > {{00_0.deflate}}). Impala currently does not support reading {{.deflate}} > text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is > not one of the enabled plugins: 'LZO'}}. > Moreover, the default compression codec in Hadoop is zlib / deflate (see > {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, > if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files > will be written by default. 
> Impala does support zlib / deflate with other file formats though: Avro, > RCFiles, SequenceFiles (see > [https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]). > Currently, the frontend assigns a compression type to a file depending on its > extension. For instance, the functional_text_def database is stored as a file > with a .deflate extension and is assigned the compression type DEFLATE. The > HdfsTextScanner class receives this value and uses it directly to create a > decompressor. The functional_\{avro,seq,rc}_databases are stored as files > without extensions, so the frontend interprets their compression type as > NONE. However, in the backend, each of their corresponding scanners implement > custom logic of their own to read file headers and override the existing NONE > compression type of files to new values, such as DEFAULT or DEFLATE.
[jira] [Updated] (IMPALA-8549) Add support for scanning DEFLATE text files
[ https://issues.apache.org/jira/browse/IMPALA-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan updated IMPALA-8549: -- Description: Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing text files stored using zlib / deflate (results in files such as {{00_0.deflate}}). Impala currently does not support reading {{.deflate}} text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not one of the enabled plugins: 'LZO'}}. Moreover, the default compression codec in Hadoop is zlib / deflate (see {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files will be written by default. Impala does support zlib / deflate with other file formats though: Avro, RCFiles, SequenceFiles (see [https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]). Currently, the frontend assigns a compression type to a file depending on its extension. For instance, the functional_text_def database is stored as a file with a .deflate extension and is assigned the compression type DEFLATE. The HdfsTextScanner class receives this value and uses it directly to create a decompressor. The functional_\{avro,seq,rc}_databases are stored as files without extensions, so the frontend interprets their compression type as NONE. However, in the backend, each of their corresponding scanners implement custom logic of their own to read file headers and override the existing NONE compression type of files to new values, such as DEFAULT or DEFLATE. was: Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing text files stored using zlib / deflate (results in files such as {{00_0.deflate}}). Impala currently does not support reading {{.deflate}} files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not one of the enabled plugins: 'LZO'}}. 
Moreover, the default compression codec in Hadoop is zlib / deflate (see {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files will be written by default. Impala does support zlib / deflate with other file formats though: Avro, RCFiles, SequenceFiles (see https://impala.apache.org/docs/build/html/topics/impala_file_formats.html). > Add support for scanning DEFLATE text files > --- > > Key: IMPALA-8549 > URL: https://issues.apache.org/jira/browse/IMPALA-8549 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Sahil Takiar >Assignee: Ethan >Priority: Minor > Labels: ramp-up > > Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing > text files stored using zlib / deflate (results in files such as > {{00_0.deflate}}). Impala currently does not support reading {{.deflate}} > text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is > not one of the enabled plugins: 'LZO'}}. > Moreover, the default compression codec in Hadoop is zlib / deflate (see > {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, > if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files > will be written by default. > Impala does support zlib / deflate with other file formats though: Avro, > RCFiles, SequenceFiles (see > [https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]). > Currently, the frontend assigns a compression type to a file depending on its > extension. For instance, the functional_text_def database is stored as a file > with a .deflate extension and is assigned the compression type DEFLATE. The > HdfsTextScanner class receives this value and uses it directly to create a > decompressor. The functional_\{avro,seq,rc}_databases are stored as files > without extensions, so the frontend interprets their compression type as > NONE. 
However, in the backend, each of their corresponding scanners implement > custom logic of their own to read file headers and override the existing NONE > compression type of files to new values, such as DEFAULT or DEFLATE.
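The extension-based assignment described in this ticket can be sketched as follows. This is an illustrative Python mock of the behaviour, not the actual Impala frontend code (which is Java), and the extension table here is an assumption for the sake of the example:

```python
# Hypothetical mapping from file extension to compression type, sketching
# the frontend behaviour described above (file extension -> codec).
EXTENSION_TO_COMPRESSION = {
    ".deflate": "DEFLATE",
    ".gz": "GZIP",
    ".bz2": "BZIP2",
    ".snappy": "SNAPPY",
}

def compression_from_path(path):
    """Assign a compression type to a file based on its extension."""
    for ext, codec in EXTENSION_TO_COMPRESSION.items():
        if path.endswith(ext):
            return codec
    # Files without a recognized extension are treated as uncompressed.
    # Per the ticket, format-specific scanners (Avro, RCFile, SequenceFile)
    # may later override this NONE value after reading the file header.
    return "NONE"
```

Under this scheme a `00_0.deflate` file in functional_text_def gets DEFLATE up front, while extensionless Avro/RCFile/SequenceFile data starts as NONE and relies on header sniffing in the backend scanner.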
[jira] [Closed] (IMPALA-8510) Impala Doc: Recursively list files within transactional tables
[ https://issues.apache.org/jira/browse/IMPALA-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rodoni closed IMPALA-8510. --- Resolution: Not A Bug An implementation detail and not user-facing. > Impala Doc: Recursively list files within transactional tables > -- > > Key: IMPALA-8510 > URL: https://issues.apache.org/jira/browse/IMPALA-8510 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc, in_33 >
[jira] [Closed] (IMPALA-8489) TestRecoverPartitions.test_post_invalidate fails with IllegalStateException when HMS polling is enabled
[ https://issues.apache.org/jira/browse/IMPALA-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anurag Mantripragada closed IMPALA-8489. Resolution: Fixed Committed to master. > TestRecoverPartitions.test_post_invalidate fails with IllegalStateException > when HMS polling is enabled > --- > > Key: IMPALA-8489 > URL: https://issues.apache.org/jira/browse/IMPALA-8489 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Affects Versions: Impala 3.3.0 >Reporter: Tim Armstrong >Assignee: Anurag Mantripragada >Priority: Critical > Labels: catalog-v2 > > {noformat} > metadata/test_recover_partitions.py:279: in test_post_invalidate > "INSERT INTO TABLE %s PARTITION(i=002, p='p2') VALUES(4)" % FQ_TBL_NAME) > common/impala_test_suite.py:620: in wrapper > return function(*args, **kwargs) > common/impala_test_suite.py:628: in execute_query_expect_success > result = cls.__execute_query(impalad_client, query, query_options, user) > common/impala_test_suite.py:722: in __execute_query > return impalad_client.execute(query, user=user) > common/impala_connection.py:180: in execute > return self.__beeswax_client.execute(sql_stmt, user=user) > beeswax/impala_beeswax.py:187: in execute > handle = self.__execute_query(query_string.strip(), user=user) > beeswax/impala_beeswax.py:364: in __execute_query > self.wait_for_finished(handle) > beeswax/impala_beeswax.py:385: in wait_for_finished > raise ImpalaBeeswaxException("Query aborted:" + error_log, None) > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EQuery aborted:IllegalArgumentException: no such partition id 6244 > {noformat} > The failure is reproducible for me locally with catalog v2.
[jira] [Commented] (IMPALA-8806) Add metrics to improve observability of executor groups
[ https://issues.apache.org/jira/browse/IMPALA-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895608#comment-16895608 ] Alex Rodoni commented on IMPALA-8806: - [~bikramjeet.vig] User-facing doc impact? For 3.3? > Add metrics to improve observability of executor groups > --- > > Key: IMPALA-8806 > URL: https://issues.apache.org/jira/browse/IMPALA-8806 > Project: IMPALA > Issue Type: Improvement >Affects Versions: Impala 3.3.0 >Reporter: Bikramjeet Vig >Assignee: Bikramjeet Vig >Priority: Major > Labels: observability > > As a follow-on to IMPALA-8484, it makes sense to add some metrics to provide > better observability into the state of executor groups. > Some metrics can be: > - number of executor groups with any impalas in them > - number of healthy executor groups > - number of backends. Currently we have a python helper that calculates this > - get_num_known_live_backends, but it really should be a metric (we could > replace the test code with a metric if we did this).
[jira] [Created] (IMPALA-8806) Add metrics to improve observability of executor groups
Bikramjeet Vig created IMPALA-8806: -- Summary: Add metrics to improve observability of executor groups Key: IMPALA-8806 URL: https://issues.apache.org/jira/browse/IMPALA-8806 Project: IMPALA Issue Type: Improvement Affects Versions: Impala 3.3.0 Reporter: Bikramjeet Vig Assignee: Bikramjeet Vig As a follow-on to IMPALA-8484, it makes sense to add some metrics to provide better observability into the state of executor groups. Some metrics can be: - number of executor groups with any impalas in them - number of healthy executor groups - number of backends. Currently we have a python helper that calculates this - get_num_known_live_backends, but it really should be a metric (we could replace the test code with a metric if we did this).
[jira] [Commented] (IMPALA-8799) Prometheus metrics should be prefixed with "impala_"
[ https://issues.apache.org/jira/browse/IMPALA-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895490#comment-16895490 ] Tim Armstrong commented on IMPALA-8799: --- [~arodoni_cloudera] we didn't document the original change - IMPALA-8560. I think it could be worth documenting the prior change since it's an integration point. We have a debug page /metrics_prometheus that implements the prometheus exposition format - https://prometheus.io/docs/instrumenting/exposition_formats/#text-based-format > Prometheus metrics should be prefixed with "impala_" > > > Key: IMPALA-8799 > URL: https://issues.apache.org/jira/browse/IMPALA-8799 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Labels: observability > Fix For: Impala 3.3.0 > > > This is recommended by the Prometheus docs - > https://prometheus.io/docs/practices/naming/ > {quote} > A metric name... > ...must comply with the data model for valid characters. > ...should have a (single-word) application prefix relevant to the domain > the metric belongs to. The prefix is sometimes referred to as namespace by > client libraries. For metrics specific to an application, the prefix is > usually the application name itself. Sometimes, however, metrics are more > generic, like standardized metrics exported by client libraries. Examples: > prometheus_notifications_total (specific to the Prometheus server) > process_cpu_seconds_total (exported by many client libraries) > http_request_duration_seconds (for all HTTP requests) > {quote} > It is also awkward in tools like grafana to find impala metrics with the > current naming scheme.
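The exposition format served by the /metrics_prometheus debug page is line-oriented, which is what makes the "impala_" prefix easy to filter on in tools like Grafana. Here is a minimal illustrative parser (not Impala or Prometheus client code; the sample metric names in the test are made up) that keeps only prefixed samples; it ignores optional trailing timestamps:

```python
def filter_impala_metrics(exposition_text):
    """Keep only samples whose metric name carries the 'impala_' prefix.

    Parses the text-based Prometheus exposition format: '#' lines are
    comments (HELP/TYPE), sample lines are '<name>[{labels}] <value>'.
    Minimal sketch: optional timestamps after the value are not handled.
    """
    metrics = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comment lines
        # The metric name is everything before the label block or first space.
        name = line.split("{")[0].split()[0]
        if name.startswith("impala_"):
            metrics[name] = float(line.rsplit(None, 1)[-1])
    return metrics
```

Scraping `http://<impalad>:25000/metrics_prometheus` and passing the body through a filter like this would surface only the namespaced Impala metrics, per the naming recommendation quoted above.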
[jira] [Updated] (IMPALA-8456) Impala Doc: Document HTTP based HS2/beeswax endpoints on coordinators
[ https://issues.apache.org/jira/browse/IMPALA-8456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rodoni updated IMPALA-8456: Description: Also document IMPALA-8717 https://issues.apache.org/jira/browse/IMPALA-8783 was:Also document IMPALA-8717 > Impala Doc: Document HTTP based HS2/beeswax endpoints on coordinators > - > > Key: IMPALA-8456 > URL: https://issues.apache.org/jira/browse/IMPALA-8456 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc, in_33 > > Also document IMPALA-8717 > https://issues.apache.org/jira/browse/IMPALA-8783 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8799) Prometheus metrics should be prefixed with "impala_"
[ https://issues.apache.org/jira/browse/IMPALA-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895481#comment-16895481 ] Alex Rodoni commented on IMPALA-8799: - [~tarmstrong] Is there a user-facing doc impact regarding this? > Prometheus metrics should be prefixed with "impala_" > > > Key: IMPALA-8799 > URL: https://issues.apache.org/jira/browse/IMPALA-8799 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Labels: observability > Fix For: Impala 3.3.0 > > > This is recommended by the Prometheus docs - > https://prometheus.io/docs/practices/naming/ > {quote} > A metric name... > ...must comply with the data model for valid characters. > ...should have a (single-word) application prefix relevant to the domain > the metric belongs to. The prefix is sometimes referred to as namespace by > client libraries. For metrics specific to an application, the prefix is > usually the application name itself. Sometimes, however, metrics are more > generic, like standardized metrics exported by client libraries. Examples: > prometheus_notifications_total (specific to the Prometheus server) > process_cpu_seconds_total (exported by many client libraries) > http_request_duration_seconds (for all HTTP requests) > {quote} > It is also awkward in tools like grafana to find impala metrics with the > current naming scheme. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Issue Comment Deleted] (IMPALA-8789) Include script to trigger graceful shutdown in docker containers
[ https://issues.apache.org/jira/browse/IMPALA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated IMPALA-8789: Comment: was deleted (was: Hi [~lv] I am seeing a test failure which is possibly related to this patch. https://master-02.jenkins.cloudera.com/job/impala-asf-master-core/1372/testReport/junit/custom_cluster.test_restart_services/TestGracefulShutdown/test_graceful_shutdown_script/ Can you please confirm?) > Include script to trigger graceful shutdown in docker containers > > > Key: IMPALA-8789 > URL: https://issues.apache.org/jira/browse/IMPALA-8789 > Project: IMPALA > Issue Type: Improvement >Affects Versions: Impala 3.3.0 >Reporter: Lars Volker >Assignee: Lars Volker >Priority: Major > > We should include a utility script in the docker containers to trigger a > graceful shutdown by sending SIGRTMIN to all impalads. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8789) Include script to trigger graceful shutdown in docker containers
[ https://issues.apache.org/jira/browse/IMPALA-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895478#comment-16895478 ] Vihang Karajgaonkar commented on IMPALA-8789: - Hi [~lv] I am seeing a test failure which is possibly related to this patch. https://master-02.jenkins.cloudera.com/job/impala-asf-master-core/1372/testReport/junit/custom_cluster.test_restart_services/TestGracefulShutdown/test_graceful_shutdown_script/ Can you please confirm? > Include script to trigger graceful shutdown in docker containers > > > Key: IMPALA-8789 > URL: https://issues.apache.org/jira/browse/IMPALA-8789 > Project: IMPALA > Issue Type: Improvement >Affects Versions: Impala 3.3.0 >Reporter: Lars Volker >Assignee: Lars Volker >Priority: Major > > We should include a utility script in the docker containers to trigger a > graceful shutdown by sending SIGRTMIN to all impalads. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-8456) Impala Doc: Document HTTP based HS2/beeswax endpoints on coordinators
[ https://issues.apache.org/jira/browse/IMPALA-8456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Rodoni updated IMPALA-8456: Description: Also document IMPALA-8717 > Impala Doc: Document HTTP based HS2/beeswax endpoints on coordinators > - > > Key: IMPALA-8456 > URL: https://issues.apache.org/jira/browse/IMPALA-8456 > Project: IMPALA > Issue Type: Sub-task > Components: Docs >Reporter: Alex Rodoni >Assignee: Alex Rodoni >Priority: Major > Labels: future_release_doc, in_33 > > Also document IMPALA-8717 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-8803) Coordinator should release admitted memory per-backend rather than per-query
[ https://issues.apache.org/jira/browse/IMPALA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895456#comment-16895456 ] Sahil Takiar commented on IMPALA-8803: -- CC: [~tarmstr...@cloudera.com], [~kwho] A few other points: * Wondering if this is useful outside of the result spooling context; maybe there are queries where certain backends complete earlier than others? My first guess would be any Hash-Joins; the dimension table scan of a Hash-Join should complete earlier than the rest of the query? Assuming the dimension table scan fragment is running on its own backend. * Considering the following approach for batching: ** Batching will be driven by the completion of {{BackendStates}}: the {{Coordinator}} will internally buffer a list of {{BackendStates}} that need to be released, when each backend completes, the {{Coordinator}} will check some number of conditions, and if those conditions are met, it will release the admitted memory for all buffered {{BackendStates}} *** This is in contrast to adding a blocking queue of backends that need to be released, and adding a scheduled thread to periodically read from the queue and release the resources of any buffered backends ** Considering the following two conditions that would trigger the coordinator to release all of its buffered {{BackendStates}} (both would need to be true to trigger a call to admission control): *** If "x" number of backends are buffered, then this condition returns true; "x" is initially set to num backends / 2, and it is exponentially decaying; so for a query running on 1000 nodes, you have an upper limit of {{log(num backends)}} calls to the admission controller *** If "y" milliseconds have passed since the last time we released the buffered backends, this condition returns true ("y" can initially be set to 1000); this avoids additional overhead of queries that complete relatively quickly, and perhaps do not take advantage of the result spooling benefits ** If a 
query is cancelled or hits an error, all remaining running backends will be released at once > Coordinator should release admitted memory per-backend rather than per-query > > > Key: IMPALA-8803 > URL: https://issues.apache.org/jira/browse/IMPALA-8803 > Project: IMPALA > Issue Type: Sub-task >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > When {{SPOOL_QUERY_RESULTS}} is true, the coordinator backend may be long > lived, even though all other backends for the query have completed. > Currently, the Coordinator only releases admitted memory when the entire > query has completed (including the coordinator fragment) - > https://github.com/apache/impala/blob/72c9370856d7436885adbee3e8da7e7d9336df15/be/src/runtime/coordinator.cc#L562 > In order to more aggressively return admitted memory, the coordinator should > release memory when each backend for a query completes, rather than waiting > for the entire query to complete. > Releasing memory per backend should be batched because releasing admitted > memory in the admission controller requires obtaining a global lock and > refreshing the internal stats of the admission controller. Batching will help > mitigate any additional overhead from releasing admitted memory per backend. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
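The two-condition batching heuristic proposed in the comment above (release buffered backends when both the exponentially decaying count threshold "x" and the minimum elapsed time "y" are satisfied) can be sketched as follows. Class and method names are hypothetical, not Impala's actual {{Coordinator}} API:

```python
import time

class BatchedReleaser:
    """Sketch of the proposed batching: buffer completed backends and
    release their admitted memory only when BOTH conditions hold:
      1. at least `threshold` backends are buffered ("x", starting at
         num_backends / 2 and decaying exponentially), and
      2. at least `min_interval_ms` have passed since the last release ("y").
    """

    def __init__(self, num_backends, min_interval_ms=1000):
        self.buffered = []
        self.threshold = max(1, num_backends // 2)
        self.min_interval_ms = min_interval_ms
        self.last_release_ms = time.monotonic() * 1000

    def backend_completed(self, backend_state):
        """Returns the backends whose admitted memory should now be
        released (possibly empty if the conditions are not yet met)."""
        self.buffered.append(backend_state)
        now_ms = time.monotonic() * 1000
        if (len(self.buffered) >= self.threshold
                and now_ms - self.last_release_ms >= self.min_interval_ms):
            released, self.buffered = self.buffered, []
            self.last_release_ms = now_ms
            # Halve the threshold so a query on N backends makes at most
            # roughly log(N) calls into the admission controller.
            self.threshold = max(1, self.threshold // 2)
            return released
        return []
```

On cancellation or error, the caller would bypass the buffer and release all remaining backends at once, as the comment notes.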
[jira] [Resolved] (IMPALA-8783) Add Kerberos SPNEGO support to the http hs2 server
[ https://issues.apache.org/jira/browse/IMPALA-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Tauber-Marshall resolved IMPALA-8783. Resolution: Fixed Fix Version/s: Impala 3.3.0 > Add Kerberos SPNEGO support to the http hs2 server > -- > > Key: IMPALA-8783 > URL: https://issues.apache.org/jira/browse/IMPALA-8783 > Project: IMPALA > Issue Type: Bug > Components: Clients >Affects Versions: Impala 3.3.0 >Reporter: Thomas Tauber-Marshall >Assignee: Thomas Tauber-Marshall >Priority: Major > Labels: security > Fix For: Impala 3.3.0 > > > IMPALA-8538 added support for http connections to the hs2 server along with > LDAP auth support. We should also add support for Kerberos auth via SPNEGO -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-8804) Memory based admission control is always disabled if pool max mem is not set
[ https://issues.apache.org/jira/browse/IMPALA-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-8804: -- Affects Version/s: Impala 3.3.0 Impala 3.1.0 Impala 3.2.0 > Memory based admission control is always disabled if pool max mem is not set > > > Key: IMPALA-8804 > URL: https://issues.apache.org/jira/browse/IMPALA-8804 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 3.1.0, Impala 3.2.0, Impala 3.3.0 >Reporter: Tim Armstrong >Priority: Major > Labels: admission-control > Attachments: minicluster-fair-scheduler.xml, > minicluster-llama-site.xml > > > Memory-based admission control doesn't kick in with the provided config files > where no max memory is configured for the pool. This is the documented > behaviour and not a bug - see > https://impala.apache.org/docs/build/html/topics/impala_admission.html. > However, it is inconvenient since you need to specify some max memory value > even if you don't want to limit the pool's share of the cluster's resources > (or the cluster is variable in size). > This is unfriendly. It is also confusing since there is no explicit way to > enable memory-based admission control. > You can work around this by setting the pool max memory to a very high value. > To reproduce, start a minicluster with the provided configs. If you submit > multiple memory-intensive queries in parallel, they will never be queued. > {noformat} > start-impala-cluster.py > --impalad_args="-fair_scheduler_allocation_path=minicluster-fair-scheduler.xml > -llama_site_path=minicluster-llama-site.xml" > --impalad_args=-vmodule=admission-controller > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-8804) Memory based admission control is always disabled if pool max mem is not set
[ https://issues.apache.org/jira/browse/IMPALA-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-8804: -- Component/s: Backend > Memory based admission control is always disabled if pool max mem is not set > > > Key: IMPALA-8804 > URL: https://issues.apache.org/jira/browse/IMPALA-8804 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Priority: Major > Labels: admission-control > Attachments: minicluster-fair-scheduler.xml, > minicluster-llama-site.xml > > > Memory-based admission control doesn't kick in with the provided config files > where no max memory is configured for the pool. This is the documented > behaviour and not a bug - see > https://impala.apache.org/docs/build/html/topics/impala_admission.html. > However, it is inconvenient since you need to specify some max memory value > even if you don't want to limit the pool's share of the cluster's resources > (or the cluster is variable in size). > This is unfriendly. It is also confusing since there is no explicit way to > enable memory-based admission control. > You can work around this by setting the pool max memory to a very high value. > To reproduce, start a minicluster with the provided configs. If you submit > multiple memory-intensive queries in parallel, they will never be queued. > {noformat} > start-impala-cluster.py > --impalad_args="-fair_scheduler_allocation_path=minicluster-fair-scheduler.xml > -llama_site_path=minicluster-llama-site.xml" > --impalad_args=-vmodule=admission-controller > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-8804) Memory based admission control is always disabled if pool max mem is not set
Tim Armstrong created IMPALA-8804: - Summary: Memory based admission control is always disabled if pool max mem is not set Key: IMPALA-8804 URL: https://issues.apache.org/jira/browse/IMPALA-8804 Project: IMPALA Issue Type: Improvement Reporter: Tim Armstrong Attachments: minicluster-fair-scheduler.xml, minicluster-llama-site.xml Memory-based admission control doesn't kick in with the provided config files where no max memory is configured for the pool. This is the documented behaviour and not a bug - see https://impala.apache.org/docs/build/html/topics/impala_admission.html. However, it is inconvenient since you need to specify some max memory value even if you don't want to limit the pool's share of the cluster's resources (or the cluster is variable in size). This is unfriendly. It is also confusing since there is no explicit way to enable memory-based admission control. You can work around this by setting the pool max memory to a very high value. To reproduce, start a minicluster with the provided configs. If you submit multiple memory-intensive queries in parallel, they will never be queued. {noformat} start-impala-cluster.py --impalad_args="-fair_scheduler_allocation_path=minicluster-fair-scheduler.xml -llama_site_path=minicluster-llama-site.xml" --impalad_args=-vmodule=admission-controller {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-8803) Coordinator should release admitted memory per-backend rather than per-query
Sahil Takiar created IMPALA-8803: Summary: Coordinator should release admitted memory per-backend rather than per-query Key: IMPALA-8803 URL: https://issues.apache.org/jira/browse/IMPALA-8803 Project: IMPALA Issue Type: Sub-task Reporter: Sahil Takiar Assignee: Sahil Takiar When {{SPOOL_QUERY_RESULTS}} is true, the coordinator backend may be long lived, even though all other backends for the query have completed. Currently, the Coordinator only releases admitted memory when the entire query has completed (including the coordinator fragment) - https://github.com/apache/impala/blob/72c9370856d7436885adbee3e8da7e7d9336df15/be/src/runtime/coordinator.cc#L562 In order to more aggressively return admitted memory, the coordinator should release memory when each backend for a query completes, rather than waiting for the entire query to complete. Releasing memory per backend should be batched because releasing admitted memory in the admission controller requires obtaining a global lock and refreshing the internal stats of the admission controller. Batching will help mitigate any additional overhead from releasing admitted memory per backend. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-8802) Switch to pgrep for graceful_shutdown_backends.sh
Lars Volker created IMPALA-8802: --- Summary: Switch to pgrep for graceful_shutdown_backends.sh Key: IMPALA-8802 URL: https://issues.apache.org/jira/browse/IMPALA-8802 Project: IMPALA Issue Type: Bug Components: Infrastructure Affects Versions: Impala 3.3.0 Reporter: Lars Volker Assignee: Lars Volker IMPALA-8798 added a script with a call to {{pidof}}. However, {{pgrep}} seems generally preferred (https://mywiki.wooledge.org/BadUtils) and we should switch to it. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-8801) Add DATE type support to ORC scanner/writer
Attila Jeges created IMPALA-8801: Summary: Add DATE type support to ORC scanner/writer Key: IMPALA-8801 URL: https://issues.apache.org/jira/browse/IMPALA-8801 Project: IMPALA Issue Type: Sub-task Reporter: Attila Jeges -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-8800) Add DATE type support to Kudu scanner
[ https://issues.apache.org/jira/browse/IMPALA-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Jeges updated IMPALA-8800: - Summary: Add DATE type support to Kudu scanner (was: Add support for Kudu DATE type) > Add DATE type support to Kudu scanner > - > > Key: IMPALA-8800 > URL: https://issues.apache.org/jira/browse/IMPALA-8800 > Project: IMPALA > Issue Type: Sub-task >Reporter: Attila Jeges >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-8800) Add DATE type support to Kudu scanner
[ https://issues.apache.org/jira/browse/IMPALA-8800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Jeges reassigned IMPALA-8800: Assignee: Attila Jeges > Add DATE type support to Kudu scanner > - > > Key: IMPALA-8800 > URL: https://issues.apache.org/jira/browse/IMPALA-8800 > Project: IMPALA > Issue Type: Sub-task >Reporter: Attila Jeges >Assignee: Attila Jeges >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-8800) Add support for Kudu DATE type
Attila Jeges created IMPALA-8800: Summary: Add support for Kudu DATE type Key: IMPALA-8800 URL: https://issues.apache.org/jira/browse/IMPALA-8800 Project: IMPALA Issue Type: Sub-task Reporter: Attila Jeges -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-8636) Implement INSERT for insert-only ACID tables
[ https://issues.apache.org/jira/browse/IMPALA-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltán Borók-Nagy resolved IMPALA-8636. --- Resolution: Fixed Fix Version/s: Impala 3.3.0 Impala 4.0 > Implement INSERT for insert-only ACID tables > > > Key: IMPALA-8636 > URL: https://issues.apache.org/jira/browse/IMPALA-8636 > Project: IMPALA > Issue Type: New Feature >Reporter: Zoltán Borók-Nagy >Assignee: Zoltán Borók-Nagy >Priority: Critical > Labels: impala-acid > Fix For: Impala 4.0, Impala 3.3.0 > > > Impala should support insertion for insert-only ACID tables. > For this we need to allocate a write ID for the target table, and write the > data into the base/delta directories. > INSERT operation should create a new delta directory with the allocated write > ID. > INSERT OVERWRITE should create a new base directory with the allocated write > ID. This new base directory will only contain the data coming from this > operation. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
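The base/delta directory scheme described above can be sketched with a small helper that computes the target directory for a given write ID. The `delta_<min>_<max>` and `base_<writeId>` names with 7-digit zero padding follow Hive's insert-only ACID convention as I understand it; treat the exact formatting here as an assumption, not Impala's verified behavior:

```python
def acid_target_dir(table_root, write_id, overwrite=False):
    """Compute the directory a new insert writes into (illustrative).

    A plain INSERT creates a new delta directory named with the allocated
    write ID; INSERT OVERWRITE creates a new base directory that contains
    only this operation's data.
    """
    if overwrite:
        return f"{table_root}/base_{write_id:07d}"
    # Single-write-ID delta: min and max write IDs are the same.
    return f"{table_root}/delta_{write_id:07d}_{write_id:07d}"
```

Readers then see the new base directory as a snapshot that replaces older base/delta directories for compaction and visibility purposes.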
[jira] [Work started] (IMPALA-8198) Add DATE type support to Avro scanner/writer
[ https://issues.apache.org/jira/browse/IMPALA-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-8198 started by Attila Jeges. > Add DATE type support to Avro scanner/writer > > > Key: IMPALA-8198 > URL: https://issues.apache.org/jira/browse/IMPALA-8198 > Project: IMPALA > Issue Type: Sub-task > Components: Backend, Frontend >Reporter: Attila Jeges >Assignee: Attila Jeges >Priority: Major > > Implement Avro DATE type support. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-8679) Make the query options set in the dynamic resource pool/admission control un-overridable in the user session
[ https://issues.apache.org/jira/browse/IMPALA-8679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adriano resolved IMPALA-8679. - Resolution: Fixed Fix Version/s: Impala 3.1.0 IMPALA-7349 is applicable here (specifically for the memory settings). With this feature, users cannot override the MEM_LIMIT set in the pool configuration (overriding it could cause query failures from exceeded memory limits). [1] https://www.cloudera.com/documentation/enterprise/6/6.1/topics/impala_admission.html#admission_memory > Make the query options set in the dynamic resource pool/admission control > un-overridable in the user session > > > Key: IMPALA-8679 > URL: https://issues.apache.org/jira/browse/IMPALA-8679 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Reporter: Adriano >Priority: Major > Fix For: Impala 3.1.0 > > > Issue description: Once the admission control is configured (with the > MAX_MEM_ESTIMATE_FOR_ADMISSION, MEM_LIMIT, etc. query options), if a user > bypasses the defaults by setting the query options in the session, it can > cause query failures in the configured pools (e.g. decreasing > MAX_MEM_ESTIMATE_FOR_ADMISSION and increasing MEM_LIMIT). > Improvement: It would be great to have further checkboxes (with those > default values) like "do not allow user to override this value". > The value can be changed eventually by the query optimizer, but we do not > allow users to change MAX_MEM_ESTIMATE_FOR_ADMISSION at all. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-8799) Prometheus metrics should be prefixed with "impala_"
[ https://issues.apache.org/jira/browse/IMPALA-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-8799. --- Resolution: Fixed Fix Version/s: Impala 3.3.0 > Prometheus metrics should be prefixed with "impala_" > > > Key: IMPALA-8799 > URL: https://issues.apache.org/jira/browse/IMPALA-8799 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Reporter: Tim Armstrong >Assignee: Tim Armstrong >Priority: Major > Labels: observability > Fix For: Impala 3.3.0 > > > This is recommended by the Prometheus docs - > https://prometheus.io/docs/practices/naming/ > {quote} > A metric name... > ...must comply with the data model for valid characters. > ...should have a (single-word) application prefix relevant to the domain > the metric belongs to. The prefix is sometimes referred to as namespace by > client libraries. For metrics specific to an application, the prefix is > usually the application name itself. Sometimes, however, metrics are more > generic, like standardized metrics exported by client libraries. Examples: > prometheus_notifications_total (specific to the Prometheus server) > process_cpu_seconds_total (exported by many client libraries) > http_request_duration_seconds (for all HTTP requests) > {quote} > It is also awkward in tools like grafana to find impala metrics with the > current naming scheme. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-8627) Re-enable catalog v2 in containers
[ https://issues.apache.org/jira/browse/IMPALA-8627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong reassigned IMPALA-8627: - Assignee: Vihang Karajgaonkar (was: Tim Armstrong) > Re-enable catalog v2 in containers > -- > > Key: IMPALA-8627 > URL: https://issues.apache.org/jira/browse/IMPALA-8627 > Project: IMPALA > Issue Type: Sub-task > Components: Infrastructure >Affects Versions: Impala 3.3.0 >Reporter: Tim Armstrong >Assignee: Vihang Karajgaonkar >Priority: Major > Labels: catalog-v2 > Fix For: Impala 3.3.0 > > > We also need to set --invalidate_tables_on_memory_pressure on the impalads > for that to be fully effective - the impalads send table usage info to the > catalogd -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org