[jira] [Resolved] (IMPALA-10923) Fine grained table refreshing at partition level events for transactional tables
[ https://issues.apache.org/jira/browse/IMPALA-10923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yu-Wen Lai resolved IMPALA-10923.
---------------------------------
    Fix Version/s: Impala 4.1.0
   Target Version: Impala 4.1.0
       Resolution: Fixed

> Fine grained table refreshing at partition level events for transactional tables
> ---------------------------------------------------------------------------------
>
>                 Key: IMPALA-10923
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10923
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Yu-Wen Lai
>            Assignee: Yu-Wen Lai
>            Priority: Major
>             Fix For: Impala 4.1.0
>
>
> To keep transactional tables consistent, we currently refresh the whole
> table even when a change affects only a single partition. That is too
> expensive and can delay event processing.
> To enable fine-grained table refreshing, this proposal makes three main
> changes:
> # Maintain a validWriteIdList in Catalogd for transactional tables. Write id
> changes are tracked through AllocWriteIdEvents, CommitTxnEvents, and
> AbortTxnEvents.
> # Trigger partition-level refreshing for addPartitionEvents,
> dropPartitionEvents, and AlterPartitionEvents.
> # Introduce a config, *incremental_refresh_acid*, that switches
> fine-grained table refreshing on or off.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
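The three changes proposed above can be pictured with a minimal Python sketch. It is only an illustration of the idea, not Catalogd's implementation: the class `WriteIdTracker`, its method names, and the helper `partitions_to_refresh` are hypothetical, and the sketch assumes each event has already been resolved to a single write id (in practice, commit and abort events refer to transactions).

```python
from dataclasses import dataclass, field


@dataclass
class WriteIdTracker:
    """Hypothetical per-table record of write id states (illustration only)."""
    open_ids: set = field(default_factory=set)
    committed_ids: set = field(default_factory=set)
    aborted_ids: set = field(default_factory=set)

    def on_alloc_write_id(self, write_id: int) -> None:
        # A newly allocated write id is open until its transaction ends.
        self.open_ids.add(write_id)

    def on_commit_txn(self, write_id: int) -> None:
        self.open_ids.discard(write_id)
        self.committed_ids.add(write_id)

    def on_abort_txn(self, write_id: int) -> None:
        self.open_ids.discard(write_id)
        self.aborted_ids.add(write_id)

    def is_valid(self, write_id: int) -> bool:
        # Only data written under a committed write id should be visible.
        return write_id in self.committed_ids

    def high_watermark(self) -> int:
        seen = self.open_ids | self.committed_ids | self.aborted_ids
        return max(seen, default=0)


def partitions_to_refresh(event_partitions, all_partitions, incremental_refresh_acid):
    """With the flag on, refresh only the partitions named in the event."""
    if incremental_refresh_acid:
        return list(event_partitions)
    return list(all_partitions)
```

With the flag off, the helper falls back to the whole-table behaviour the issue describes as too expensive.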
[jira] [Created] (IMPALA-11132) Front-end test PlannerTest.testResourceRequirements can fail
Qifan Chen created IMPALA-11132:
-----------------------------------

             Summary: Front-end test PlannerTest.testResourceRequirements can fail
                 Key: IMPALA-11132
                 URL: https://issues.apache.org/jira/browse/IMPALA-11132
             Project: IMPALA
          Issue Type: Test
            Reporter: Qifan Chen


The test can fail because the planner's per-host memory estimates differ from the expected values, apparently due to an incorrect HBase cardinality estimate (28.57K in the actual plan versus 50 in the expected plan):
{code:java}
Section DISTRIBUTEDPLAN of query:
select * from functional_hbase.alltypessmall

Actual does not match expected result:
Max Per-Host Resource Reservation: Memory=4.00MB Threads=2
Per-Host Resource Estimates: Memory=10MB
Codegen disabled by planner
Analyzed query: SELECT * FROM functional_hbase.alltypessmall

F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Host Resources: mem-estimate=5.08MB mem-reservation=4.00MB thread-reservation=1
^^
PLAN-ROOT SINK
|  output exprs: functional_hbase.alltypessmall.id, functional_hbase.alltypessmall.bigint_col, functional_hbase.alltypessmall.bool_col, functional_hbase.alltypessmall.date_string_col, functional_hbase.alltypessmall.double_col, functional_hbase.alltypessmall.float_col, functional_hbase.alltypessmall.int_col, functional_hbase.alltypessmall.month, functional_hbase.alltypessmall.smallint_col, functional_hbase.alltypessmall.string_col, functional_hbase.alltypessmall.timestamp_col, functional_hbase.alltypessmall.tinyint_col, functional_hbase.alltypessmall.year
|  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
01:EXCHANGE [UNPARTITIONED]
|  mem-estimate=1.08MB mem-reservation=0B thread-reservation=0
|  tuple-ids=0 row-size=89B cardinality=28.57K
|  in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
Per-Host Resources: mem-estimate=4.00KB mem-reservation=0B thread-reservation=1
00:SCAN HBASE [functional_hbase.alltypessmall]
   stored statistics:
     table: rows=100
     columns: all
   mem-estimate=4.00KB mem-reservation=0B thread-reservation=0
   tuple-ids=0 row-size=89B cardinality=28.57K
   in pipelines: 00(GETNEXT)

Expected:
Max Per-Host Resource Reservation: Memory=4.00MB Threads=2
Per-Host Resource Estimates: Memory=10MB
Codegen disabled by planner
Analyzed query: SELECT * FROM functional_hbase.alltypessmall

F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Host Resources: mem-estimate=4.02MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
|  output exprs: functional_hbase.alltypessmall.id, functional_hbase.alltypessmall.bigint_col, functional_hbase.alltypessmall.bool_col, functional_hbase.alltypessmall.date_string_col, functional_hbase.alltypessmall.double_col, functional_hbase.alltypessmall.float_col, functional_hbase.alltypessmall.int_col, functional_hbase.alltypessmall.month, functional_hbase.alltypessmall.smallint_col, functional_hbase.alltypessmall.string_col, functional_hbase.alltypessmall.timestamp_col, functional_hbase.alltypessmall.tinyint_col, functional_hbase.alltypessmall.year
|  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
01:EXCHANGE [UNPARTITIONED]
|  mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
|  tuple-ids=0 row-size=89B cardinality=50
|  in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
Per-Host Resources: mem-estimate=4.00KB mem-reservation=0B thread-reservation=1
00:SCAN HBASE [functional_hbase.alltypessmall]
   stored statistics:
     table: rows=100
     columns: all
   mem-estimate=4.00KB mem-reservation=0B thread-reservation=0
   tuple-ids=0 row-size=89B cardinality=50
   in pipelines: 00(GETNEXT)
{code}
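The figures in the two plans are consistent with the F01 fragment estimate being the sum of the PLAN-ROOT SINK and EXCHANGE estimates, which would explain why a larger cardinality (and thus a larger exchange estimate) changes the fragment line. The Python check below only replays that arithmetic from the numbers quoted in the report. The summation rule is an assumption inferred from those numbers, not taken from the planner code, and `fmt_mb` is a hypothetical helper.

```python
KB = 1024
MB = 1024 * KB


def fmt_mb(n_bytes: float) -> str:
    """Format a byte count the way the plan prints it, e.g. '4.02MB'."""
    return f"{n_bytes / MB:.2f}MB"


# Values copied from the plans in the report.
sink_estimate = 4.00 * MB
actual_exchange = 1.08 * MB     # cardinality=28.57K
expected_exchange = 16.00 * KB  # cardinality=50

actual_f01 = fmt_mb(sink_estimate + actual_exchange)
expected_f01 = fmt_mb(sink_estimate + expected_exchange)

print(actual_f01)    # matches the 'Actual' F01 line
print(expected_f01)  # matches the 'Expected' F01 line
```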
[jira] [Created] (IMPALA-11131) Replace 'cnd_cwnd' with 'snd_cwnd' in www/rpcz.tmpl
Riza Suminto created IMPALA-11131:
-------------------------------------

             Summary: Replace 'cnd_cwnd' with 'snd_cwnd' in www/rpcz.tmpl
                 Key: IMPALA-11131
                 URL: https://issues.apache.org/jira/browse/IMPALA-11131
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 4.0.0
            Reporter: Riza Suminto
            Assignee: Riza Suminto


There is a typo in www/rpcz.tmpl:

[https://github.com/apache/impala/blob/b96439f/www/rpcz.tmpl#L303]

The correct JSON key is 'snd_cwnd', not 'cnd_cwnd'.
[jira] [Resolved] (IMPALA-11093) Fine grained table refreshing doesn't refresh table file metadata
[ https://issues.apache.org/jira/browse/IMPALA-11093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yu-Wen Lai resolved IMPALA-11093.
---------------------------------
    Fix Version/s: Impala 4.0.1
       Resolution: Fixed

> Fine grained table refreshing doesn't refresh table file metadata
> -----------------------------------------------------------------
>
>                 Key: IMPALA-11093
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11093
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Yu-Wen Lai
>            Assignee: Yu-Wen Lai
>            Priority: Major
>             Fix For: Impala 4.0.1
>
>
> If we insert data into an ACID partitioned table from Hive, the generated
> events look like open_txn -> alter_partition -> commit_txn.
> On the alter_partition event, we refresh the partition (without refreshing
> file metadata) because the write id of the partition object in the event is
> greater than the write id of the cached partition. Then, on the commit_txn
> event, we run the same write id check. By then the two write ids are equal,
> so the check does not pass and the file metadata is never refreshed.
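The event sequence above can be replayed in a small Python model to show why the file metadata stays stale. This is a simplified sketch, not Impala code: `CachedPartition`, `process_event`, and `replay` are hypothetical names, and the variant in which alter_partition also reloads files is shown only for contrast, since the issue text does not describe the actual fix.

```python
from dataclasses import dataclass


@dataclass
class CachedPartition:
    write_id: int                  # write id recorded on the cached partition
    files_loaded_at_write_id: int  # write id at which file metadata was last loaded


def process_event(cached: CachedPartition, event_write_id: int, reload_files: bool) -> bool:
    """Apply the write id check described in the issue; return True if it passed."""
    if event_write_id > cached.write_id:
        cached.write_id = event_write_id
        if reload_files:
            cached.files_loaded_at_write_id = event_write_id
        return True
    return False


def replay(alter_partition_reloads_files: bool) -> CachedPartition:
    cached = CachedPartition(write_id=1, files_loaded_at_write_id=1)
    # alter_partition arrives first and carries the new write id 2.
    process_event(cached, 2, reload_files=alter_partition_reloads_files)
    # commit_txn runs the same check, but the cached write id is already 2.
    process_event(cached, 2, reload_files=True)
    return cached


stale = replay(alter_partition_reloads_files=False)
fresh = replay(alter_partition_reloads_files=True)
```

In the first replay the partition's write id advances to 2 while its file metadata still dates from write id 1, which is the symptom reported.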
[jira] [Created] (IMPALA-11130) Postgres JDBC driver should be upgraded to 42.3.3
Joe McDonnell created IMPALA-11130:
--------------------------------------

             Summary: Postgres JDBC driver should be upgraded to 42.3.3
                 Key: IMPALA-11130
                 URL: https://issues.apache.org/jira/browse/IMPALA-11130
             Project: IMPALA
          Issue Type: Task
          Components: Infrastructure
    Affects Versions: Impala 4.1.0
            Reporter: Joe McDonnell


Impala currently uses Postgres driver version 42.2.14, which is impacted by CVE-2022-21724. We should upgrade to 42.3.3, which has the resolution for the CVE.

[https://search.maven.org/artifact/org.postgresql/postgresql/42.3.3/jar]
[jira] [Resolved] (IMPALA-2272) Parquet scanner always materializes NULL for empty collections
[ https://issues.apache.org/jira/browse/IMPALA-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Csaba Ringhofer resolved IMPALA-2272.
-------------------------------------
    Fix Version/s: Impala 4.1.0
       Resolution: Fixed

Resolved by IMPALA-9498

> Parquet scanner always materializes NULL for empty collections
> --------------------------------------------------------------
>
>                 Key: IMPALA-2272
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2272
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.3.0
>            Reporter: Skye Wanderman-Milne
>            Priority: Minor
>              Labels: complextype, nested_types
>             Fix For: Impala 4.1.0
>
>
> Currently the Parquet scanner will always materialize a NULL slot for an
> empty collection, rather than an empty ArrayValue/CollectionValue. It is not
> currently possible to write a query that exposes this bug (i.e. it's not
> possible to write a query that distinguishes between an empty and NULL
> collection), but it will be once we add expressions that take collections as
> input (e.g. "select array_column is null from tbl").
> We have this bug because the Parquet scanner only looks at the repeated field
> of an array, not the containing group field. To fix it, it will have to
> consider the def/rep levels of both.
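To illustrate why both the repeated field and its containing group matter, the Python sketch below maps Parquet definition levels to outcomes for one assumed schema: `optional group arr (LIST) { repeated group list { optional int32 element; } }`, whose maximum definition level is 3. The function names are hypothetical, and `classify_repeated_only` is just a model of the behaviour described in the issue, not the scanner's code.

```python
# Definition level contributed by each field in the assumed schema:
#   optional group arr  -> 1
#   repeated group list -> 2
#   optional element    -> 3
ARR_DEF_LEVEL = 1
LIST_DEF_LEVEL = 2
ELEMENT_DEF_LEVEL = 3


def classify(def_level: int) -> str:
    """Interpret a definition level using both the group and the repeated field."""
    if def_level < ARR_DEF_LEVEL:
        return "NULL collection"
    if def_level < LIST_DEF_LEVEL:
        return "empty collection"
    if def_level < ELEMENT_DEF_LEVEL:
        return "NULL element"
    if def_level == ELEMENT_DEF_LEVEL:
        return "non-NULL element"
    raise ValueError(f"definition level {def_level} exceeds the schema maximum")


def classify_repeated_only(def_level: int) -> str:
    """Model of the reported bug: anything below the repeated field reads as NULL."""
    if def_level < LIST_DEF_LEVEL:
        return "NULL collection"
    return classify(def_level)
```

Under this schema a definition level of 1 means the array is present but has no entries, which is exactly the case the repeated-field-only check collapses into NULL.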
[jira] [Resolved] (IMPALA-9498) Allow array type in SELECT list for Parquet tables
[ https://issues.apache.org/jira/browse/IMPALA-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Csaba Ringhofer resolved IMPALA-9498.
-------------------------------------
    Fix Version/s: Impala 4.1.0
       Resolution: Fixed

> Allow array type in SELECT list for Parquet tables
> --------------------------------------------------
>
>                 Key: IMPALA-9498
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9498
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend, Frontend
>            Reporter: Gabor Kaszab
>            Assignee: Csaba Ringhofer
>            Priority: Major
>              Labels: complextype
>             Fix For: Impala 4.1.0
>
>
> This covers collections: Array
> Expected printout format:
> Array: [null,1,2,null,3,null]
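As a rough illustration of the printout format quoted above, the following Python sketch renders a flat array of integers with NULLs printed as `null` and no spaces after the commas. `format_array` is a hypothetical helper. How strings or nested collections are printed is not stated in the issue, so the sketch does not attempt them.

```python
from typing import Optional, Sequence


def format_array(values: Sequence[Optional[int]]) -> str:
    """Render a flat integer array as [v1,v2,...], printing None as null."""
    rendered = ("null" if v is None else str(v) for v in values)
    return "[" + ",".join(rendered) + "]"


example = format_array([None, 1, 2, None, 3, None])
print(example)
```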