[jira] [Resolved] (IMPALA-10923) Fine grained table refreshing at partition level events for transactional tables

2022-02-17 Thread Yu-Wen Lai (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu-Wen Lai resolved IMPALA-10923.
-
 Fix Version/s: Impala 4.1.0
Target Version: Impala 4.1.0
Resolution: Fixed

> Fine grained table refreshing at partition level events for transactional 
> tables
> 
>
> Key: IMPALA-10923
> URL: https://issues.apache.org/jira/browse/IMPALA-10923
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> To keep transactional tables consistent, we currently refresh the whole 
> table even when a change affects only a single partition. That is too 
> expensive and can delay event processing.
> To enable fine-grained table refreshing, this proposal makes three main 
> changes:
>  # Maintain validWriteIdList in Catalogd for transactional tables. Write id 
> changes are tracked through AllocWriteIdEvents, CommitTxnEvents, and 
> AbortTxnEvents.
>  # Trigger partition-level refreshing for addPartitionEvents, 
> dropPartitionEvents, and AlterPartitionEvents.
>  # Introduce a config, *incremental_refresh_acid*, that switches 
> fine-grained table refreshing on or off.
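
The first change in the list above, tracking write id state from events, could be sketched roughly as follows. This is a minimal illustration only: the class `ValidWriteIdTracker` and its method names are hypothetical and are not Impala's actual Catalogd code; it assumes each event carries a transaction id and, for allocation, a write id.

```python
from dataclasses import dataclass, field
from enum import Enum


class WriteIdStatus(Enum):
    OPEN = "open"
    COMMITTED = "committed"
    ABORTED = "aborted"


@dataclass
class ValidWriteIdTracker:
    """Hypothetical per-table tracker; not Impala's real implementation."""

    # write id -> current status
    status_by_write_id: dict = field(default_factory=dict)
    # txn id -> write ids allocated to that transaction on this table
    write_ids_by_txn: dict = field(default_factory=dict)

    def on_alloc_write_id(self, txn_id, write_id):
        # A newly allocated write id belongs to an open transaction.
        self.status_by_write_id[write_id] = WriteIdStatus.OPEN
        self.write_ids_by_txn.setdefault(txn_id, []).append(write_id)

    def on_commit_txn(self, txn_id):
        self._finish(txn_id, WriteIdStatus.COMMITTED)

    def on_abort_txn(self, txn_id):
        self._finish(txn_id, WriteIdStatus.ABORTED)

    def _finish(self, txn_id, status):
        # Transactions that never wrote to this table are ignored.
        for write_id in self.write_ids_by_txn.pop(txn_id, []):
            self.status_by_write_id[write_id] = status

    def high_watermark(self):
        # Largest write id seen so far, or 0 if none was allocated.
        return max(self.status_by_write_id, default=0)

    def is_valid(self, write_id):
        # Only committed write ids are visible to readers.
        return self.status_by_write_id.get(write_id) is WriteIdStatus.COMMITTED
```

With a tracker like this, a partition-level event only needs to compare its write id against the tracked state instead of reloading the whole table.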



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (IMPALA-11132) Front-end test PlannerTest.testResourceRequirements can fail

2022-02-17 Thread Qifan Chen (Jira)
Qifan Chen created IMPALA-11132:
---

 Summary: Front-end test PlannerTest.testResourceRequirements can 
fail
 Key: IMPALA-11132
 URL: https://issues.apache.org/jira/browse/IMPALA-11132
 Project: IMPALA
  Issue Type: Test
Reporter: Qifan Chen


The test miscalculates per-host memory requirements, apparently due to an 
incorrect HBase cardinality estimate:


{code:java}
Section DISTRIBUTEDPLAN of query:
select * from functional_hbase.alltypessmall

Actual does not match expected result:
Max Per-Host Resource Reservation: Memory=4.00MB Threads=2
Per-Host Resource Estimates: Memory=10MB
Codegen disabled by planner
Analyzed query: SELECT * FROM functional_hbase.alltypessmall

F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Host Resources: mem-estimate=5.08MB mem-reservation=4.00MB 
thread-reservation=1
^^
PLAN-ROOT SINK
|  output exprs: functional_hbase.alltypessmall.id, 
functional_hbase.alltypessmall.bigint_col, 
functional_hbase.alltypessmall.bool_col, 
functional_hbase.alltypessmall.date_string_col, 
functional_hbase.alltypessmall.double_col, 
functional_hbase.alltypessmall.float_col, 
functional_hbase.alltypessmall.int_col, functional_hbase.alltypessmall.month, 
functional_hbase.alltypessmall.smallint_col, 
functional_hbase.alltypessmall.string_col, 
functional_hbase.alltypessmall.timestamp_col, 
functional_hbase.alltypessmall.tinyint_col, functional_hbase.alltypessmall.year
|  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB 
thread-reservation=0
|
01:EXCHANGE [UNPARTITIONED]
|  mem-estimate=1.08MB mem-reservation=0B thread-reservation=0
|  tuple-ids=0 row-size=89B cardinality=28.57K
|  in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
Per-Host Resources: mem-estimate=4.00KB mem-reservation=0B thread-reservation=1
00:SCAN HBASE [functional_hbase.alltypessmall]
   stored statistics:
 table: rows=100
 columns: all
   mem-estimate=4.00KB mem-reservation=0B thread-reservation=0
   tuple-ids=0 row-size=89B cardinality=28.57K
   in pipelines: 00(GETNEXT)

Expected:
Max Per-Host Resource Reservation: Memory=4.00MB Threads=2
Per-Host Resource Estimates: Memory=10MB
Codegen disabled by planner
Analyzed query: SELECT * FROM functional_hbase.alltypessmall

F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Host Resources: mem-estimate=4.02MB mem-reservation=4.00MB 
thread-reservation=1
PLAN-ROOT SINK
|  output exprs: functional_hbase.alltypessmall.id, 
functional_hbase.alltypessmall.bigint_col, 
functional_hbase.alltypessmall.bool_col, 
functional_hbase.alltypessmall.date_string_col, 
functional_hbase.alltypessmall.double_col, 
functional_hbase.alltypessmall.float_col, 
functional_hbase.alltypessmall.int_col, functional_hbase.alltypessmall.month, 
functional_hbase.alltypessmall.smallint_col, 
functional_hbase.alltypessmall.string_col, 
functional_hbase.alltypessmall.timestamp_col, 
functional_hbase.alltypessmall.tinyint_col, functional_hbase.alltypessmall.year
|  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB 
thread-reservation=0
|
01:EXCHANGE [UNPARTITIONED]
|  mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
|  tuple-ids=0 row-size=89B cardinality=50
|  in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
Per-Host Resources: mem-estimate=4.00KB mem-reservation=0B thread-reservation=1
00:SCAN HBASE [functional_hbase.alltypessmall]
   stored statistics:
 table: rows=100
 columns: all
   mem-estimate=4.00KB mem-reservation=0B thread-reservation=0
   tuple-ids=0 row-size=89B cardinality=50
   in pipelines: 00(GETNEXT)
{code}







[jira] [Created] (IMPALA-11131) Replace 'cnd_cwnd' with 'snd_cwnd' in www/rpcz.tmpl

2022-02-17 Thread Riza Suminto (Jira)
Riza Suminto created IMPALA-11131:
-

 Summary: Replace 'cnd_cwnd' with 'snd_cwnd' in www/rpcz.tmpl
 Key: IMPALA-11131
 URL: https://issues.apache.org/jira/browse/IMPALA-11131
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Impala 4.0.0
Reporter: Riza Suminto
Assignee: Riza Suminto


There is a typo in www/rpcz.tmpl:

[https://github.com/apache/impala/blob/b96439f/www/rpcz.tmpl#L303]

The correct JSON key is 'snd_cwnd', not 'cnd_cwnd'.





[jira] [Resolved] (IMPALA-11093) Fine grained table refreshing doesn't refresh table file metadata

2022-02-17 Thread Yu-Wen Lai (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu-Wen Lai resolved IMPALA-11093.
-
Fix Version/s: Impala 4.0.1
   Resolution: Fixed

> Fine grained table refreshing doesn't refresh table file metadata
> -
>
> Key: IMPALA-11093
> URL: https://issues.apache.org/jira/browse/IMPALA-11093
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
> Fix For: Impala 4.0.1
>
>
> If we insert data into an ACID partitioned table from Hive, the generated 
> events look like open_txn -> alter_partition -> commit_txn.
> On the alter_partition event, we refresh the partition (without refreshing 
> file metadata) because the write id of the partition object in the event is 
> greater than the write id of the cached partition. On the commit_txn event, 
> we perform the same write id check. However, the write id is now unchanged, 
> so we end up not refreshing file metadata.
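
The event sequence described above can be replayed in a small model to show why the file metadata stays stale. This is a simplified sketch, not Impala's event processor: `CachedPartition`, `process_events`, and the `(event_type, write_id)` tuples are hypothetical, and the model assumes the commit_txn handler reloads file metadata only when the write id check passes.

```python
from dataclasses import dataclass


@dataclass
class CachedPartition:
    write_id: int
    file_metadata_fresh: bool = True


def process_events(cached, events):
    """Replay (event_type, write_id) pairs using the write id check from the report."""
    for event_type, event_write_id in events:
        if event_type == "alter_partition":
            if event_write_id > cached.write_id:
                # The partition object is reloaded, but its file metadata is not.
                cached.write_id = event_write_id
                cached.file_metadata_fresh = False
        elif event_type == "commit_txn":
            if event_write_id > cached.write_id:
                cached.write_id = event_write_id
                cached.file_metadata_fresh = True
            # Otherwise the write id is unchanged and the reload is skipped.
        # open_txn carries no write id and changes nothing in this model.
    return cached


cached = process_events(
    CachedPartition(write_id=1),
    [("open_txn", None), ("alter_partition", 2), ("commit_txn", 2)],
)
print(cached)
```

In this model the alter_partition event already advanced the cached write id to 2, so the check on commit_txn fails and `file_metadata_fresh` remains `False`.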





[jira] [Created] (IMPALA-11130) Postgres JDBC driver should be upgraded to 42.3.3

2022-02-17 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-11130:
--

 Summary: Postgres JDBC driver should be upgraded to 42.3.3
 Key: IMPALA-11130
 URL: https://issues.apache.org/jira/browse/IMPALA-11130
 Project: IMPALA
  Issue Type: Task
  Components: Infrastructure
Affects Versions: Impala 4.1.0
Reporter: Joe McDonnell


Impala currently uses Postgres driver version 42.2.14, which is affected by 
CVE-2022-21724. We should upgrade to 42.3.3, which includes the fix for the 
CVE.

[https://search.maven.org/artifact/org.postgresql/postgresql/42.3.3/jar]

 





[jira] [Resolved] (IMPALA-2272) Parquet scanner always materializes NULL for empty collections

2022-02-17 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer resolved IMPALA-2272.
-
Fix Version/s: Impala 4.1.0
   Resolution: Fixed

Resolved by IMPALA-9498

> Parquet scanner always materializes NULL for empty collections
> --
>
> Key: IMPALA-2272
> URL: https://issues.apache.org/jira/browse/IMPALA-2272
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.3.0
>Reporter: Skye Wanderman-Milne
>Priority: Minor
>  Labels: complextype, nested_types
> Fix For: Impala 4.1.0
>
>
> Currently the Parquet scanner will always materialize a NULL slot for an 
> empty collection, rather than an empty ArrayValue/CollectionValue. It is not 
> currently possible to write a query that exposes this bug (i.e. it's not 
> possible to write a query that distinguishes between an empty and NULL 
> collection), but it will be once we add expressions that take collections as 
> input (e.g. "select array_column is null from tbl").
> This bug exists because the Parquet scanner only looks at the repeated field 
> of an array, not the containing group field. To fix it, the scanner will have 
> to consider the def/rep levels of both.
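
To illustrate how the definition level of the containing group separates a NULL collection from an empty one, here is a rough sketch. It is not the Impala scanner: `assemble_lists` is a hypothetical helper, and it assumes the standard three-level Parquet LIST layout with an optional outer group and an optional element, so that definition level 0 means a NULL list, 1 an empty list, 2 a NULL element, and 3 a present element.

```python
def assemble_lists(levels):
    """Rebuild rows from (rep_level, def_level, value) triples.

    Assumed schema (an illustration, not taken from the issue):
      optional group a (LIST) { repeated group list { optional int32 element; } }
    """
    rows = []
    for rep_level, def_level, value in levels:
        if rep_level == 0:
            # Repetition level 0 starts a new top-level row.
            if def_level == 0:
                rows.append(None)  # the collection itself is NULL
                continue
            rows.append([])  # the collection is defined, possibly empty
            if def_level == 1:
                continue  # empty collection: no element to add
        # Definition level 3 is a present element, 2 is a NULL element.
        rows[-1].append(value if def_level == 3 else None)
    return rows


column = [(0, 0, None), (0, 1, None), (0, 3, 1), (1, 2, None), (1, 3, 2)]
print(assemble_lists(column))
```

A reader that inspected only the repeated field would see no elements for both of the first two rows and could not tell them apart, which matches the behavior described in this issue.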





[jira] [Resolved] (IMPALA-9498) Allow array type in SELECT list for Parquet tables

2022-02-17 Thread Csaba Ringhofer (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer resolved IMPALA-9498.
-
Fix Version/s: Impala 4.1.0
   Resolution: Fixed

> Allow array type in SELECT list for Parquet tables
> --
>
> Key: IMPALA-9498
> URL: https://issues.apache.org/jira/browse/IMPALA-9498
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Reporter: Gabor Kaszab
>Assignee: Csaba Ringhofer
>Priority: Major
>  Labels: complextype
> Fix For: Impala 4.1.0
>
>
> This covers collections: Array
> Expected printout format:
> Array: [null,1,2,null,3,null]
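
The expected printout format above can be reproduced with a small helper. This is only an illustration of the format for a flat array of integers; `format_array` is a hypothetical name and says nothing about how Impala implements the output or how nested or string elements are rendered.

```python
def format_array(values):
    """Render a flat array as in the expected printout, with NULL elements as 'null'."""
    return "[" + ",".join("null" if v is None else str(v) for v in values) + "]"


print(format_array([None, 1, 2, None, 3, None]))
```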


