[Impala-ASF-CR] IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala 
startup flag
..


Patch Set 12: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 12
Gerrit-Owner: Omid Shahidi 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Omid Shahidi 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 28 Jun 2022 02:18:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala 
startup flag
..

IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

Some regex patterns need more memory to be compiled and matched when
used with the string functions and the LIKE predicate.
For memory-hungry patterns this can cause the following error:
"re2/re2.cc:667: DFA out of memory:
size x, bytemap range xx, list count x".

To avoid such errors in impalad's ERROR log, a global flag can
be set at Impala cluster startup. The re2_mem_limit flag
accepts a memory specification string and sets the re2 max_mem parameter,
the memory in bytes used to store regexps.
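
To illustrate what a "memory specification string" could look like, here is a
minimal Python sketch that converts strings such as 8MB into a byte count. The
function name parse_mem_spec and the accepted suffixes are assumptions made
for this example; they are not taken from the patch or from Impala's parser.

```python
import re

# Hypothetical unit table for this sketch; the suffixes the real flag
# accepts may differ.
_UNITS = {
    "": 1, "b": 1,
    "k": 1024, "kb": 1024,
    "m": 1024 ** 2, "mb": 1024 ** 2,
    "g": 1024 ** 3, "gb": 1024 ** 3,
}


def parse_mem_spec(spec):
    """Convert a string such as '8MB' or '512' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-zA-Z]*)\s*", spec)
    if match is None:
        raise ValueError("invalid memory specification: %r" % spec)
    number, unit = match.groups()
    unit = unit.lower()
    if unit not in _UNITS:
        raise ValueError("unknown unit in memory specification: %r" % spec)
    return int(number) * _UNITS[unit]
```

A value without a suffix is treated as plain bytes in this sketch, so
parse_mem_spec("512") and parse_mem_spec("512b") give the same result.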

Testing:
 - Use a long regex pattern that exhausts the memory when re2 is
   given the default amount of memory or less.
   With a greater value for the re2_mem_limit flag, the regexp is
   processed with no error.

Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Reviewed-on: http://gerrit.cloudera.org:8080/18602
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/common/global-flags.cc
M be/src/common/init.cc
M be/src/exprs/like-predicate.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
A tests/custom_cluster/test_re2_max_mem.py
6 files changed, 118 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/18602

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 13
Gerrit-Owner: Omid Shahidi 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Omid Shahidi 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files

2022-06-27 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17979 )

Change subject: IMPALA-10791 Add batch reading for remote temporary files
..


Patch Set 12:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h
File be/src/runtime/io/disk-file.h:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@55
PS12, Line 55: class MemBlock {
> This seems like a fairly leaky abstraction. It mostly exists to handle the
It is quite a simple class that works closely with DiskFile and TmpFile. It 
relies on TmpFile to reserve the memory (which in turn relies on a global 
variable in TmpFileMgr) before it has the right to allocate, and it relies on 
the IO functions to write and read data_ directly under certain block 
statuses. I thought for a while about a better abstraction, but for now I don't 
have a good idea that guarantees efficiency and safety without too much 
complexity. I think one good thing here is that the status transition is clear 
and one-way: in most cases the user only needs to take the block spinlock, do 
something, and change the status if needed.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@220
PS12, Line 220: std::unique_ptr page_cnts_per_block_;
> This seems a little funny, but I think I see why it's necessary. A const ve
Maybe it's personal preference. I prefer a fixed-size array if the size is 
fixed and the type is simple; it looks clearer.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@446
PS12, Line 446:   // Helper function to check the status of a read buffe block.
> nit: buffe -> buffer
Done


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@458
PS12, Line 458:   // Helper function to delete the read buffe block.
> nit: buffe -> buffer
Done


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.h@516
PS12, Line 516:   /// The lock also protects the memory blocks from 
destruction, if the disk file has.
> "if the disk file has" seems like an incomplete sentence.
Done


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.cc
File be/src/runtime/io/disk-file.cc:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/disk-file.cc@99
PS12, Line 99: void MemBlock::Delete(bool* reserved, bool* allocated) {
> Some unit tests around MemBlock transitions might be useful.
Added a unit test case, "MemBlockTest", which should cover the MemBlock status 
transitions.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/request-ranges.h
File be/src/runtime/io/request-ranges.h:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/request-ranges.h@152
PS12, Line 152:   RequestRange(RequestType::type request_type, int disk_id = 
-1, int64_t offset = -1)
> How does an offset of -1 differ from 0?
I think the change is to allow assigning the offset in the constructor. -1 is 
an invalid value for the offset, which means we need to set the offset of the 
range before using it; there are assertions such as "DCHECK_GE(offset, 0);" 
where it is used.
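
The sentinel idea can be sketched in Python as follows. This is only an
illustration of the pattern described in the reply, not the C++ class under
review: the method names set_offset and checked_offset are hypothetical, and a
ValueError stands in for the DCHECK.

```python
INVALID_OFFSET = -1  # sentinel meaning "offset has not been assigned yet"


class RequestRange:
    """Toy stand-in for a request range with an optional offset."""

    def __init__(self, request_type, disk_id=-1, offset=INVALID_OFFSET):
        self.request_type = request_type
        self.disk_id = disk_id
        self.offset = offset

    def set_offset(self, offset):
        # Hypothetical setter: callers that did not pass an offset to the
        # constructor must call this before using the range.
        self.offset = offset

    def checked_offset(self):
        # Plays the role of DCHECK_GE(offset, 0): 0 is a valid position,
        # while -1 only ever means "unset".
        if self.offset < 0:
            raise ValueError("offset must be set before the range is used")
        return self.offset
```

With this shape, an offset of 0 is an ordinary start-of-file position, while
the default of -1 can never be mistaken for real data.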


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc@156
PS12, Line 156: if (!use_mem_buffer) {
> Since we asserted above that use_local_buff implies !use_mem_buffer, this c
Added some comments. The logic here is to set the correct file reader when the 
read goes to the file system (which could be a local or a remote file). If it 
uses the memory buffer, the reader doesn't need to be set. There is also the 
case where use_mem_buffer and use_local_buff are both false; in that case we 
use the original file_reader_ to get the range.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/io/scan-range.cc@291
PS12, Line 291:   // 1. If it is the local buffer file is not 
deleted(evicted) yet.
> Change to "If the local buffer file ..."
Done


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr-test.cc
File be/src/runtime/tmp-file-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr-test.cc@1835
PS12, Line 1835:   int64_t file_size = 512 * 1024 * 1024;
> Why was this changed?
The maximum allowed size for one remote file increases to 512MB in this patch; 
it is defined by MAX_REMOTE_TMPFILE_SIZE_THRESHOLD_MB in tmp-file-mgr.cc. The 
reason for the increase is that a bigger file size may give better upload 
performance. By default we still use 256MB, but this gives users a chance to 
try a bigger size if they would like to.


http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/17979/12/be/src/runtime/tmp-file-mgr.cc@117
PS12, Line 117: "Set if 

[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17979 )

Change subject: IMPALA-10791 Add batch reading for remote temporary files
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10889/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17979

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b
Gerrit-Change-Number: 17979
Gerrit-PatchSet: 13
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Tue, 28 Jun 2022 00:27:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10791 Add batch reading for remote temporary files

2022-06-27 Thread Yida Wu (Code Review)
Yida Wu has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/17979 )

Change subject: IMPALA-10791 Add batch reading for remote temporary files
..

IMPALA-10791 Add batch reading for remote temporary files

The patch adds a feature to batch read from a remote temporary
file in order to improve the reading performance for the spilled
remote data.

Originally, the design was to use the local disk file as the buffer
for batch reading from the remote file, but in practice that
didn't improve performance. Therefore, the design was changed
to use memory as the read buffer.

Currently, each TmpFileRemote has two DiskFiles: one for the
remote file and one for the local buffer. The patch adds MemBlocks
to the local buffer file. Each local buffer file is divided evenly
into several MemBlocks. To guarantee that a single page is never
split across two blocks, the block sizes can differ slightly from
each other in practice. The default block size is the smaller of
the default file size and
MAX_REMOTE_READ_MEM_BLOCK_THRESHOLD_BYTES, which is 16MB.
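
One way to picture the block layout is the Python sketch below. It is an
assumption-laden illustration, not the patch's algorithm: the rule that a page
belongs to the block in which its first byte falls, and the function names, are
invented for this example. Only the 16MB constant and the "smaller of the two"
rule come from the description above.

```python
MAX_REMOTE_READ_MEM_BLOCK_THRESHOLD_BYTES = 16 * 1024 * 1024


def default_block_size(default_file_size):
    """Nominal block size: the smaller of the file size and the 16MB cap."""
    return min(default_file_size, MAX_REMOTE_READ_MEM_BLOCK_THRESHOLD_BYTES)


def assign_pages_to_blocks(page_sizes, block_size):
    """Return (page count per block, bytes per block).

    Assumed rule for this sketch: a page goes entirely to the block in
    which its first byte falls, so a page is never cut in two and the
    actual block sizes may differ a little from block_size.
    """
    counts = []
    sizes = []
    offset = 0
    for size in page_sizes:
        index = offset // block_size
        while len(counts) <= index:
            counts.append(0)
            sizes.append(0)
        counts[index] += 1
        sizes[index] += size
        offset += size
    return counts, sizes
```

For example, four 6-byte pages with a nominal block size of 10 end up as two
blocks of 12 bytes each, slightly larger than the nominal size.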

When pinning a page, the system checks whether there is enough
memory for the block that holds the page. If not, it reads the
page directly and disables the block, to avoid duplicated reads
of the same content from the remote fs. If the system decides to
fetch a block, the block stays in memory until all of the pages
in the block are read or the query ends.
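
The pinning decision can be sketched as a small state machine in Python. All
names here (ReadBufferPool, Block, pin_page and the state strings) are
hypothetical, and the sketch ignores locking and the query-end case; it only
mirrors the behaviour described in the paragraph above.

```python
class ReadBufferPool:
    """Tracks how much read-buffer memory is currently reserved."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def try_reserve(self, nbytes):
        if self.used_bytes + nbytes > self.limit_bytes:
            return False
        self.used_bytes += nbytes
        return True

    def release(self, nbytes):
        self.used_bytes -= nbytes


class Block:
    def __init__(self, size, page_count):
        self.size = size
        self.pages_left = page_count
        self.state = "UNINIT"  # UNINIT -> IN_MEMORY -> RELEASED, or DISABLED


def pin_page(block, pool):
    """Return 'memory' if the page is served from a buffered block,
    or 'direct' if it has to be read straight from the remote file."""
    if block.state == "UNINIT":
        if pool.try_reserve(block.size):
            block.state = "IN_MEMORY"  # the whole block is fetched once
        else:
            block.state = "DISABLED"   # never buffer this block later
    if block.state != "IN_MEMORY":
        return "direct"
    block.pages_left -= 1
    if block.pages_left == 0:
        # Every page has been read, so the memory can be given back.
        pool.release(block.size)
        block.state = "RELEASED"
    return "memory"
```

Disabling the block on the first failed reservation is what prevents a later
block fetch from re-reading a page that was already read directly.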

One challenge of using memory for the buffer is that the system is
short of memory precisely when it needs to spill data. So we
restrict the read buffer to 10% of the total memory: right now the
impala process reserves 20% of memory as unused by default, so
using 10% for an emergency case like spilling is reasonable.

Two startup options have been added for the new feature.

1. remote_batch_read. Default is false. If set to true, batch
reading is enabled.
2. remote_read_memory_buffer_size. Default is 1G. The maximum memory
that can be used by the read buffer. The value is also capped at
10% of the total system memory.
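
The interaction between the configured buffer size and the 10% cap can be
written as a one-line rule. The Python sketch below is illustrative only; the
function name and its parameters are assumptions, while the 1G default and the
10% cap come from the option description above.

```python
GIB = 1024 ** 3


def effective_read_buffer_bytes(configured_bytes, total_system_bytes):
    """Apply the 10%-of-system-memory cap to the configured buffer size."""
    cap = total_system_bytes // 10  # integer arithmetic avoids float rounding
    return min(configured_bytes, cap)
```

On a host with 64G of memory the 1G default is used unchanged, whereas on a
4G host the cap of roughly 0.4G takes effect.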

Added metrics ScratchReadsUseMem/ScratchBytesReadUseMem/
ScratchBytesReadUseLocalDisk to the query profile.

The patch also increases the MAX_REMOTE_TMPFILE_SIZE_THRESHOLD_MB
from 256 to 512.

Tests:
Ran core and exhaustive tests.
Added and ran TmpFileMgrTest::TestBatchReadingFromRemote.
Added e2e test test_scratch_dirs_batch_reading.

Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b
---
M be/src/runtime/io/disk-file.cc
M be/src/runtime/io/disk-file.h
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
M be/src/runtime/tmp-file-mgr-internal.h
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/util/metrics.h
M common/thrift/metrics.json
M tests/custom_cluster/test_scratch_disk.py
15 files changed, 1,340 insertions(+), 151 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/17979/13
--
To view, visit http://gerrit.cloudera.org:8080/17979

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1dcc5d0881ffaeff09c5c514306cd668373ad31b
Gerrit-Change-Number: 17979
Gerrit-PatchSet: 13
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Yida Wu 


[Impala-ASF-CR] IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala 
startup flag
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10888/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18602

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 11
Gerrit-Owner: Omid Shahidi 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Omid Shahidi 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 27 Jun 2022 21:46:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala 
startup flag
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10887/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18602

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 10
Gerrit-Owner: Omid Shahidi 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Omid Shahidi 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 27 Jun 2022 21:42:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

2022-06-27 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala 
startup flag
..


Patch Set 11: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18602

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 11
Gerrit-Owner: Omid Shahidi 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Omid Shahidi 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 27 Jun 2022 21:36:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala 
startup flag
..


Patch Set 12:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8275/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/18602

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 12
Gerrit-Owner: Omid Shahidi 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Omid Shahidi 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 27 Jun 2022 21:38:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala 
startup flag
..


Patch Set 12: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18602

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 12
Gerrit-Owner: Omid Shahidi 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Omid Shahidi 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 27 Jun 2022 21:38:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11399: Download shell dependencies from PyPI

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18668 )

Change subject: IMPALA-11399: Download shell dependencies from PyPI
..


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8274/


--
To view, visit http://gerrit.cloudera.org:8080/18668

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3b76e112ba9d0db19fae3e8eb15fd54a721f80fd
Gerrit-Change-Number: 18668
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Mon, 27 Jun 2022 21:30:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

2022-06-27 Thread Omid Shahidi (Code Review)
Hello Kurt Deschler, Riza Suminto, Abhishek Rawat, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18602

to look at the new patch set (#11).

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala 
startup flag
..

IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

Some regex patterns need more memory to be compiled and matched when
used with the string functions and the LIKE predicate.
For memory-hungry patterns this can cause the following error:
"re2/re2.cc:667: DFA out of memory:
size x, bytemap range xx, list count x".

To avoid such errors in impalad's ERROR log, a global flag can
be set at Impala cluster startup. The re2_mem_limit flag
accepts a memory specification string and sets the re2 max_mem parameter,
the memory in bytes used to store regexps.

Testing:
 - Use a long regex pattern that exhausts the memory when re2 is
   given the default amount of memory or less.
   With a greater value for the re2_mem_limit flag, the regexp is
   processed with no error.

Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
---
M be/src/common/global-flags.cc
M be/src/common/init.cc
M be/src/exprs/like-predicate.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
A tests/custom_cluster/test_re2_max_mem.py
6 files changed, 118 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/18602/11
--
To view, visit http://gerrit.cloudera.org:8080/18602

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 11
Gerrit-Owner: Omid Shahidi 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Omid Shahidi 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18602 )

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala 
startup flag
..


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18602/10/tests/custom_cluster/test_re2_max_mem.py
File tests/custom_cluster/test_re2_max_mem.py:

http://gerrit.cloudera.org:8080/#/c/18602/10/tests/custom_cluster/test_re2_max_mem.py@45
PS10, Line 45:
flake8: W291 trailing whitespace


http://gerrit.cloudera.org:8080/#/c/18602/10/tests/custom_cluster/test_re2_max_mem.py@45
PS10, Line 45: # DFA out of memory issue at that will be brought up 
when
line has trailing whitespace



--
To view, visit http://gerrit.cloudera.org:8080/18602

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 10
Gerrit-Owner: Omid Shahidi 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Omid Shahidi 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 27 Jun 2022 21:21:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

2022-06-27 Thread Omid Shahidi (Code Review)
Hello Kurt Deschler, Riza Suminto, Abhishek Rawat, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18602

to look at the new patch set (#10).

Change subject: IMPALA-9615: re2's max_mem opt configurable via an Impala 
startup flag
..

IMPALA-9615: re2's max_mem opt configurable via an Impala startup flag

Some regex patterns need more memory to be compiled and matched when
used with the string functions and the LIKE predicate.
For memory-hungry patterns this can cause the following error:
"re2/re2.cc:667: DFA out of memory:
size x, bytemap range xx, list count x".

To avoid such errors in impalad's ERROR log, a global flag can
be set at Impala cluster startup. The re2_mem_limit flag
accepts a memory specification string and sets the re2 max_mem parameter,
the memory in bytes used to store regexps.

Testing:
 - Use a long regex pattern that exhausts the memory when re2 is
   given the default amount of memory or less.
   With a greater value for the re2_mem_limit flag, the regexp is
   processed with no error.

Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
---
M be/src/common/global-flags.cc
M be/src/common/init.cc
M be/src/exprs/like-predicate.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/string-functions.h
A tests/custom_cluster/test_re2_max_mem.py
6 files changed, 118 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/18602/10
--
To view, visit http://gerrit.cloudera.org:8080/18602

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idf28d2f7217b1322ab8fdfb2c02fff0608078571
Gerrit-Change-Number: 18602
Gerrit-PatchSet: 10
Gerrit-Owner: Omid Shahidi 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Omid Shahidi 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-11389: Include Python 3 eggs in tarball

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18653 )

Change subject: IMPALA-11389: Include Python 3 eggs in tarball
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10885/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18653

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I94f86de9e2a6303151c2f0e6454b5f629cbc9444
Gerrit-Change-Number: 18653
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Mon, 27 Jun 2022 16:28:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11399: Download shell dependencies from PyPI

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18668 )

Change subject: IMPALA-11399: Download shell dependencies from PyPI
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10886/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18668

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3b76e112ba9d0db19fae3e8eb15fd54a721f80fd
Gerrit-Change-Number: 18668
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Mon, 27 Jun 2022 16:28:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10545: Higher data_cache_write_concurrency for SSDs

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18616 )

Change subject: IMPALA-10545: Higher data_cache_write_concurrency for SSDs
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10884/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18616

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I60761faa2710f4795f1f3eaf66da866b5553f609
Gerrit-Change-Number: 18616
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Mon, 27 Jun 2022 16:26:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10545: Higher data_cache_write_concurrency for SSDs

2022-06-27 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18616 )

Change subject: IMPALA-10545: Higher data_cache_write_concurrency for SSDs
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18616/4/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/18616/4/be/src/runtime/io/data-cache.cc@839
PS4, Line 839: }
 :
 : Status DataCache::Init() {
 :   // Verifies all the configured flags are sane.
 :   if (FLAGS_data_cache_file_max_size_bytes <
> (This is about removing it in the later patch set)
It was somewhat unintentional. I removed it while experimenting with mimicking 
disk-io-mgr settings, which treat non-positive values as "unset". When I later 
noticed it had been removed, I left it that way because it didn't seem like a 
problem. I'll add it back.



--
To view, visit http://gerrit.cloudera.org:8080/18616

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I60761faa2710f4795f1f3eaf66da866b5553f609
Gerrit-Change-Number: 18616
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Mon, 27 Jun 2022 15:53:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11399: Download shell dependencies from PyPI

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18668 )

Change subject: IMPALA-11399: Download shell dependencies from PyPI
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8274/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18668

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3b76e112ba9d0db19fae3e8eb15fd54a721f80fd
Gerrit-Change-Number: 18668
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Mon, 27 Jun 2022 16:10:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11389: Include Python 3 eggs in tarball

2022-06-27 Thread Michael Smith (Code Review)
Hello Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18653

to look at the new patch set (#9).

Change subject: IMPALA-11389: Include Python 3 eggs in tarball
..

IMPALA-11389: Include Python 3 eggs in tarball

Build Python 3 eggs for the shell tarball so it works with both Python 2
and Python 3. The impala-shell script selects eggs based on the
available Python version.

Inlines thrift for impala-shell so we can easily build Python 2 and
Python 3 versions, consistent with other libraries. The impala-shell
version should always be at least as new as IMPALA_THRIFT_PY_VERSION.

A specific Python version can be selected with IMPALA_PYTHON_EXECUTABLE;
otherwise it will use 'python', and if unavailable try 'python3'.
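
The selection order can be sketched as follows. This is a Python illustration
of the fallback order only; the actual impala-shell launcher may be implemented
differently, and the function name select_python and its injectable which
parameter are assumptions made so the logic can be exercised in isolation.

```python
import os
import shutil


def select_python(env=None, which=shutil.which):
    """Pick the interpreter name following the order described above."""
    env = os.environ if env is None else env
    # An explicit override always wins.
    override = env.get("IMPALA_PYTHON_EXECUTABLE")
    if override:
        return override
    # Otherwise prefer 'python' and fall back to 'python3'.
    for candidate in ("python", "python3"):
        if which(candidate):
            return candidate
    return None  # no interpreter found on PATH
```

Passing a fake which function, for instance one that only knows 'python3',
makes it easy to check the fallback without changing the real PATH.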

Adds tests for impala-shell tarball with Python 3.

Change-Id: I94f86de9e2a6303151c2f0e6454b5f629cbc9444
---
M bin/bootstrap_build.sh
M bin/bootstrap_system.sh
M bin/impala-config.sh
M bin/rat_exclude_files.txt
M shell/.gitignore
A shell/ext-py/thrift-0.14.2/CMakeLists.txt
A shell/ext-py/thrift-0.14.2/MANIFEST.in
A shell/ext-py/thrift-0.14.2/Makefile.am
A shell/ext-py/thrift-0.14.2/README.md
A shell/ext-py/thrift-0.14.2/coding_standards.md
A shell/ext-py/thrift-0.14.2/compat/win32/stdint.h
A shell/ext-py/thrift-0.14.2/setup.cfg
A shell/ext-py/thrift-0.14.2/setup.py
A shell/ext-py/thrift-0.14.2/src/TMultiplexedProcessor.py
A shell/ext-py/thrift-0.14.2/src/TRecursive.py
A shell/ext-py/thrift-0.14.2/src/TSCons.py
A shell/ext-py/thrift-0.14.2/src/TSerialization.py
A shell/ext-py/thrift-0.14.2/src/TTornado.py
A shell/ext-py/thrift-0.14.2/src/Thrift.py
A shell/ext-py/thrift-0.14.2/src/__init__.py
A shell/ext-py/thrift-0.14.2/src/compat.py
A shell/ext-py/thrift-0.14.2/src/ext/binary.cpp
A shell/ext-py/thrift-0.14.2/src/ext/binary.h
A shell/ext-py/thrift-0.14.2/src/ext/compact.cpp
A shell/ext-py/thrift-0.14.2/src/ext/compact.h
A shell/ext-py/thrift-0.14.2/src/ext/endian.h
A shell/ext-py/thrift-0.14.2/src/ext/module.cpp
A shell/ext-py/thrift-0.14.2/src/ext/protocol.h
A shell/ext-py/thrift-0.14.2/src/ext/protocol.tcc
A shell/ext-py/thrift-0.14.2/src/ext/types.cpp
A shell/ext-py/thrift-0.14.2/src/ext/types.h
A shell/ext-py/thrift-0.14.2/src/protocol/TBase.py
A shell/ext-py/thrift-0.14.2/src/protocol/TBinaryProtocol.py
A shell/ext-py/thrift-0.14.2/src/protocol/TCompactProtocol.py
A shell/ext-py/thrift-0.14.2/src/protocol/THeaderProtocol.py
A shell/ext-py/thrift-0.14.2/src/protocol/TJSONProtocol.py
A shell/ext-py/thrift-0.14.2/src/protocol/TMultiplexedProtocol.py
A shell/ext-py/thrift-0.14.2/src/protocol/TProtocol.py
A shell/ext-py/thrift-0.14.2/src/protocol/TProtocolDecorator.py
A shell/ext-py/thrift-0.14.2/src/protocol/__init__.py
A shell/ext-py/thrift-0.14.2/src/server/THttpServer.py
A shell/ext-py/thrift-0.14.2/src/server/TNonblockingServer.py
A shell/ext-py/thrift-0.14.2/src/server/TProcessPoolServer.py
A shell/ext-py/thrift-0.14.2/src/server/TServer.py
A shell/ext-py/thrift-0.14.2/src/server/__init__.py
A shell/ext-py/thrift-0.14.2/src/transport/THeaderTransport.py
A shell/ext-py/thrift-0.14.2/src/transport/THttpClient.py
A shell/ext-py/thrift-0.14.2/src/transport/TSSLSocket.py
A shell/ext-py/thrift-0.14.2/src/transport/TSocket.py
A shell/ext-py/thrift-0.14.2/src/transport/TTransport.py
A shell/ext-py/thrift-0.14.2/src/transport/TTwisted.py
A shell/ext-py/thrift-0.14.2/src/transport/TZlibTransport.py
A shell/ext-py/thrift-0.14.2/src/transport/__init__.py
A shell/ext-py/thrift-0.14.2/src/transport/sslcompat.py
A shell/ext-py/thrift-0.14.2/test/_import_local_thrift.py
A shell/ext-py/thrift-0.14.2/test/test_socket.py
A shell/ext-py/thrift-0.14.2/test/test_sslsocket.py
A shell/ext-py/thrift-0.14.2/test/test_thrift_file/TestServer.thrift
A shell/ext-py/thrift-0.14.2/test/thrift_TBinaryProtocol.py
A shell/ext-py/thrift-0.14.2/test/thrift_TCompactProtocol.py
A shell/ext-py/thrift-0.14.2/test/thrift_TNonblockingServer.py
A shell/ext-py/thrift-0.14.2/test/thrift_TZlibTransport.py
A shell/ext-py/thrift-0.14.2/test/thrift_json.py
A shell/ext-py/thrift-0.14.2/test/thrift_transport.py
M shell/impala-shell
M shell/make_shell_tarball.sh
M shell/packaging/make_python_package.sh
M tests/shell/test_shell_commandline.py
M tests/shell/test_shell_interactive.py
M tests/shell/util.py
70 files changed, 10,643 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/18653/9
--
To view, visit http://gerrit.cloudera.org:8080/18653
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I94f86de9e2a6303151c2f0e6454b5f629cbc9444
Gerrit-Change-Number: 18653
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-11399: Download shell dependencies from PyPI

2022-06-27 Thread Michael Smith (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18668

to look at the new patch set (#6).

Change subject: IMPALA-11399: Download shell dependencies from PyPI
..

IMPALA-11399: Download shell dependencies from PyPI

Downloads shell dependencies from PyPI rather than including them in the
repository. Simplifies dependency updates so we only need to update
requirements.txt.

Removes old dependencies from requirements.txt. configparser is included
in all versions of Python we support, and setuptools is generally
installed as a system package. PyPI configparser does include Python 3
extensions over Python 2's ConfigParser, but since we didn't include
configparser in the tarball I conclude we don't use those extensions.

Change-Id: I3b76e112ba9d0db19fae3e8eb15fd54a721f80fd
---
M bin/impala-config.sh
M infra/python/deps/download_requirements
M infra/python/deps/find_py26.py
M infra/python/deps/pip_download.py
M shell/.gitignore
D shell/ext-py/bitarray-2.3.0/CHANGE_LOG
D shell/ext-py/bitarray-2.3.0/LICENSE
D shell/ext-py/bitarray-2.3.0/README.rst
D shell/ext-py/bitarray-2.3.0/bitarray/__init__.py
D shell/ext-py/bitarray-2.3.0/bitarray/__init__.pyi
D shell/ext-py/bitarray-2.3.0/bitarray/_bitarray.c
D shell/ext-py/bitarray-2.3.0/bitarray/_util.c
D shell/ext-py/bitarray-2.3.0/bitarray/architecture.txt
D shell/ext-py/bitarray-2.3.0/bitarray/bitarray.h
D shell/ext-py/bitarray-2.3.0/bitarray/copy_n.txt
D shell/ext-py/bitarray-2.3.0/bitarray/py.typed
D shell/ext-py/bitarray-2.3.0/bitarray/pythoncapi_compat.h
D shell/ext-py/bitarray-2.3.0/bitarray/test_bitarray.py
D shell/ext-py/bitarray-2.3.0/bitarray/test_data.pickle
D shell/ext-py/bitarray-2.3.0/bitarray/test_util.py
D shell/ext-py/bitarray-2.3.0/bitarray/util.py
D shell/ext-py/bitarray-2.3.0/bitarray/util.pyi
D shell/ext-py/bitarray-2.3.0/contributing.md
D shell/ext-py/bitarray-2.3.0/setup.py
D shell/ext-py/bitarray-2.3.0/update_doc.py
D shell/ext-py/kerberos-1.3.1/MANIFEST.in
D shell/ext-py/kerberos-1.3.1/PKG-INFO
D shell/ext-py/kerberos-1.3.1/README.md
D shell/ext-py/kerberos-1.3.1/pysrc/kerberos.py
D shell/ext-py/kerberos-1.3.1/setup.cfg
D shell/ext-py/kerberos-1.3.1/setup.py
D shell/ext-py/kerberos-1.3.1/src/base64.c
D shell/ext-py/kerberos-1.3.1/src/base64.h
D shell/ext-py/kerberos-1.3.1/src/kerberos.c
D shell/ext-py/kerberos-1.3.1/src/kerberosbasic.c
D shell/ext-py/kerberos-1.3.1/src/kerberosbasic.h
D shell/ext-py/kerberos-1.3.1/src/kerberosgss.c
D shell/ext-py/kerberos-1.3.1/src/kerberosgss.h
D shell/ext-py/kerberos-1.3.1/src/kerberospw.c
D shell/ext-py/kerberos-1.3.1/src/kerberospw.h
D shell/ext-py/prettytable-0.7.2/CHANGELOG
D shell/ext-py/prettytable-0.7.2/COPYING
D shell/ext-py/prettytable-0.7.2/MANIFEST.in
D shell/ext-py/prettytable-0.7.2/PKG-INFO
D shell/ext-py/prettytable-0.7.2/README
D shell/ext-py/prettytable-0.7.2/prettytable.py
D shell/ext-py/prettytable-0.7.2/setup.cfg
D shell/ext-py/prettytable-0.7.2/setup.py
D shell/ext-py/sasl-0.2.1/LICENSE.txt
D shell/ext-py/sasl-0.2.1/MANIFEST.in
D shell/ext-py/sasl-0.2.1/recython.sh
D shell/ext-py/sasl-0.2.1/sasl/__init__.py
D shell/ext-py/sasl-0.2.1/sasl/saslwrapper.cpp
D shell/ext-py/sasl-0.2.1/sasl/saslwrapper.h
D shell/ext-py/sasl-0.2.1/sasl/saslwrapper.pyx
D shell/ext-py/sasl-0.2.1/setup.py
D shell/ext-py/six-1.14.0/CHANGES
D shell/ext-py/six-1.14.0/CONTRIBUTORS
D shell/ext-py/six-1.14.0/LICENSE
D shell/ext-py/six-1.14.0/MANIFEST.in
D shell/ext-py/six-1.14.0/README.rst
D shell/ext-py/six-1.14.0/setup.cfg
D shell/ext-py/six-1.14.0/setup.py
D shell/ext-py/six-1.14.0/six.py
D shell/ext-py/six-1.14.0/test_six.py
D shell/ext-py/six-1.14.0/tox.ini
D shell/ext-py/sqlparse-0.3.1/AUTHORS
D shell/ext-py/sqlparse-0.3.1/CHANGELOG
D shell/ext-py/sqlparse-0.3.1/LICENSE
D shell/ext-py/sqlparse-0.3.1/MANIFEST.in
D shell/ext-py/sqlparse-0.3.1/README.rst
D shell/ext-py/sqlparse-0.3.1/TODO
D shell/ext-py/sqlparse-0.3.1/setup.cfg
D shell/ext-py/sqlparse-0.3.1/setup.py
D shell/ext-py/sqlparse-0.3.1/sqlparse/__init__.py
D shell/ext-py/sqlparse-0.3.1/sqlparse/__main__.py
D shell/ext-py/sqlparse-0.3.1/sqlparse/cli.py
D shell/ext-py/sqlparse-0.3.1/sqlparse/compat.py
D shell/ext-py/sqlparse-0.3.1/sqlparse/engine/__init__.py
D shell/ext-py/sqlparse-0.3.1/sqlparse/engine/filter_stack.py
D shell/ext-py/sqlparse-0.3.1/sqlparse/engine/grouping.py
D shell/ext-py/sqlparse-0.3.1/sqlparse/engine/statement_splitter.py
D shell/ext-py/sqlparse-0.3.1/sqlparse/exceptions.py
D shell/ext-py/sqlparse-0.3.1/sqlparse/filters/__init__.py
D shell/ext-py/sqlparse-0.3.1/sqlparse/filters/aligned_indent.py
D shell/ext-py/sqlparse-0.3.1/sqlparse/filters/others.py
D shell/ext-py/sqlparse-0.3.1/sqlparse/filters/output.py
D shell/ext-py/sqlparse-0.3.1/sqlparse/filters/reindent.py
D shell/ext-py/sqlparse-0.3.1/sqlparse/filters/right_margin.py
D shell/ext-py/sqlparse-0.3.1/sqlparse/filters/tokens.py
D 

[Impala-ASF-CR] IMPALA-10545: Higher data cache write concurrency for SSDs

2022-06-27 Thread Michael Smith (Code Review)
Hello Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18616

to look at the new patch set (#6).

Change subject: IMPALA-10545: Higher data_cache_write_concurrency for SSDs
..

IMPALA-10545: Higher data_cache_write_concurrency for SSDs

Provide device-specific defaults for `data_cache_write_concurrency`
based on device type. Rotational disks continue to use a default of 1,
while non-rotational disks use a default of 8. The option's default value
of 0 selects this device-specific mode.
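
As a rough sketch of the rule described above (written in Python for brevity, whereas the change itself is in the C++ backend): a value of 0 means "choose by device type", and any other value is taken as given. The function and constant names are hypothetical, and treating every non-zero value as an explicit override is an assumption of this sketch.

```python
ROTATIONAL_DEFAULT = 1       # spinning disks keep the previous default
NON_ROTATIONAL_DEFAULT = 8   # SSD/NVMe devices get higher concurrency


def effective_write_concurrency(flag_value, is_rotational):
    """Resolve the write concurrency for one cache partition.

    flag_value == 0 selects the device-specific default; any other
    value is assumed to be an explicit user setting and is returned as is.
    """
    if flag_value != 0:
        return flag_value
    return ROTATIONAL_DEFAULT if is_rotational else NON_ROTATIONAL_DEFAULT
```

For example, `effective_write_concurrency(0, is_rotational=False)` yields 8, which matches the log line quoted in the commit message for the non-rotational disk.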

Added unit test confirming concurrency based on mocked partitions and
block device info. Replaced FRIEND_TEST macros for a test that no longer
exists.

Started cluster with
start-impala-cluster.py --data_cache_dir=/home/michael/cache
  --data_cache_size=1G --impalad_args=--always_use_data_cache=true

and observed
> Default data_cache_write_concurrency=8 for non-rotational disk nvme0n1

Change-Id: I60761faa2710f4795f1f3eaf66da866b5553f609
---
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/util/disk-info.cc
M be/src/util/disk-info.h
5 files changed, 149 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/18616/6
--
To view, visit http://gerrit.cloudera.org:8080/18616
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I60761faa2710f4795f1f3eaf66da866b5553f609
Gerrit-Change-Number: 18616
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-11398: Update flake8 for indent-size=2

2022-06-27 Thread Michael Smith (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18669

to look at the new patch set (#4).

Change subject: IMPALA-11398: Update flake8 for indent-size=2
..

IMPALA-11398: Update flake8 for indent-size=2

Updates flake8 to the latest Python 2-compatible version so we can use
indent-size=2. Our code uses 2-space indents and we have previously
worked around or disabled flake8 checks that rely on 4-space indenting.

Change-Id: Ia701f6e3d86be451ae86d041b799c8a10aee2d93
---
M infra/python/deps/requirements.txt
M setup.cfg
2 files changed, 19 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/18669/4
--
To view, visit http://gerrit.cloudera.org:8080/18669
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia701f6e3d86be451ae86d041b799c8a10aee2d93
Gerrit-Change-Number: 18669
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..


Patch Set 16: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 16
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Xianqing He 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 27 Jun 2022 15:29:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..

IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

This commit optimizes plain count(*) queries on Iceberg tables.
When `org.apache.iceberg.SnapshotSummary#TOTAL_RECORDS_PROP` can be
retrieved from the current `org.apache.iceberg.BaseSnapshot#summary` of
the Iceberg table, this kind of query can be very fast. If this property
cannot be retrieved, the query aggregates the `num_rows` of parquet
`file_metadata_` as usual.

Queries that can be optimized need to meet the following requirements:
 - SelectStmt does not have WHERE clause
 - SelectStmt does not have GROUP BY clause
 - SelectStmt does not have HAVING clause
 - The TableRefs of FROM clause contains only one BaseTableRef
 - Only for the Iceberg table
 - SelectList must contain 'count(*)' or 'count(constant)'
 - SelectList can contain other agg functions, e.g. min, sum, etc
 - SelectList can contain constant
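
The requirements listed above amount to a simple eligibility predicate. The following Python sketch is only a model of that checklist, not the Java rewrite rule from the change: the `SelectInfo` type, its fields, and the item kinds are all hypothetical, and rejecting select items outside the listed kinds is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SelectInfo:
    """Hypothetical summary of an analyzed SELECT statement."""
    has_where: bool
    has_group_by: bool
    has_having: bool
    table_ref_count: int
    is_iceberg_base_table: bool
    # Each item is one of: "count_star", "count_const", "other_agg", "const".
    select_items: tuple


COUNT_KINDS = {"count_star", "count_const"}
ALLOWED_KINDS = COUNT_KINDS | {"other_agg", "const"}


def can_use_snapshot_count(stmt):
    """Return True if the statement meets every listed requirement."""
    if stmt.has_where or stmt.has_group_by or stmt.has_having:
        return False
    if stmt.table_ref_count != 1 or not stmt.is_iceberg_base_table:
        return False
    kinds = set(stmt.select_items)
    # Must contain count(*) or count(constant); other aggregates and
    # constants may appear alongside it.
    return bool(kinds & COUNT_KINDS) and kinds <= ALLOWED_KINDS
```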

Testing:
 - Added end-to-end test
 - Existing tests
 - Test it in a real cluster

Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Reviewed-on: http://gerrit.cloudera.org:8080/18574
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
A fe/src/main/java/org/apache/impala/rewrite/CountStarToConstRule.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-compound-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-is-null-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-upper-lower-bound-metrics.test
M tests/query_test/test_iceberg.py
12 files changed, 496 insertions(+), 15 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/18574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Xianqing He 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-11320: SHOW PARTITIONS on Iceberg table doesn't list the partitions

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18641 )

Change subject: IMPALA-11320: SHOW PARTITIONS on Iceberg table doesn't list the 
partitions
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10883/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18641
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3b4399ae924dadb89875735b12a2f92453b6754c
Gerrit-Change-Number: 18641
Gerrit-PatchSet: 8
Gerrit-Owner: Xiaoqing Gao 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Xiaoqing Gao 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 27 Jun 2022 12:02:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11320: SHOW PARTITIONS on Iceberg table doesn't list the partitions

2022-06-27 Thread Xiaoqing Gao (Code Review)
Hello Tamas Mate, Zoltan Borok-Nagy, lipeng...@sensorsdata.cn, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18641

to look at the new patch set (#8).

Change subject: IMPALA-11320: SHOW PARTITIONS on Iceberg table doesn't list the 
partitions
..

IMPALA-11320: SHOW PARTITIONS on Iceberg table doesn't list the partitions

Currently, SHOW PARTITIONS on Iceberg tables only outputs the partition
spec, which is not very useful.

Instead, it should output the concrete partitions along with the number of
files and the number of rows in each partition. E.g.:

SHOW PARTITIONS ice_ctas_hadoop_tables_part;

'{"d_month":"613"}',4,2
'{"d_month":"614"}',3,1
'{"d_month":"615"}',2,1
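
One way to picture the per-partition summary is as a grouping over the table's data files. The Python sketch below is only an illustration under stated assumptions: the input shape (a list of `(partition_dict, record_count)` pairs) and the function name are hypothetical, and it returns named fields instead of reproducing the column order of the real output.

```python
import json


def summarize_partitions(data_files):
    """Group data files by partition and count files and rows.

    data_files: iterable of (partition_dict, record_count) pairs,
    a hypothetical stand-in for Iceberg data file metadata.
    """
    summary = {}
    for partition, record_count in data_files:
        # Compact JSON gives keys shaped like '{"d_month":"613"}'.
        key = json.dumps(partition, sort_keys=True, separators=(",", ":"))
        entry = summary.setdefault(key, {"files": 0, "rows": 0})
        entry["files"] += 1
        entry["rows"] += record_count
    # Sort by partition key so the listing is deterministic.
    return dict(sorted(summary.items()))
```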

Testing:
 - Added end-to-end test

Change-Id: I3b4399ae924dadb89875735b12a2f92453b6754c
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-ctas.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
A tests/custom_cluster/test_iceberg.py
11 files changed, 313 insertions(+), 101 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/18641/8
--
To view, visit http://gerrit.cloudera.org:8080/18641
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3b4399ae924dadb89875735b12a2f92453b6754c
Gerrit-Change-Number: 18641
Gerrit-PatchSet: 8
Gerrit-Owner: Xiaoqing Gao 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Xiaoqing Gao 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..


Patch Set 16: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 16
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Xianqing He 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 27 Jun 2022 10:55:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

2022-06-27 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..


Patch Set 15: Code-Review+2

Great work! Thanks, LiPenglin!


--
To view, visit http://gerrit.cloudera.org:8080/18574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Xianqing He 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 27 Jun 2022 10:55:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..


Patch Set 16:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8273/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/18574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 16
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Xianqing He 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 27 Jun 2022 10:55:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10927: Deflaky TestFetchAndSpooling.test rows sent counters

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18671 )

Change subject: IMPALA-10927: Deflaky 
TestFetchAndSpooling.test_rows_sent_counters
..

IMPALA-10927: Deflaky TestFetchAndSpooling.test_rows_sent_counters

IMPALA-8957 fixed the flakiness of the test by adding a delay via
DEBUG_ACTION BPRS_BEFORE_ADD_ROWS in BlockingPlanRootSink::Send().
test_rows_sent_counters uses DEBUG_ACTION BPRS_BEFORE_ADD_BATCH when
spool_query_results is on, and uses BPRS_BEFORE_ADD_ROWS when
spool_query_results is off, on the assumption that result spooling is
disabled by default.

IMPALA-9856 enabled result spooling by default.
This introduced the following two issues for the test:
1) spool_query_results=false is not covered by the test, since the
extended dimension is added with spool_query_results set to true.
2) Since the test uses BPRS_BEFORE_ADD_ROWS unless spool_query_results is
specified as true, DEBUG_ACTION BPRS_BEFORE_ADD_ROWS ends up being
used when spool_query_results is true. This makes the test flaky, since
no delay is added in BufferedPlanRootSink::Send().

There is another bug in the test: it uses bool() to convert a string to a
boolean value, but the function returns true for any non-empty string.
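
The bool() pitfall mentioned above is easy to reproduce. The snippet below is a minimal sketch; `parse_bool` is a hypothetical helper shown for illustration and is not the code used in the patch.

```python
# bool() tests truthiness, so every non-empty string is True,
# including the string "false".
assert bool("false") is True
assert bool("") is False


def parse_bool(text):
    """Interpret common textual spellings of a boolean value."""
    normalized = text.strip().lower()
    if normalized in ("true", "1"):
        return True
    if normalized in ("false", "0", ""):
        return False
    raise ValueError("not a boolean string: %r" % (text,))
```

Raising on unrecognized input, rather than silently defaulting, is a design choice of this sketch so that a misspelled option value does not quietly select the wrong test path.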

This patch changes the extended dimension to set spool_query_results to
false, and makes the test use the right DEBUG_ACTION for both
spool_query_results=true and spool_query_results=false.
It also reverts the previous fix, which disabled the test in the S3
testing environment.

Testing:
  - Ran the test more than 1 times without failure on Jenkins.

Change-Id: I790bbe1072357caf8ee11bb37644cf29dc8bea0f
Reviewed-on: http://gerrit.cloudera.org:8080/18671
Reviewed-by: Quanlong Huang 
Tested-by: Impala Public Jenkins 
---
M tests/query_test/test_fetch.py
1 file changed, 5 insertions(+), 6 deletions(-)

Approvals:
  Quanlong Huang: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/18671
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I790bbe1072357caf8ee11bb37644cf29dc8bea0f
Gerrit-Change-Number: 18671
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10927: Deflaky TestFetchAndSpooling.test rows sent counters

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18671 )

Change subject: IMPALA-10927: Deflaky 
TestFetchAndSpooling.test_rows_sent_counters
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18671
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I790bbe1072357caf8ee11bb37644cf29dc8bea0f
Gerrit-Change-Number: 18671
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 27 Jun 2022 10:54:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

2022-06-27 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
..


Patch Set 15: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/18574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Xianqing He 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 27 Jun 2022 10:17:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated 
Iceberg tables
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10882/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 6
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 27 Jun 2022 08:54:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

2022-06-27 Thread Gergely Fürnstáhl (Code Review)
Gergely Fürnstáhl has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/18639 )


Change subject: IMPALA-11034: Resolve schema of old data files in migrated 
Iceberg tables
..

IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

When external tables are converted to Iceberg, the data files remain
intact, thus missing field IDs. Previously, Impala used name based
column resolution in this case.

Added a feature that traverses the data files before column
resolution and assigns field IDs the same way Iceberg would, so that
field ID based column resolution can be used.

Testing:

TBD

Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
---
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/exec/parquet/parquet-metadata-utils.h
4 files changed, 117 insertions(+), 21 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/18639/6
--
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 6
Gerrit-Owner: Gergely Fürnstáhl 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-11320: SHOW PARTITIONS on Iceberg table doesn't list the partitions

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18641 )

Change subject: IMPALA-11320: SHOW PARTITIONS on Iceberg table doesn't list the 
partitions
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10881/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18641
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3b4399ae924dadb89875735b12a2f92453b6754c
Gerrit-Change-Number: 18641
Gerrit-PatchSet: 7
Gerrit-Owner: Xiaoqing Gao 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Xiaoqing Gao 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 27 Jun 2022 06:30:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10927: Deflaky TestFetchAndSpooling.test rows sent counters

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18671 )

Change subject: IMPALA-10927: Deflaky 
TestFetchAndSpooling.test_rows_sent_counters
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8272/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/18671
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I790bbe1072357caf8ee11bb37644cf29dc8bea0f
Gerrit-Change-Number: 18671
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 27 Jun 2022 06:12:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11320: SHOW PARTITIONS on Iceberg table doesn't list the partitions

2022-06-27 Thread Xiaoqing Gao (Code Review)
Xiaoqing Gao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18641 )

Change subject: IMPALA-11320: SHOW PARTITIONS on Iceberg table doesn't list the 
partitions
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18641/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18641/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@575
PS6, Line 575: Returns
> nit: Return
Done


http://gerrit.cloudera.org:8080/#/c/18641/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@576
PS6, Line 576: resulset
> nit: result set
Done


http://gerrit.cloudera.org:8080/#/c/18641/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@585
PS6, Line 585: new TreeMap<>();
> Already ensured that the return value of this method is ordered by the 'par
Solved. Forgot to change back.



--
To view, visit http://gerrit.cloudera.org:8080/18641
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3b4399ae924dadb89875735b12a2f92453b6754c
Gerrit-Change-Number: 18641
Gerrit-PatchSet: 6
Gerrit-Owner: Xiaoqing Gao 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Xiaoqing Gao 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 27 Jun 2022 06:11:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10927: Deflaky TestFetchAndSpooling.test_rows_sent_counters

2022-06-27 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18671 )

Change subject: IMPALA-10927: Deflaky 
TestFetchAndSpooling.test_rows_sent_counters
..


Patch Set 2:

hit IMPALA-11160


--
To view, visit http://gerrit.cloudera.org:8080/18671
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I790bbe1072357caf8ee11bb37644cf29dc8bea0f
Gerrit-Change-Number: 18671
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 27 Jun 2022 06:11:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11320: SHOW PARTITIONS on Iceberg table doesn't list the partitions

2022-06-27 Thread Xiaoqing Gao (Code Review)
Hello Tamas Mate, Zoltan Borok-Nagy, lipeng...@sensorsdata.cn, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/18641

to look at the new patch set (#7).

Change subject: IMPALA-11320: SHOW PARTITIONS on Iceberg table doesn't list the 
partitions
..

IMPALA-11320: SHOW PARTITIONS on Iceberg table doesn't list the partitions

Currently, SHOW PARTITIONS on Iceberg tables only outputs the partition
spec, which is not very useful.

Instead, it should output the concrete partitions along with the number of
files and the number of rows in each partition. E.g.:

SHOW PARTITIONS ice_ctas_hadoop_tables_part;

'{"d_month":"613"}',4,2
'{"d_month":"614"}',3,1
'{"d_month":"615"}',2,1
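
For illustration only, the per-partition summary described above can be
sketched as a simple aggregation over data-file metadata. This is not the
code in this change (which lives in the Java frontend); the input tuples,
the function name, and the (files, rows) column order are assumptions made
for the sketch, with the column order taken from the prose above since the
example output does not label its columns.

```python
import json
from collections import defaultdict

# Hypothetical per-file metadata: (partition values, row count in that file).
data_files = [
    ({"d_month": "613"}, 3),
    ({"d_month": "613"}, 1),
    ({"d_month": "614"}, 3),
]


def summarize_partitions(files):
    """Return (partition JSON, file count, row count) tuples sorted by partition."""
    totals = defaultdict(lambda: [0, 0])  # partition key -> [files, rows]
    for partition, row_count in files:
        # Compact, key-sorted JSON gives a stable string key per partition.
        key = json.dumps(partition, sort_keys=True, separators=(",", ":"))
        totals[key][0] += 1
        totals[key][1] += row_count
    return [(key, n_files, n_rows)
            for key, (n_files, n_rows) in sorted(totals.items())]


for row in summarize_partitions(data_files):
    print(row)
```

Sorting by the serialized partition key keeps the listing deterministic,
which makes the output easy to compare in an end-to-end test.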

Testing:
 - Added end-to-end test

Change-Id: I3b4399ae924dadb89875735b12a2f92453b6754c
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-ctas.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
A tests/custom_cluster/test_iceberg.py
11 files changed, 315 insertions(+), 101 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/18641/7
--
To view, visit http://gerrit.cloudera.org:8080/18641
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3b4399ae924dadb89875735b12a2f92453b6754c
Gerrit-Change-Number: 18641
Gerrit-PatchSet: 7
Gerrit-Owner: Xiaoqing Gao 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Xiaoqing Gao 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10927: Deflaky TestFetchAndSpooling.test_rows_sent_counters

2022-06-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18671 )

Change subject: IMPALA-10927: Deflaky 
TestFetchAndSpooling.test_rows_sent_counters
..


Patch Set 2: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8271/


--
To view, visit http://gerrit.cloudera.org:8080/18671
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I790bbe1072357caf8ee11bb37644cf29dc8bea0f
Gerrit-Change-Number: 18671
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 27 Jun 2022 06:09:37 +
Gerrit-HasComments: No