[Impala-ASF-CR] IMPALA-10503: testdata load hits hive memory limit errors during hive inserts

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17061 )

Change subject: IMPALA-10503: testdata load hits hive memory limit errors 
during hive inserts
..


Patch Set 10: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6941/


--
To view, visit http://gerrit.cloudera.org:8080/17061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac5f054e814070b983f7f57aef4ea9d54252bb2
Gerrit-Change-Number: 17061
Gerrit-PatchSet: 10
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Tue, 09 Mar 2021 04:42:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10549: Register transactions from external frontend DML

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17122 )

Change subject: IMPALA-10549: Register transactions from external frontend DML
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8313/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17122
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8863b8d9d281a5d164f10de9c5ee52cf3be63db
Gerrit-Change-Number: 17122
Gerrit-PatchSet: 12
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 09 Mar 2021 01:19:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

2021-03-08 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
..


Patch Set 5:

(4 comments)

Here's my first pass. I'm also going to look at the ABFS change and see if 
anything is different.

http://gerrit.cloudera.org:8080/#/c/17121/5/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/17121/5/be/src/runtime/io/disk-io-mgr.cc@186
PS5, Line 186: true
We'll need to double check that GCS file handles work with the file handle 
cache. It would be ok to default to false until we validate it. In previous 
cases like ABFS, we started out with this disabled.

We would need to verify that unbuffer() is implemented and everything behaves 
well when there are thousands of these sitting around in the cache. It might 
just work.


http://gerrit.cloudera.org:8080/#/c/17121/5/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/17121/5/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@863
PS5, Line 863:   if (fs instanceof GoogleHadoopFileSystem) {
 : curIter_.hasNext();
 :   }
I have seen an issue like this before on older versions of the S3 connector as 
well for recursively_list_partitions=false.

If recursively_list_partitions=false, we won't go through this codepath. We 
would have a FilterIterator that uses a fs.listStatusIterator(p) directly. See 
listStatus() in this file.

It might make sense for us to put this in the constructor for FilterIterator 
and do it for all filesystems.


http://gerrit.cloudera.org:8080/#/c/17121/5/testdata/bin/load-test-warehouse-snapshot.sh
File testdata/bin/load-test-warehouse-snapshot.sh:

http://gerrit.cloudera.org:8080/#/c/17121/5/testdata/bin/load-test-warehouse-snapshot.sh@80
PS5, Line 80:   hadoop fs -rm -r -skipTrash 
${FILESYSTEM_PREFIX}${TEST_WAREHOUSE_DIR}
I'm assuming that this command to remove any existing warehouse works for GCS, 
because it looks like this is what we would use for ABFS/ADLS.


http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py
File tests/stress/test_insert_stress.py:

http://gerrit.cloudera.org:8080/#/c/17121/5/tests/stress/test_insert_stress.py@81
PS5, Line 81:   @SkipIfGCS.jira(reason="IMPALA-10563")
Does IMPALA-10563 consistently reproduce? Do we have any idea if it is specific 
to the minicluster environment or if it could happen on a real cluster?



--
To view, visit http://gerrit.cloudera.org:8080/17121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 09 Mar 2021 01:04:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10549: Register transactions from external frontend DML

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17122 )

Change subject: IMPALA-10549: Register transactions from external frontend DML
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17122/12/fe/src/main/java/org/apache/impala/service/JniFrontend.java
File fe/src/main/java/org/apache/impala/service/JniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/17122/12/fe/src/main/java/org/apache/impala/service/JniFrontend.java@695
PS12, Line 695:   public void addTransaction(byte[] thriftQueryContext) throws 
TransactionException, ImpalaException {
line too long (102 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17122
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8863b8d9d281a5d164f10de9c5ee52cf3be63db
Gerrit-Change-Number: 17122
Gerrit-PatchSet: 12
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 09 Mar 2021 01:01:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10549: Register transactions from external frontend DML

2021-03-08 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17122 )

Change subject: IMPALA-10549: Register transactions from external frontend DML
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17122/11/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17122/11/fe/src/main/java/org/apache/impala/service/Frontend.java@2028
PS11, Line 2028: /**
   :* Opens a new transaction and registers it to the keepalive 
object.
   :* @param queryCtx context of the query that requires the 
transaction.
   :* @return the transaction id.
   :* @throws TransactionException
   :*/
> This comment is really for openTransaction and should be moved, addTransact
Done



--
To view, visit http://gerrit.cloudera.org:8080/17122
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8863b8d9d281a5d164f10de9c5ee52cf3be63db
Gerrit-Change-Number: 17122
Gerrit-PatchSet: 11
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 09 Mar 2021 01:00:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10549: Register transactions from external frontend DML

2021-03-08 Thread Kurt Deschler (Code Review)
Hello Thomas Tauber-Marshall, Joe McDonnell, John Sherman, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17122

to look at the new patch set (#12).

Change subject: IMPALA-10549: Register transactions from external frontend DML
..

IMPALA-10549: Register transactions from external frontend DML

This change registers transactions that were started by an external
frontend so that coordinator keepalive can track them properly.

Testing: manually tested using DMLs from external frontend

Reviewed-by: John Sherman 
Change-Id: Ia8863b8d9d281a5d164f10de9c5ee52cf3be63db
---
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/impala-server.cc
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
5 files changed, 43 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/17122/12
--
To view, visit http://gerrit.cloudera.org:8080/17122
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8863b8d9d281a5d164f10de9c5ee52cf3be63db
Gerrit-Change-Number: 17122
Gerrit-PatchSet: 12
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-8680: Docker-based tests fail to archive the minicluster component logs

2021-03-08 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15898 )

Change subject: IMPALA-8680: Docker-based tests fail to archive the minicluster 
component logs
..


Patch Set 5:

I'm running the docker based tests with this to look at the output, then I'll 
review.


--
To view, visit http://gerrit.cloudera.org:8080/15898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I23e25d42992cec47c593dc388bcf0bcef828c05e
Gerrit-Change-Number: 15898
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Garaguly 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Zoltan Garaguly 
Gerrit-Comment-Date: Tue, 09 Mar 2021 00:20:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10535: Add interface to ImpalaServer for execution of externally compiled statements

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17104 )

Change subject: IMPALA-10535: Add interface to ImpalaServer for execution of 
externally compiled statements
..


Patch Set 13: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6940/


--
To view, visit http://gerrit.cloudera.org:8080/17104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iace716dd67290f08441857dc02d2428b0e335eaa
Gerrit-Change-Number: 17104
Gerrit-PatchSet: 13
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 23:30:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10503: testdata load hits hive memory limit errors during hive inserts

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17061 )

Change subject: IMPALA-10503: testdata load hits hive memory limit errors 
during hive inserts
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6941/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac5f054e814070b983f7f57aef4ea9d54252bb2
Gerrit-Change-Number: 17061
Gerrit-PatchSet: 10
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Mon, 08 Mar 2021 23:08:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10503: testdata load hits hive memory limit errors during hive inserts

2021-03-08 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17061 )

Change subject: IMPALA-10503: testdata load hits hive memory limit errors 
during hive inserts
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idac5f054e814070b983f7f57aef4ea9d54252bb2
Gerrit-Change-Number: 17061
Gerrit-PatchSet: 10
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Mon, 08 Mar 2021 23:07:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10549: Register transactions from external frontend DML

2021-03-08 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17122 )

Change subject: IMPALA-10549: Register transactions from external frontend DML
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17122/11/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17122/11/fe/src/main/java/org/apache/impala/service/Frontend.java@2028
PS11, Line 2028: /**
   :* Opens a new transaction and registers it to the keepalive 
object.
   :* @param queryCtx context of the query that requires the 
transaction.
   :* @return the transaction id.
   :* @throws TransactionException
   :*/
This comment is really for openTransaction and should be moved, addTransaction 
needs its own comment



--
To view, visit http://gerrit.cloudera.org:8080/17122
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8863b8d9d281a5d164f10de9c5ee52cf3be63db
Gerrit-Change-Number: 17122
Gerrit-PatchSet: 11
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 23:02:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10555: Fix Hit DCHECK in TmpFileGroup::RecoverWriteError

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17140 )

Change subject: IMPALA-10555: Fix Hit DCHECK in TmpFileGroup::RecoverWriteError
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8312/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17140
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifd9aea4bf2fff634ea9a30bf6e87987be4e1c611
Gerrit-Change-Number: 17140
Gerrit-PatchSet: 4
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 08 Mar 2021 22:28:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
..


Patch Set 4:

(3 comments)

Thanks Joe for the feedback

http://gerrit.cloudera.org:8080/#/c/17145/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17145/4//COMMIT_MSG@12
PS4, Line 12:   - When this is set, the external frontend is responsible for
:   for moving and managing the results
> I assume the external frontend is also handling any metadata operations as
This is correct. I'll update the commit message to make that clearer.


http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc@127
PS4, Line 127:   if (HasExternalStagingDir()) {
 : // When an external FE has provided a staging path, we use 
it directly.
 : staging_dir_ = external_staging_dir_;
 :   } else {
 : staging_dir_ = Substitute("$0/_impala_insert_staging/$1",
 : table_desc_->hdfs_base_dir(), PrintId(state->query_id(), 
"_"));
 :   }
> The external staging directory is a staging directory from the point of vie
I can certainly change everything to be external output rather than external 
staging.

I'll verify why I'm modifying staging_dir_ here - reading the code I assumed I 
initially did this so BuildHdfsFileNames would use it correctly to build 
tmp_hdfs_* variables but indeed I don't think these paths would utilize them.


http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc@279
PS4, Line 279:   // The 0 padding on base and delta is to match the 
behavior of Hive since various
 :   // systems will expect a certain format for dynamic 
partition creation. Additionally
 :   // include an 0 statement id for delta directory so 
various Hive AcidUtils detect
 :   // the directory (such as AcidUtils.baseOrDeltaSubdir()). 
Multiple statements in a
 :   // single transaction is not supported.
 :   if (overwrite_) {
 : output_partition->final_hdfs_file_name_prefix += 
StringPrintf("/base_%07ld/",
 : write_id_);
 :   } else {
 : output_partition->final_hdfs_file_name_prefix += 
StringPrintf(
 : "/delta_%07ld_%07ld_/", write_id_, write_id_);
 :   }
> Should this be triggered by its own variable independently from the externa
It is a bit orthogonal. I think ideally Impala/Hive and external frontend 
implementations would standardize on something consistent. I took the paranoid 
approach of only applying this to external frontends as I do not want to cause 
any inadvertent regressions. I'm not sure it is currently worth it to plumb 
through a flag indicating which format is expected. If another implementation 
of a frontend shows up,I think it might be worth it. Feel free to push back if 
you feel strongly.



--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 4
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 22:15:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10555: Fix Hit DCHECK in TmpFileGroup::RecoverWriteError

2021-03-08 Thread Yida Wu (Code Review)
Yida Wu has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17140 )

Change subject: IMPALA-10555: Fix Hit DCHECK in TmpFileGroup::RecoverWriteError
..

IMPALA-10555: Fix Hit DCHECK in TmpFileGroup::RecoverWriteError

The DCHECK error happens when there is an IO error during the spilling
process if the scratch directory is in a remote filesystem and doing
an error recovery(rewrite). Because currently the DCHECK only consider
the file number of local scratch files, it leads to a file number
requirement mismatch in the DCHECK.
Because the implementation of spilling to the local fs and the remote fs
are quite different, for simplify, we don't recover write error
for spilling to a remote fs in the current version. Instead, the errors
generated during spilling to remote would be returned directly to the
upper layer. So, we avoid the DCHECK logic for spilling to remote.

Tests:
* Added a unit test: TmpFileMgrTest::TestRemoteRemoveBuffer.
* Ran Unit Tests:
$IMPALA_HOME/be/build/latest/runtime/tmp-file-mgr-test

Change-Id: Ifd9aea4bf2fff634ea9a30bf6e87987be4e1c611
---
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
2 files changed, 38 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/17140/4
--
To view, visit http://gerrit.cloudera.org:8080/17140
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd9aea4bf2fff634ea9a30bf6e87987be4e1c611
Gerrit-Change-Number: 17140
Gerrit-PatchSet: 4
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 3
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 21:21:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17075 )

Change subject: IMPALA-10494: Making use of the min/max column stats to improve 
min/max filters
..


Patch Set 16:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/8311/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
Gerrit-Change-Number: 17075
Gerrit-PatchSet: 16
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Mon, 08 Mar 2021 21:20:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17075 )

Change subject: IMPALA-10494: Making use of the min/max column stats to improve 
min/max filters
..


Patch Set 15:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/8310/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
Gerrit-Change-Number: 17075
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Comment-Date: Mon, 08 Mar 2021 21:12:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters

2021-03-08 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#16). ( 
http://gerrit.cloudera.org:8080/17075 )

Change subject: IMPALA-10494: Making use of the min/max column stats to improve 
min/max filters
..

IMPALA-10494: Making use of the min/max column stats to improve min/max filters

This patch adds the functionality to compute the minimal and
the maximal value for a column of type integers, float or double
for parquet tables, and to make use of the new stats to discard
the min/max filters whose coverage are too close to the actual
range.

The computation and dislay of the new column min/max stats can be
controlled by two new Boolean query options (default to false):
  1. compute_column_minmax_stats
  2. show_column_minmax_stats

When enabled, two new columns 'Min' and 'Max' are added in the output
of the show column command as shown below.

set show_column_minmax_stats=true;
show column stats tpcds_parquet.store_sales;
+---+--+-...---+-+-+
| Column| Type |   #Falses | Min | Max |
+---+--+-...---+-+-+
| ss_sold_time_sk   | INT  |   -1  | 28800   | 75599   |
| ss_item_sk| BIGINT   |   -1  | 1   | 18000   |
| ss_customer_sk| INT  |   -1  | 1   | 10  |
| ss_cdemo_sk   | INT  |   -1  | 15  | 1920797 |
| ss_hdemo_sk   | INT  |   -1  | 1   | 7200|
| ss_addr_sk| INT  |   -1  | 1   | 5   |
| ss_store_sk   | INT  |   -1  | 1   | 10  |
| ss_promo_sk   | INT  |   -1  | 1   | 300 |
| ss_ticket_number  | BIGINT   |   -1  | 1   | 24  |
| ss_quantity   | INT  |   -1  | 1   | 100 |
| ss_wholesale_cost | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_list_price | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_sales_price| DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_discount_amt   | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_sales_price| DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_wholesale_cost | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_list_price | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_tax| DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_coupon_amt | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_net_paid   | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_net_paid_inc_tax   | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_net_profit | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_sold_date_sk   | INT  |   -1  | 2450816 | 2452642 |
+---+--+-...---+-+-+

Only the min/max values for non-partition columns are stored in HMS.
The min/max values for partition columns are computed in coordinator.

The min-max filters, in C++ class or protobuf form, are augmented to
deal with the always true state better. Once always true is set, the min
and max in the filter is no longer populated.

Testing:
 - Added new compute/show stats tests for integers, float and double
   column data types in compute-stats-column-minmax.test;
 - Added new tests in overlap_min_max_filters.test to demonstrate the
   usefulness of column stats to quickly disable useless filters;
 - Added tests in min-max-filter-test.cc to demonstrate method Or(),
   ToProtobuf() and constructor can deal with always true flag well.
 - core tests.

TODO:
 1. Test compute stats for timestamp and date columns;

Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
---
M be/src/exec/catalog-op-executor.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/incr-stats-util-test.cc
M be/src/exec/incr-stats-util.cc
M be/src/exec/incr-stats-util.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/service/hs2-util.cc
M be/src/service/hs2-util.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/CatalogObjects.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M 

[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters

2021-03-08 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#15). ( 
http://gerrit.cloudera.org:8080/17075 )

Change subject: IMPALA-10494: Making use of the min/max column stats to improve 
min/max filters
..

IMPALA-10494: Making use of the min/max column stats to improve min/max filters

This patch adds the functionality to compute the minimal and
the maximal value for a column of type integers, float or double
for parquet tables, and to make use of the new stats to discard
the min/max filters whose coverage are too close to the actual
range.

The computation and dislay of the new column min/max stats can be
controlled by two new Boolean query options (default to false):
  1. compute_column_minmax_stats
  2. show_column_minmax_stats

When enabled, two new columns 'Min' and 'Max' are added in the output
of the show column command as shown below.

set show_column_minmax_stats=true;
show column stats tpcds_parquet.store_sales;
+---+--+-...---+-+-+
| Column| Type |   #Falses | Min | Max |
+---+--+-...---+-+-+
| ss_sold_time_sk   | INT  |   -1  | 28800   | 75599   |
| ss_item_sk| BIGINT   |   -1  | 1   | 18000   |
| ss_customer_sk| INT  |   -1  | 1   | 10  |
| ss_cdemo_sk   | INT  |   -1  | 15  | 1920797 |
| ss_hdemo_sk   | INT  |   -1  | 1   | 7200|
| ss_addr_sk| INT  |   -1  | 1   | 5   |
| ss_store_sk   | INT  |   -1  | 1   | 10  |
| ss_promo_sk   | INT  |   -1  | 1   | 300 |
| ss_ticket_number  | BIGINT   |   -1  | 1   | 24  |
| ss_quantity   | INT  |   -1  | 1   | 100 |
| ss_wholesale_cost | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_list_price | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_sales_price| DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_discount_amt   | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_sales_price| DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_wholesale_cost | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_list_price | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_tax| DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_coupon_amt | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_net_paid   | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_net_paid_inc_tax   | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_net_profit | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_sold_date_sk   | INT  |   -1  | 2450816 | 2452642 |
+---+--+-...---+-+-+

Only the min/max values for non-partition columns are stored in HMS.
The min/max values for partition columns are computed in coordinator.

The min-max filters, in C++ class or protobuf form, are augmented to
deal with the always true state better. Once always true is set, the min
and max in the filter is no longer populated.

Testing:
 - Added new compute/show stats tests for integers, float and double
   column data types in compute-stats-column-minmax.test;
 - Added new tests in overlap_min_max_filters.test to demonstrate the
   usefulness of column stats to quickly disable useless filters;
 - Added tests in min-max-filter-test.cc to demonstrate method Or(),
   ToProtobuf() and constructor can deal with always true flag well.
 - core tests.

TODO:
 1. Test compute stats for timestamp and date columns;

Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
---
M be/src/exec/catalog-op-executor.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/incr-stats-util-test.cc
M be/src/exec/incr-stats-util.cc
M be/src/exec/incr-stats-util.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/runtime-filter-bank.cc
M be/src/service/hs2-util.cc
M be/src/service/hs2-util.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/CatalogObjects.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M 

[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17145/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17145/4//COMMIT_MSG@12
PS4, Line 12:   - When this is set, the external frontend is responsible for
:   for moving and managing the results
I assume the external frontend is also handling any metadata operations as 
well. Is that right?


http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc@127
PS4, Line 127:   if (HasExternalStagingDir()) {
 : // When an external FE has provided a staging path, we use 
it directly.
 : staging_dir_ = external_staging_dir_;
 :   } else {
 : staging_dir_ = Substitute("$0/_impala_insert_staging/$1",
 : table_desc_->hdfs_base_dir(), PrintId(state->query_id(), 
"_"));
 :   }
The external staging directory is a staging directory from the point of view of 
an external frontend, but in this file, it doesn't use the staging codepath. It 
uses the direct writing codepath with an alternate location (i.e. 
ShouldSkipStaging() returns true and it customizes final_hdfs_file_name_prefix).

So, I think there is no need to set staging_dir_ to something different. That 
doesn't interact with the external staging directory codepath.

Separately, what do you think about calling this something other than a staging 
directory? Custom output location? External output directory?


http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc@279
PS4, Line 279:   // The 0 padding on base and delta is to match the 
behavior of Hive since various
 :   // systems will expect a certain format for dynamic 
partition creation. Additionally
 :   // include an 0 statement id for delta directory so 
various Hive AcidUtils detect
 :   // the directory (such as AcidUtils.baseOrDeltaSubdir()). 
Multiple statements in a
 :   // single transaction is not supported.
 :   if (overwrite_) {
 : output_partition->final_hdfs_file_name_prefix += 
StringPrintf("/base_%07ld/",
 : write_id_);
 :   } else {
 : output_partition->final_hdfs_file_name_prefix += 
StringPrintf(
 : "/delta_%07ld_%07ld_/", write_id_, write_id_);
 :   }
Should this be triggered by its own variable independently from the external 
staging directory? It seems orthogonal to where the output goes.



--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 4
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 20:52:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
..


Patch Set 4: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc@262
PS2, Line 262: // When an external FE has provided a staging directory we 
use that directly.
> Sure, I can add a comment here. I didn't initially because if I added a com
Done



--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 4
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 19:38:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10522: Support external use of frontend libraries

2021-03-08 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17115 )

Change subject: IMPALA-10522: Support external use of frontend libraries
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17115
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4e3a84721ba196ec00773ce2923b19610b90edd9
Gerrit-Change-Number: 17115
Gerrit-PatchSet: 9
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 19:18:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10535: Add interface to ImpalaServer for execution of externally compiled statements

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17104 )

Change subject: IMPALA-10535: Add interface to ImpalaServer for execution of 
externally compiled statements
..


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6940/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iace716dd67290f08441857dc02d2428b0e335eaa
Gerrit-Change-Number: 17104
Gerrit-PatchSet: 13
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 17:47:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10535: Add interface to ImpalaServer for execution of externally compiled statements

2021-03-08 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17104 )

Change subject: IMPALA-10535: Add interface to ImpalaServer for execution of 
externally compiled statements
..


Patch Set 13: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iace716dd67290f08441857dc02d2428b0e335eaa
Gerrit-Change-Number: 17104
Gerrit-PatchSet: 13
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 17:45:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8309/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 4
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 17:26:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10546: Add ImpalaServer interface to retrieve BackendConfig from impalad

2021-03-08 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17116 )

Change subject: IMPALA-10546: Add ImpalaServer interface to retrieve 
BackendConfig from impalad
..


Patch Set 16: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14a3cee29f1fc91f4431b7ea89053bb3fbfa5e69
Gerrit-Change-Number: 17116
Gerrit-PatchSet: 16
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 17:20:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.h@249
PS2, Line 249:
> Done
Done


http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc@262
PS2, Line 262: output_partition->final_hdfs_file_name_prefix = 
Substitute("$0/$1/",
> Sure, I can add a comment here. I didn't initially because if I added a com
I added a comment here and to the commit message for this change set.



--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 17:07:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread John Sherman (Code Review)
Hello Thomas Tauber-Marshall, Kurt Deschler, Joe McDonnell, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17145

to look at the new patch set (#4).

Change subject: IMPALA-10552: External Frontend CTAS support
..

IMPALA-10552: External Frontend CTAS support

- Adds the concept of an external staging dir to HdfsTableSink
  - This allows an external to specify the destination of the
  sink
  - When this is set, the external frontend is responsible for
  for moving and managing the results
  - External frontends may optionally supply a partition
  depth which acts as a hint to skip a certain number of
  partitions while creating directories for partitions. This
  is for when the external frontend has pre-created a
  certain number of the directories in staging (usually the
  static portion of a partition specification)/
- Modifies delta/base naming to include 0 prefix padding to
  match Hive for dynamic partitioning detection
- External frontends are responsible for doing authorization
  checks against the staging directory and it is assumed the
  external frontend service is not exposed directly to users.

Co-authored-by: Kurt Deschler 

Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Reviewed-by: Kurt Deschler 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
5 files changed, 131 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17145/4
--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 4
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-9470: Use Parquet Bloom filters - Part 1

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1
..


Patch Set 12:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/8308/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 12
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 08 Mar 2021 16:58:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc@262
PS2, Line 262: output_partition->final_hdfs_file_name_prefix = 
Substitute("$0/$1/",
> Please add a comment to this effect. We may want to at least validate the p
Sure, I can add a comment here. I didn't initially because if I added a comment 
everywhere where this is possible it'd be a lot of comments. The commit message 
for the external planner port makes it a bit clearer too
"This allows different security policy to be applied to
  each type of connection. The external_fe_port should be considered
  a privileged service and should only be exposed to an external
  frontend that does user authentication and does authorization
  checks on generated plans"

In any case, I do not mind being paranoid via comments in such a case.


http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@243
PS2, Line 243: if (externalStagingDir_ != null) {
> Actually was thinking C++. This is Java an you are correct.
Done


http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java@1803
PS2, Line 1803:   public static void addFinalizationParamsForInsert(
> No comment added?
I think you are looking at patch set 2 and not 3



--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 16:51:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9470: Use Parquet Bloom filters - Part 1

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1
..


Patch Set 12:

(182 comments)

http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h
File be/src/thirdparty/xxhash/xxhash.h:

http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@70
PS12, Line 70: 
https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html?showComment=1552696407071#c3490092340461170735
line too long (112 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@92
PS12, Line 92:  *  
https://fastcompression.blogspot.com/2018/03/xxhash-for-small-keys-impressive-power.html
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@113
PS12, Line 113: #  elif defined (__cplusplus) || (defined (__STDC_VERSION__) && 
(__STDC_VERSION__ >= 199901L) /* C99 */)
line too long (104 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@243
PS12, Line 243: #  define XXH3_64bits_reset_withSecret XXH_NAME2(XXH_NAMESPACE, 
XXH3_64bits_reset_withSecret)
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@253
PS12, Line 253: #  define XXH3_128bits_reset_withSeed XXH_NAME2(XXH_NAMESPACE, 
XXH3_128bits_reset_withSeed)
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@254
PS12, Line 254: #  define XXH3_128bits_reset_withSecret 
XXH_NAME2(XXH_NAMESPACE, XXH3_128bits_reset_withSecret)
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@270
PS12, Line 270: #define XXH_VERSION_NUMBER  (XXH_VERSION_MAJOR *100*100 + 
XXH_VERSION_MINOR *100 + XXH_VERSION_RELEASE)
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@429
PS12, Line 429:  * @param statePtr A pointer to an @ref XXH32_state_t allocated 
with @ref XXH32_createState().
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@441
PS12, Line 441: XXH_PUBLIC_API void XXH32_copyState(XXH32_state_t* dst_state, 
const XXH32_state_t* src_state);
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@476
PS12, Line 476: XXH_PUBLIC_API XXH_errorcode XXH32_update (XXH32_state_t* 
statePtr, const void* input, size_t length);
line too long (102 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@628
PS12, Line 628: XXH_PUBLIC_API void XXH64_copyState(XXH64_state_t* dst_state, 
const XXH64_state_t* src_state);
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@631
PS12, Line 631: XXH_PUBLIC_API XXH_errorcode XXH64_update (XXH64_state_t* 
statePtr, const void* input, size_t length);
line too long (102 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@700
PS12, Line 700: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSeed(const void* 
data, size_t len, XXH64_hash_t seed);
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@724
PS12, Line 724: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSecret(const void* 
data, size_t len, const void* secret, size_t secretSize);
line too long (120 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@743
PS12, Line 743: XXH_PUBLIC_API void XXH3_copyState(XXH3_state_t* dst_state, 
const XXH3_state_t* src_state);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@756
PS12, Line 756: XXH_PUBLIC_API XXH_errorcode 
XXH3_64bits_reset_withSeed(XXH3_state_t* statePtr, XXH64_hash_t seed);
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@766
PS12, Line 766: XXH_PUBLIC_API XXH_errorcode 
XXH3_64bits_reset_withSecret(XXH3_state_t* statePtr, const void* secret, size_t 
secretSize);
line too long (121 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@768
PS12, Line 768: XXH_PUBLIC_API XXH_errorcode XXH3_64bits_update (XXH3_state_t* 
statePtr, const void* input, size_t length);
line too long (107 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@791
PS12, Line 791: XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_withSeed(const void* 
data, size_t len, XXH64_hash_t seed);
line too long (100 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/12/be/src/thirdparty/xxhash/xxhash.h@792
PS12, Line 792: XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_withSecret(const 
void* data, size_t len, const void* secret, size_t secretSize);
line too long (122 > 90)



[Impala-ASF-CR] IMPALA-9470: Use Parquet Bloom filters - Part 1

2021-03-08 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1
..

IMPALA-9470: Use Parquet Bloom filters - Part 1

This change adds read support for Parquet Bloom filters for some types.
The supported Parquet type - Impala type pairs are the following:

 ---
|Parquet type |  Impala type|
|---|
|INT32|  TINYINT, SMALLINT, INT |
|INT64|  BIGINT |
|FLOAT|  FLOAT  |
|DOUBLE   |  DOUBLE |
|BYTE_ARRAY   |  STRING |
 ---

If a Bloom filter is available for a column that is fully dictionary
encoded, the Bloom filter is not used as the dictionary can give exact
results in filtering.

Testing:
  - Added tests/query_test/test_parquet_bloom_filter.py that tests that
Parquet Bloom filtering works for the supported types and that we do
not incorrectly discard row groups for the unsupported type VARCHAR.

Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
---
M LICENSE.txt
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exprs/expr-value.h
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/kudu/util/block_bloom_filter.cc
M be/src/kudu/util/block_bloom_filter.h
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
A be/src/thirdparty/xxhash/README.md
A be/src/thirdparty/xxhash/xxhash.h
M be/src/util/CMakeLists.txt
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/impala-bloom-filter-buffer-allocator.cc
A be/src/util/impala-bloom-filter-buffer-allocator.h
A be/src/util/parquet-bloom-filter.cc
A be/src/util/parquet-bloom-filter.h
M bin/rat_exclude_files.txt
M common/thrift/parquet.thrift
A testdata/data/parquet-bloom-filtering.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter.test
A tests/query_test/test_parquet_bloom_filter.py
26 files changed, 6,820 insertions(+), 122 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17026/12
--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 12
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc@262
PS2, Line 262: output_partition->final_hdfs_file_name_prefix = 
Substitute("$0/$1/",
> For now we are treating the external planner port as a trusted/protected se
Please add a comment to this effect. We may want to at least validate the 
prefix and depth arguments are consistent with what is expected here.


http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@243
PS2, Line 243: if (externalStagingDir_ != null) {
> externalStagingDir_ is a String object and if it is not set I do believe it
Actually was thinking C++. This is Java an you are correct.


http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java@1803
PS2, Line 1803:   public static void addFinalizationParamsForInsert(
> Done
No comment added?



--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 15:56:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9470: Use Parquet Bloom filters - Part 1

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1
..


Patch Set 11:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/8307/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 11
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 08 Mar 2021 15:44:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6939/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 3
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 15:39:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9470: Use Parquet Bloom filters - Part 1

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1
..


Patch Set 11:

(185 comments)

http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/exec/parquet/hdfs-parquet-scanner.cc@1576
PS11, Line 1576: status = DeserializeThriftMsg(buffer->buffer(), 
header_size, true, bloom_filter_header);
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/exec/parquet/hdfs-parquet-scanner.cc@1819
PS11, Line 1819:   "Bloom filter data for file '$1'.", 
bloom_filter_header.numBytes, filename()));
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h
File be/src/thirdparty/xxhash/xxhash.h:

http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@70
PS11, Line 70: 
https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html?showComment=1552696407071#c3490092340461170735
line too long (112 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@92
PS11, Line 92:  *  
https://fastcompression.blogspot.com/2018/03/xxhash-for-small-keys-impressive-power.html
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@113
PS11, Line 113: #  elif defined (__cplusplus) || (defined (__STDC_VERSION__) && 
(__STDC_VERSION__ >= 199901L) /* C99 */)
line too long (104 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@243
PS11, Line 243: #  define XXH3_64bits_reset_withSecret XXH_NAME2(XXH_NAMESPACE, 
XXH3_64bits_reset_withSecret)
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@253
PS11, Line 253: #  define XXH3_128bits_reset_withSeed XXH_NAME2(XXH_NAMESPACE, 
XXH3_128bits_reset_withSeed)
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@254
PS11, Line 254: #  define XXH3_128bits_reset_withSecret 
XXH_NAME2(XXH_NAMESPACE, XXH3_128bits_reset_withSecret)
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@270
PS11, Line 270: #define XXH_VERSION_NUMBER  (XXH_VERSION_MAJOR *100*100 + 
XXH_VERSION_MINOR *100 + XXH_VERSION_RELEASE)
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@429
PS11, Line 429:  * @param statePtr A pointer to an @ref XXH32_state_t allocated 
with @ref XXH32_createState().
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@441
PS11, Line 441: XXH_PUBLIC_API void XXH32_copyState(XXH32_state_t* dst_state, 
const XXH32_state_t* src_state);
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@476
PS11, Line 476: XXH_PUBLIC_API XXH_errorcode XXH32_update (XXH32_state_t* 
statePtr, const void* input, size_t length);
line too long (102 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@628
PS11, Line 628: XXH_PUBLIC_API void XXH64_copyState(XXH64_state_t* dst_state, 
const XXH64_state_t* src_state);
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@631
PS11, Line 631: XXH_PUBLIC_API XXH_errorcode XXH64_update (XXH64_state_t* 
statePtr, const void* input, size_t length);
line too long (102 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@700
PS11, Line 700: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSeed(const void* 
data, size_t len, XXH64_hash_t seed);
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@724
PS11, Line 724: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSecret(const void* 
data, size_t len, const void* secret, size_t secretSize);
line too long (120 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@743
PS11, Line 743: XXH_PUBLIC_API void XXH3_copyState(XXH3_state_t* dst_state, 
const XXH3_state_t* src_state);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@756
PS11, Line 756: XXH_PUBLIC_API XXH_errorcode 
XXH3_64bits_reset_withSeed(XXH3_state_t* statePtr, XXH64_hash_t seed);
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@766
PS11, Line 766: XXH_PUBLIC_API XXH_errorcode 
XXH3_64bits_reset_withSecret(XXH3_state_t* statePtr, const void* secret, size_t 
secretSize);
line too long (121 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/11/be/src/thirdparty/xxhash/xxhash.h@768
PS11, Line 768: XXH_PUBLIC_API XXH_errorcode 

[Impala-ASF-CR] IMPALA-9470: Use Parquet Bloom filters - Part 1

2021-03-08 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17026


Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1
..

IMPALA-9470: Use Parquet Bloom filters - Part 1

This change adds read support for Parquet Bloom filters for some types.
The supported Parquet type - Impala type pairs are the following:

 ---
|Parquet type |  Impala type|
|---|
|INT32|  TINYINT, SMALLINT, INT |
|INT64|  BIGINT |
|FLOAT|  FLOAT  |
|DOUBLE   |  DOUBLE |
|BYTE_ARRAY   |  STRING |
 ---

If a Bloom filter is available for a column that is fully dictionary
encoded, the Bloom filter is not used as the dictionary can give exact
results in filtering.

Testing:
  - Added tests/query_test/test_parquet_bloom_filter.py that tests that
Parquet Bloom filtering works for the supported types and that we do
not incorrectly discard row groups for the unsupported type VARCHAR.

Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
---
M LICENSE.txt
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exprs/expr-value.h
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/kudu/util/block_bloom_filter.cc
M be/src/kudu/util/block_bloom_filter.h
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
A be/src/thirdparty/xxhash/README.md
A be/src/thirdparty/xxhash/xxhash.h
M be/src/util/CMakeLists.txt
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/impala-bloom-filter-buffer-allocator.cc
A be/src/util/impala-bloom-filter-buffer-allocator.h
A be/src/util/parquet-bloom-filter.cc
A be/src/util/parquet-bloom-filter.h
M common/thrift/parquet.thrift
A testdata/data/parquet-bloom-filtering.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter.test
A tests/query_test/test_parquet_bloom_filter.py
25 files changed, 6,815 insertions(+), 122 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17026/11
--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 11
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8306/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 3
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 15:27:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8305/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 08 Mar 2021 15:15:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
..


Patch Set 3:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/17145/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17145/1//COMMIT_MSG@22
PS1, Line 22:
> Add co-author for Kurt
Done


http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.h@249
PS2, Line 249:   // Stores the indices into the list of non-clustering columns 
of the target table that
> Extra line
Done


http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.h@249
PS1, Line 249:   // Stores the indices into the list of non-clustering columns 
of the target table that
> delete
Done


http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.cc@131
PS1, Line 131: staging_dir_ = Substitute("$0/_impala_insert_staging/$1",
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.cc@232
PS1, Line 232: void HdfsTableSink::BuildHdfsFileNames(
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.cc@506
PS1, Line 506:   for (int j = 0; j < partition_key_expr_evals_.size(); ++j) {
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17145/1/be/src/exec/hdfs-table-sink.cc@549
PS1, Line 549:   // partition_name_ss now holds the unique descriptor for this 
partition,
> line too long (95 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc@262
PS2, Line 262: // When an external FE has provided a staging directory we 
use that directly.
> Do we need any validation on the path here to avoid security issues writing
For now we are treating the external planner port as a trusted/protected 
service meant for deployment in scenarios you can lock down access to said 
port. We should certainly in the (near) future look into doing similar things 
that communication between impalads do that prevent users from impersonating a 
coordinator.


http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@90
PS2, Line 90:
> I would say "specifies depth" rather than hint as this needs to be enforced
Done


http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@243
PS2, Line 243: hdfsTableSink.setSort_columns(sortColumns_);
> externalStagingDir_ is not a pointer..
externalStagingDir_ is a String object and if it is not set I do believe it can 
be null.

(Maybe you read it as the int externalStagingPartitionDepth_?)


http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java@1803
PS2, Line 1803:   // This is public to allow external frontends to utilize this 
method to fill in the
> Add comment why this is public static.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 3
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 15:11:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8304/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 08 Mar 2021 15:10:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread John Sherman (Code Review)
Hello Thomas Tauber-Marshall, Kurt Deschler, Joe McDonnell, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17145

to look at the new patch set (#3).

Change subject: IMPALA-10552: External Frontend CTAS support
..

IMPALA-10552: External Frontend CTAS support

- Adds the concept of an external staging dir to HdfsTableSink
  - This allows an external to specify the destination of the
  sink
  - When this is set, the external frontend is responsible for
  for moving and managing the results
  - External frontends may optionally supply a partition
  depth which acts as a hint to skip a certain number of
  partitions while creating directories for partitions. This
  is for when the external frontend has pre-created a
  certain number of the directories in staging (usually the
  static portion of a partition specification)/
- Modifies delta/base naming to include 0 prefix padding to
  match Hive for dynamic partitioning detection

Co-authored-by: Kurt Deschler 

Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Reviewed-by: Kurt Deschler 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
5 files changed, 129 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/17145/3
--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 3
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

2021-03-08 Thread Quanlong Huang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17121

to look at the new patch set (#5).

Change subject: IMPALA-7712: Support Google Cloud Storage
..

IMPALA-7712: Support Google Cloud Storage

This patch adds support for GCS(Google Cloud Storage). Using the
gcs-connector, the implementation is similar to other remote
FileSystems.

New flags for GCS:
 - num_gcs_io_threads: Number of GCS I/O threads. Defaults to be 16.
 - cache_gcs_file_handles: Enable the file handle cache for GCS files.
   Defaults to true.

Follow-up:
 - Support for spilling to GCS will be addressed in IMPALA-10561.
 - Some tests are skipped for further investigation (IMPALA-10562,
   IMPALA-10563).

Tests:
 - Compile and create hdfs test data on a GCE instance. Upload test data
   to a GCS bucket. Modify all locations in HMS DB to point to the GCS
   bucket. Remove some hdfs caching params. Run CORE tests.
 - Compile and load snapshot data to a GCS bucket. Run CORE tests.

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
---
M be/src/exec/hdfs-table-sink.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/scan-range.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M java/executor-deps/pom.xml
M java/pom.xml
M testdata/bin/create-load-data.sh
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/run-all.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
M tests/authorization/test_ranger.py
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_admission_controller.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_event_processing.py
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/custom_cluster/test_hive_parquet_codec_interop.py
M tests/custom_cluster/test_hive_text_codec_interop.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_lineage.py
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_local_tz_conversion.py
M tests/custom_cluster/test_metadata_replicas.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/custom_cluster/test_query_retries.py
M tests/custom_cluster/test_restart_services.py
M tests/custom_cluster/test_topic_update_frequency.py
M tests/data_errors/test_data_errors.py
M tests/failure/test_failpoints.py
M tests/metadata/test_catalogd_debug_actions.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_reset_metadata.py
M tests/metadata/test_stale_metadata.py
M tests/metadata/test_testcase_builder.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_acid.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_hbase_queries.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_insert_permutation.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_observability.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_resource_limits.py
M tests/query_test/test_scanners.py
M tests/shell/test_shell_commandline.py
M tests/stress/test_acid_stress.py
M tests/stress/test_ddl_stress.py
M tests/stress/test_insert_stress.py
M tests/util/filesystem_utils.py
70 files changed, 299 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17121/5
--
To view, visit http://gerrit.cloudera.org:8080/17121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

2021-03-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17121/4/tests/shell/test_shell_commandline.py
File tests/shell/test_shell_commandline.py:

http://gerrit.cloudera.org:8080/#/c/17121/4/tests/shell/test_shell_commandline.py@656
PS4, Line 656: .
flake8: E501 line too long (91 > 90 characters)



--
To view, visit http://gerrit.cloudera.org:8080/17121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 08 Mar 2021 14:52:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

2021-03-08 Thread Quanlong Huang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17121

to look at the new patch set (#4).

Change subject: IMPALA-7712: Support Google Cloud Storage
..

IMPALA-7712: Support Google Cloud Storage

This patch adds support for GCS(Google Cloud Storage). Using the
gcs-connector, the implementation is similar to other remote
FileSystems.

New flags for GCS:
 - num_gcs_io_threads: Number of GCS I/O threads. Defaults to be 16.
 - cache_gcs_file_handles: Enable the file handle cache for GCS files.
   Defaults to true.

Follow-up:
 - Support for spilling to GCS will be addressed in IMPALA-10561.
 - Some tests are skipped for further investigation (IMPALA-10562,
   IMPALA-10563).

Tests:
 - Compile and create hdfs test data on a GCE instance. Upload test data
   to a GCS bucket. Modify all locations in HMS DB to point to the GCS
   bucket. Remove some hdfs caching params. Run CORE tests.
 - Compile and load snapshot data to a GCS bucket. Run CORE tests.

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
---
M be/src/exec/hdfs-table-sink.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/scan-range.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M java/executor-deps/pom.xml
M java/pom.xml
M testdata/bin/create-load-data.sh
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/run-all.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
M tests/authorization/test_ranger.py
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_admission_controller.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_event_processing.py
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/custom_cluster/test_hive_parquet_codec_interop.py
M tests/custom_cluster/test_hive_text_codec_interop.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_lineage.py
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_local_tz_conversion.py
M tests/custom_cluster/test_metadata_replicas.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/custom_cluster/test_query_retries.py
M tests/custom_cluster/test_restart_services.py
M tests/custom_cluster/test_topic_update_frequency.py
M tests/data_errors/test_data_errors.py
M tests/failure/test_failpoints.py
M tests/metadata/test_catalogd_debug_actions.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_reset_metadata.py
M tests/metadata/test_stale_metadata.py
M tests/metadata/test_testcase_builder.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_acid.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_hbase_queries.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_insert_permutation.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_observability.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_resource_limits.py
M tests/query_test/test_scanners.py
M tests/shell/test_shell_commandline.py
M tests/stress/test_acid_stress.py
M tests/stress/test_ddl_stress.py
M tests/stress/test_insert_stress.py
M tests/util/filesystem_utils.py
70 files changed, 299 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17121/4
--
To view, visit http://gerrit.cloudera.org:8080/17121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10552: External Frontend CTAS support

2021-03-08 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17145 )

Change subject: IMPALA-10552: External Frontend CTAS support
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.h@249
PS2, Line 249:
Extra line


http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17145/2/be/src/exec/hdfs-table-sink.cc@262
PS2, Line 262: output_partition->final_hdfs_file_name_prefix = 
Substitute("$0/$1/",
Do we need any validation on the path here to avoid security issues writing to 
an arbitrary HDFS directory? Mainly concerned because the path prefix is 
supplied externally.


http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@90
PS2, Line 90:   // This acts as a hint to the sink on how deep into the 
partition specification in
I would say "specifies depth" rather than hint as this needs to be enforced.


http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@243
PS2, Line 243: if (externalStagingDir_ != null) {
externalStagingDir_ is not a pointer..


http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17145/2/fe/src/main/java/org/apache/impala/service/Frontend.java@1803
PS2, Line 1803:   public static void addFinalizationParamsForInsert(
Add comment why this is public static.



--
To view, visit http://gerrit.cloudera.org:8080/17145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b
Gerrit-Change-Number: 17145
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Mon, 08 Mar 2021 13:07:23 +
Gerrit-HasComments: Yes