[Impala-ASF-CR] IMPALA-6128: Add support for AES-CTR encryption when spilling to disk CFB mode is a stream cipher and is secure when used with a different nonce/IV for every message. However it can be

2018-01-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8861 )

Change subject: IMPALA-6128: Add support for AES-CTR encryption when spilling 
to disk CFB mode is a stream cipher and is secure when used with a different 
nonce/IV for every message. However it can be a performance bottleneck. CTR 
mode is also stream cipher and is secure
..


Patch Set 5: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1687/


--
To view, visit http://gerrit.cloudera.org:8080/8861
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9debc240615dd8cdbf00ec8730cff62ffef52aff
Gerrit-Change-Number: 8861
Gerrit-PatchSet: 5
Gerrit-Owner: Xianda Ke 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Xianda Ke 
Gerrit-Comment-Date: Tue, 09 Jan 2018 05:17:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet

2018-01-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in 
Parquet
..

IMPALA-5052: Read and write signed integer logical types in Parquet

This patch maps a signed integer logical type in parquet to a supported
Impala column type. This change introduces the following mapping -

  INT_8  -> TINYINT
  INT_16 -> SMALLINT
  INT_32 -> INT
  INT_64 -> BIGINT

Also, added a parquet file with the following schema for testing -

  schema {
optional int32 id;
optional int32 tinyint_col (INT_8);
optional int32 smallint_col (INT_16);
optional int32 int_col;
optional int64 bigint_col;
  }

Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Reviewed-on: http://gerrit.cloudera.org:8080/8548
Reviewed-by: Tim Armstrong 
Reviewed-by: Tianyi Wang 
Tested-by: Impala Public Jenkins
---
M be/src/exec/hdfs-parquet-table-writer.cc
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M testdata/data/README
A testdata/data/signed_integer_logical_types.parquet
M tests/query_test/test_insert_parquet.py
5 files changed, 99 insertions(+), 1 deletion(-)

Approvals:
  Tim Armstrong: Looks good to me, but someone else must approve
  Tianyi Wang: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 4
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet

2018-01-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in 
Parquet
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 3
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Tue, 09 Jan 2018 04:55:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3887: Wait for HDFS replication in data loading

2018-01-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8916 )

Change subject: IMPALA-3887: Wait for HDFS replication in data loading
..

IMPALA-3887: Wait for HDFS replication in data loading

When the data loading finishes, it is possible for some HDFS blocks to
be under replicated. If impala gets the metadata before the replication
is done, some tests may fail. This patch adds a replication waiting step
in the data loading script.
Resubmitted with filesystem type check.

Change-Id: I64d9a8ea1d0a32b40047321b50a7139a8f48eac8
Reviewed-on: http://gerrit.cloudera.org:8080/8916
Reviewed-by: Vuk Ercegovac 
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins
---
M testdata/bin/create-load-data.sh
1 file changed, 19 insertions(+), 0 deletions(-)

Approvals:
  Vuk Ercegovac: Looks good to me, but someone else must approve
  Alex Behm: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8916
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I64d9a8ea1d0a32b40047321b50a7139a8f48eac8
Gerrit-Change-Number: 8916
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6193: Track memory of incoming data streams

2018-01-08 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8914 )

Change subject: IMPALA-6193: Track memory of incoming data streams
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/rpc/rpc-mgr.h
File be/src/rpc/rpc-mgr.h:

http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/rpc/rpc-mgr.h@165
PS3, Line 165:   /// Passed to new services.
 :   MemTracker* mem_tracker_;
Is this still needed ?


http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/exec-env.cc@343
PS3, Line 343:
nit: extra blank line


http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/krpc-data-stream-mgr.cc
File be/src/runtime/krpc-data-stream-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8914/3/be/src/runtime/krpc-data-stream-mgr.cc@175
PS3, Line 175: ctx->serialized_size()
Wouldn't the majority of the changes be not needed if we add a new interface 
ctx->GetTransferSize() instead ?

IMHO, this may be a more generic interface which may be used by other RPCs to 
be added in the future.

Sorry for not providing better advice earlier.


http://gerrit.cloudera.org:8080/#/c/8914/2/be/src/runtime/krpc-data-stream-mgr.inline.h
File be/src/runtime/krpc-data-stream-mgr.inline.h:

http://gerrit.cloudera.org:8080/#/c/8914/2/be/src/runtime/krpc-data-stream-mgr.inline.h@25
PS2, Line 25:
:
: const TransmitDataRequestPB* TransmitDataCtx::request() const {
:   DCHECK(initialized_);
:   return request_;
: }
:
: TransmitDataResponsePB* TransmitDataCtx::response() {
:   DCHECK(initialized_);
:   return response_;
: }
:
: kudu::rpc::RpcContext* TransmitDataCtx::rpc_context() {
:   DCHECK(initialized_);
:   return rpc_context_;
: }
:
: const kudu::Slice& TransmitDataCtx::tuple_offsets() const {
:   DCHECK(initialized_);
:   return tuple_offsets_;
: }
:
: const kudu::Slice& TransmitDataCtx::tuple_data() const {
:   DCHECK(initialized_);
:   return tuple_data_;
: }
:
: int64_t TransmitDataCtx::serialized_size() const {
:   DCHECK(initialized_);
:   return serialized_size_;
: }
:
: int64_t TransmitDataCtx::deserialized_size() const {
:   DCHECK(initialized_);
:   return deserialized_size_;
: }
:
: void TransmitDataCtx::RespondStatus(const Status& status) {
:   status.ToProto(response_->mutable_status());
:   rpc_context_->RespondSuccess();
: }
:
: void EndDataStreamCtx::RespondStatus(const Status& status) {
:   status.ToProto(response_->mutable_status());
:   rpc_context_->RespondSuccess();
: }
> My intention was to make the code safer by adding the DCHECKs, which makes
These changes may not be needed with the TransferSize() changes.



--
To view, visit http://gerrit.cloudera.org:8080/8914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2df1204d2483313a8a18e5e3be6cec9e402614c4
Gerrit-Change-Number: 8914
Gerrit-PatchSet: 3
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Tue, 09 Jan 2018 02:26:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-01-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@177
PS2, Line 177: ss << "Codec bad, corrupted ";
Can you include a bit more detail, i.e. mention that it's RCFile and include 
the name of the codec?


http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@337
PS2, Line 337: ss << "Invalid bytes read col_idx: " << col_idx;
This could also do with a bit more detail.


http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@344
PS2, Line 344: void HdfsRCFileScanner::GetCurrentKeyBuffer(int col_idx, bool 
skip_col_data,
How does this avoid buffer overflows if we don't pass in the length of the 
'key_buf_ptr' buffer? I think this should be defensively checking if it reads 
past the end of the buffer.


http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@348
PS2, Line 348: GetVInt
These GetVInt() and GetVLong() interfaces seems fundamentally unsafe - they 
don't take the length of the buffer as input! I think we might need to change 
them so that the length of the buffer is passed in and they check the bounds.



--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Tue, 09 Jan 2018 01:47:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5152: Introduce metadata loading phase

2018-01-08 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8958 )

Change subject: IMPALA-5152: Introduce metadata loading phase
..


Patch Set 1:

(11 comments)

Patch looks good to me overall, have a few suggestions.

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@367
PS1, Line 367:
IMO, this method should be documented, given this is the starting point of 
analysis.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@309
PS1, Line 309: private final Map loadedTables;
nit: My feeling is that we can call this referencedTables_ (and backup with a 
comment saying they are loaded). Reason being, we are separating loading and 
analysis part, so anything that reached here means it is already loaded. Please 
ignore if you disagree, I'm not too strong about this either.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2372
PS1, Line 2372: if (table.isLoaded() && table instanceof IncompleteTable) {
Preconditions.checkState(table.isLoaded())?


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@835
PS1, Line 835: // Database does not exist.
Maybe catch DatabaseNotFoundEx? CatalogEx seems generic (I checked 
Catalog#getTable() seems to be throwing DBNotFoundEx after you removed 
ImpaladCatalog#getTable()).


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@838
PS1, Line 838:   if (!tbl.isLoaded()) {
Should we handle tbl.isLoaded() && tbl instanceof IncompleteTable here and fail 
fast instead?


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@859
PS1, Line 859:   private Set collectTableCandidates(StatementBase 
stmt, String defaultDb) {
I feel like this method belongs to the AnalysisCtx() and not frontend?

Then we could just refactor requestTableLoadAndWait() to 
requestTableLoadAndWait(List)


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@910
PS1, Line 910: if (missingTbls.isEmpty()) return new LoadedTables(catalog, 
loadedTbls);
Should we update something in the timeline to reflect that all the metadata is 
locally available?


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@931
PS1, Line 931: currCatalog != catalog
Not fully familiar with != implementation, so I could be wrong on this one.

Shouldn't we do something like (currCatalog.serviceId != catalog.serviceId) ? 
Else, any change to the catalog, like a new updates etc, can alter the 
hashCode() resulting in an incorrect equals() ?


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@942
PS1, Line 942:   LOG.warn(String.format("Waiting for table metadata. " +
Should we dump current missingTables() too?


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@951
PS1, Line 951:   getMissingTables(catalog, missingTbls, defaultDb, 
loadedTbls);
Make getMissingTables() always use getCatalog() instead of passing it around?


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@958
PS1, Line 958: issueLoadRequest
Not sure I follow the need for this. L960 makes sure missingTbls is up to date 
and we always need to load the remaining ones right?

I don't see a case when issueLoadRequest is false and missingTbls is not empty. 
In other words, every time we pass the loop condition, we need to load 
something.



--
To view, visit http://gerrit.cloudera.org:8080/8958
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55
Gerrit-Change-Number: 8958
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Comment-Date: Tue, 09 Jan 2018 01:21:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet

2018-01-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in 
Parquet
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1686/


--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 3
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Tue, 09 Jan 2018 01:16:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5014: Part 2: Round when casting decimal to timestamp

2018-01-08 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8969


Change subject: IMPALA-5014: Part 2: Round when casting decimal to timestamp
..

IMPALA-5014: Part 2: Round when casting decimal to timestamp

When there are too many digits to the right of the dot in a decimal, we
would always truncate when casting to timestamp. In this patch we change
the behavior to round instead of truncating when decimal_v2 is enabled.

Testing:
- Added some EE tests, ran BE tests on my machine.

Change-Id: I8fb3a7d976ab980b8572d7e9524850572bad57da
---
M be/src/exprs/decimal-operators-ir.cc
M be/src/exprs/decimal-operators.h
M testdata/workloads/functional-query/queries/QueryTest/decimal-exprs.test
3 files changed, 65 insertions(+), 14 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/8969/1
--
To view, visit http://gerrit.cloudera.org:8080/8969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8fb3a7d976ab980b8572d7e9524850572bad57da
Gerrit-Change-Number: 8969
Gerrit-PatchSet: 1
Gerrit-Owner: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-3887: Wait for HDFS replication in data loading

2018-01-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8916 )

Change subject: IMPALA-3887: Wait for HDFS replication in data loading
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1685/


--
To view, visit http://gerrit.cloudera.org:8080/8916
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64d9a8ea1d0a32b40047321b50a7139a8f48eac8
Gerrit-Change-Number: 8916
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 08 Jan 2018 23:44:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet

2018-01-08 Thread Tianyi Wang (Code Review)
Tianyi Wang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in 
Parquet
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 3
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Mon, 08 Jan 2018 23:43:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet

2018-01-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in 
Parquet
..


Patch Set 3: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 3
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Mon, 08 Jan 2018 23:11:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet

2018-01-08 Thread anujphadke (Code Review)
anujphadke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in 
Parquet
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@310
PS2, Line 310:   if line_split[0] == "id":
> Can you make this an elif chain and assert that the column name is one of t
Done


http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@339
PS2, Line 339:   if line_split[0] == "id":
> Same here
Done


http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@349
PS2, Line 349:
> nit: missing space after %
Done


http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@351
PS2, Line 351: c
> same here
Done



--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 3
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Mon, 08 Jan 2018 22:57:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet

2018-01-08 Thread anujphadke (Code Review)
Hello Tianyi Wang, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8548

to look at the new patch set (#3).

Change subject: IMPALA-5052: Read and write signed integer logical types in 
Parquet
..

IMPALA-5052: Read and write signed integer logical types in Parquet

This patch maps a signed integer logical type in parquet to a supported
Impala column type. This change introduces the following mapping -

  INT_8  -> TINYINT
  INT_16 -> SMALLINT
  INT_32 -> INT
  INT_64 -> BIGINT

Also, added a parquet file with the following schema for testing -

  schema {
optional int32 id;
optional int32 tinyint_col (INT_8);
optional int32 smallint_col (INT_16);
optional int32 int_col;
optional int64 bigint_col;
  }

Change-Id: I47a8371858c9597c6a440808cf6f933532468927
---
M be/src/exec/hdfs-parquet-table-writer.cc
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M testdata/data/README
A testdata/data/signed_integer_logical_types.parquet
M tests/query_test/test_insert_parquet.py
5 files changed, 99 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/8548/3
--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 3
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[native-toolchain-CR] Bump LLVM to 5.0.0

2018-01-08 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8932 )

Change subject: Bump LLVM to 5.0.0
..


Patch Set 1:

They've released 5.0.1. Does it make sense to go straight there? (I've not 
looked up the changes.)


--
To view, visit http://gerrit.cloudera.org:8080/8932
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9b7b97cb135d202eaa9a0bae03e722a2505b712
Gerrit-Change-Number: 8932
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 08 Jan 2018 22:11:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set of characters.

2018-01-08 Thread Zoram Thanga (Code Review)
Zoram Thanga has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8349 )

Change subject: IMPALA-6059: Enhance ltrim()/rtrim() functions to trim any set 
of characters.
..


Patch Set 6:

(4 comments)

Changed implementation to use templates. Also ran perf measurements to check 
for regressions. Please see patch set #7

http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8349/4/be/src/exprs/string-functions-ir.cc@475
PS4, Line 475: tionContext* context,
> Sure, we can keep passing StringVal(" ") for the no-arg case. It's not used
There are test cases that pass NULL for chars_to_trim, so that's taken care.


http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc@451
PS6, Line 451: direction == LEADING || direction == BOTH
> To avoid a logic or, we can consider doing:
Changed this to a templatized implementation to avoid branching overhead.


http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc@464
PS6, Line 464:   //return DoTrimStringImpl(str, unique_chars, direction);
> delete
Done


http://gerrit.cloudera.org:8080/#/c/8349/6/be/src/exprs/string-functions-ir.cc@466
PS6, Line 466:
 : StringVal StringFunctions::Trim(FunctionContext* context, const 
StringVal& str) {
 :   return DoTrimString(context, str, StringVal(" "), BOTH);
 : }
 :
 : StringVal StringFunctions::Ltrim(FunctionContext* context, const 
StringVal& str) {
 :   return DoTrimString(context, str, StringVal(" "), LEADING);
 : }
 :
 : StringVal StringFunctions::Rtrim(FunctionContext* context, const 
StringVal& str) {
 :   return DoTrimString(context, str, StringVal(" "), TRAILING);
 : }
> Can you please do a quick benchmark to make sure we didn't regress the perf
Have measured the performance of the new code and old trim/ltrim/rtrim on a 
tpch_parquet table that has 1.5 billion rows. The test query basically scans 
all the string columns of the table (8 of them) and calls trim/ltrim/rtrim on 
each element, where each operation is something like select 
count(trim(l_shipinstruct)), etc. The idea is to check for any unacceptable 
perf regression due to how the code has been refactored. Time spent in these 
function calls aggregated by count()s are the same in the old and new code, 
within margin of error.



--
To view, visit http://gerrit.cloudera.org:8080/8349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8a5ae3f59762e70c3268a01e14ed57a9e36b8d79
Gerrit-Change-Number: 8349
Gerrit-PatchSet: 6
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Mon, 08 Jan 2018 22:06:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6231: Implement decimal v2 fuzz test

2018-01-08 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8898 )

Change subject: IMPALA-6231: Implement decimal_v2 fuzz test
..


Patch Set 3: Code-Review+1

Zach, since you were taking a look, too, do you want to give +2 when you're 
satisfied?


--
To view, visit http://gerrit.cloudera.org:8080/8898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4328125de5c583ec8ead1f78d9a08703b18b2d85
Gerrit-Change-Number: 8898
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Mon, 08 Jan 2018 21:29:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6231: Implement decimal v2 fuzz test

2018-01-08 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/8898 )

Change subject: IMPALA-6231: Implement decimal_v2 fuzz test
..

IMPALA-6231: Implement decimal_v2 fuzz test

Implement a test that generates random decimal numbers in the pytest
framework, performs a random mathemtaical operation in Impala and
verifies that the result is correct by doing the same operating using
the Python decimal module. We try to generate not only completely random
decimal numbers, but also numbers that have interesting properties, such
as the number being a power of two.

Change-Id: I4328125de5c583ec8ead1f78d9a08703b18b2d85
---
A tests/query_test/test_decimal_fuzz.py
1 file changed, 248 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/8898/3
--
To view, visit http://gerrit.cloudera.org:8080/8898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4328125de5c583ec8ead1f78d9a08703b18b2d85
Gerrit-Change-Number: 8898
Gerrit-PatchSet: 3
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-4835: Part 2: Allocate scan range buffers upfront

2018-01-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8707 )

Change subject: IMPALA-4835: Part 2: Allocate scan range buffers upfront
..


Patch Set 12:

Discovered a bug with partial reads. PS12 extends the relevant unit test and 
fixes it.


--
To view, visit http://gerrit.cloudera.org:8080/8707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f
Gerrit-Change-Number: 8707
Gerrit-PatchSet: 12
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 08 Jan 2018 20:59:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4835: Part 2: Allocate scan range buffers upfront

2018-01-08 Thread Tim Armstrong (Code Review)
Hello Tianyi Wang,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8707

to look at the new patch set (#12).

Change subject: IMPALA-4835: Part 2: Allocate scan range buffers upfront
..

IMPALA-4835: Part 2: Allocate scan range buffers upfront

This change is a step towards reserving memory for buffers from the
buffer pool and constraining per-scanner memory requirements. This
change restructures the DiskIoMgr code so that each ScanRange operates
with a fixed set of buffers that are allocated upfront and recycled as
the I/O mgr works through the ScanRange.

One major change is that ScanRanges get blocked when a buffer is not
available and get unblocked when a client returns a buffer via
ReturnBuffer(). I was able to remove the logic to maintain the
blocked_ranges_ list by instead adding a separate set for all ranges
that are active.

I plan to merge this along with the actual buffer pool switch, but
separated it out to allow review of the DiskIoMgr changes separate from
other aspects of the buffer pool switchover.

Testing:
* Ran core tests.
* TODO: run exhaustive tests.

Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/scanner-context.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/io/disk-io-mgr-stress-test.cc
M be/src/runtime/io/disk-io-mgr-stress.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M be/src/util/runtime-profile-counters.h
19 files changed, 776 insertions(+), 425 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/8707/12
--
To view, visit http://gerrit.cloudera.org:8080/8707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f
Gerrit-Change-Number: 8707
Gerrit-PatchSet: 12
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Tianyi Wang 


[Impala-ASF-CR] IMPALA-5191: Standardize column alias behavior

2018-01-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8801 )

Change subject: IMPALA-5191: Standardize column alias behavior
..


Patch Set 8:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:

http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@273
PS8, Line 273:   protected void checkForAmbiguousAliases(Expr expr, String 
errorPrefix)
Let's inline this check into the single caller.


http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@283
PS8, Line 283: corresponding
"corresponding" is not really explained, I think we should explain ordinal and 
alias substitution a little more.

Returns the substitution of top-level integer NumericLiteral with the 
corresponding select-list expr


http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@285
PS8, Line 285:* 'expr' is analyzed in this function regardless of whether
The returned expr is analyzed regardless of whether substitution was performed.


http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@288
PS8, Line 288:   protected Expr substituteOrdinalAlias(Expr expr, String 
errorPrefix,
substituteOrdinalOrAlias()


http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@290
PS8, Line 290: checkForAmbiguousAliases(expr, errorPrefix);
Let's move this after attempting to substitute ordinals to make the flow 
clearer.


http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@291
PS8, Line 291: // We can substitute either by ordinal or by alias.
Integrate this into the function comment.


http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@296
PS8, Line 296: if (substituteExpr == null) {
Slightly easier for me to read/reason with inverted logic, i.e.

if (substituteExpr != null) return substituteExpr;
if (expr instanceof SlotRef) ...


http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@314
PS8, Line 314:   protected void substituteOrdinalsAliases(List exprs, 
String errorPrefix,
substituteOrdinalsAndAliases()


http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@316
PS8, Line 316: ListIterator i = exprs.listIterator();
Less code to use a loop over indexes:

for (int i = 0; ...) {
  exprs.set(i, substitute(...));
}


http://gerrit.cloudera.org:8080/#/c/8801/7/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/8801/7/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@548
PS7, Line 548: substituteOrdi
> substituteOrdinalsAliases() takes a list of Exprs as param, while we only h
Thanks for confirming and fixing!


http://gerrit.cloudera.org:8080/#/c/8801/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/8801/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@985
PS7, Line 985: // IMPALA-5191: In GROUP BY, HAVING, ORDER BY, aliases and 
ordinals must only be
> I found it cumbersome to combine the tests, because some of them pass, some
That's fine, maybe it's not worth consolidating the tests. I'll think more 
carefully about it.

Good catch with the weird ordinal substitution. It's great that we keep 
finding/fixing more edge cases!

To address it I think we should change SelectStmt.rewriteExprs() to not accept 
an expr rewrite in the GROUP BY, HAVING, and ORDER BY clauses if the original 
expression was not a NumericLiteral eligible to be substituted, but the 
rewritten expression is a NumericLiteral eligible for ordinal substitution.

Here's an example that demonstrates inconsistent behavior in the GROUP BY 
clause:

select int_col, count(*) from functional.alltypes group by 1*2, int_col;

The query throws an error with enable_expr_rewrites=true, but works with 
enable_expr_rewrites=false

What do you think?


http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/8801/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@987
PS8, Line 987: AnalyzesOk("with w_test as (select '1' as one, 2 as two, '3' 
as three) "
Let's move this complex test to the end. Easier to follow the tests if we start 
with the simple cases first.

nit: Let's be consistent 

[Impala-ASF-CR] IMPALA-3526: update FE tests to pass on S3

2018-01-08 Thread Vuk Ercegovac (Code Review)
Hello Thomas Tauber-Marshall, Taras Bobrovytsky, Lars Volker, Dimitris 
Tsirogiannis, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8890

to look at the new patch set (#3).

Change subject: IMPALA-3526: update FE tests to pass on S3
..

IMPALA-3526: update FE tests to pass on S3

Plans for some queries/same data may differ when
the same data is stored on S3 vs. HDFS. This is due
to block size differences used to enumerate range scans
on the different file systems. As a result, FE tests
have been disabled for S3 configurations. This has also
led to staleness in the tests that were specific to S3.
Most of the broken tests are due to staleness, but several
are due to true plan differences.

This change fixes stale tests and separates out those
test queries whose plans differ across S3 and HDFS.

Tests:
- ran FE tests on S3

Change-Id: I4c8221949e76b0a0b9192e6b56c4da5eeae04141
---
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/S3PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/join-order-hdfs.test
A testdata/workloads/functional-planner/queries/PlannerTest/join-order-s3.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/s3.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
A testdata/workloads/functional-planner/queries/PlannerTest/tpch-hdfs.test
A testdata/workloads/functional-planner/queries/PlannerTest/tpch-s3.test
9 files changed, 2,171 insertions(+), 1,064 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/8890/3
--
To view, visit http://gerrit.cloudera.org:8080/8890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4c8221949e76b0a0b9192e6b56c4da5eeae04141
Gerrit-Change-Number: 8890
Gerrit-PatchSet: 3
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-6307: CTAS statement fails with duplicate column exception.

2018-01-08 Thread Zoram Thanga (Code Review)
Zoram Thanga has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8930 )

Change subject: IMPALA-6307: CTAS statement fails with duplicate column 
exception.
..


Patch Set 1:

(3 comments)

Thanks for the comments. Please have a look at patch set #2.

http://gerrit.cloudera.org:8080/#/c/8930/1/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/8930/1/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@223
PS1, Line 223: Preconditions.checkState(colDefs.size() + 
partitionColDefs.size() == types.size());
 : for (int i = 0; i < colDefs.size(); ++i) 
colDefs.get(i).setType(types.get(i));
 : for (int i = 0; i < partitionColDefs.size(); ++i) {
 :   partitionColDefs.get(i).setType(types.get(i + 
colDefs.size()));
 : }
> I believe it would be nice to add a comment in AnalysisContext.java L405 to
Done


http://gerrit.cloudera.org:8080/#/c/8930/1/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

http://gerrit.cloudera.org:8080/#/c/8930/1/fe/src/main/java/org/apache/impala/analysis/TableDef.java@160
PS1, Line 160: dataLayout_.getPartitionColumnDefs().clear();
> I would add a reset() function in TableDataLayout() and put that there.
Done


http://gerrit.cloudera.org:8080/#/c/8930/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/8930/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1510
PS1, Line 1510: AnalyzesOk("create table functional.ctas_tbl partitioned by 
(year) as " +
  : "with tmp as (select a.timestamp_col, a.year from 
functional.alltypes a " +
  : "left join functional.alltypes b " +
  : "on b.timestamp_col between a.timestamp_col and 
a.timestamp_col) " +
  : "select a.timestamp_col, a.year from tmp a");
> Add a comment. It will not be clear to everyone what this thing is testing.
Done



--
To view, visit http://gerrit.cloudera.org:8080/8930
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iee053abecd4384e15eec8db10cb06f5ace159da2
Gerrit-Change-Number: 8930
Gerrit-PatchSet: 1
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Mon, 08 Jan 2018 19:52:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6307: CTAS statement fails with duplicate column exception.

2018-01-08 Thread Zoram Thanga (Code Review)
Hello Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8930

to look at the new patch set (#2).

Change subject: IMPALA-6307: CTAS statement fails with duplicate column 
exception.
..

IMPALA-6307: CTAS statement fails with duplicate column exception.

A CTAS statement with a 'partition by' clause causes the statement
to fail with a duplicate column name exception. This is happening
because on expression rewrite, the partition defs state is not reset.

IMPALA-5796 added TableDef::reset(). This patch expands the method by
adding calls to reset ColumnDefs and PartitionColumnDefs.

Testing:
  * Regression test added to AnalyzeDDLTest.
  * Exhaustive Jenkins build and test.

Change-Id: Iee053abecd4384e15eec8db10cb06f5ace159da2
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
5 files changed, 19 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/8930/2
--
To view, visit http://gerrit.cloudera.org:8080/8930
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iee053abecd4384e15eec8db10cb06f5ace159da2
Gerrit-Change-Number: 8930
Gerrit-PatchSet: 2
Gerrit-Owner: Zoram Thanga 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-3526: update FE tests to pass on S3

2018-01-08 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8890 )

Change subject: IMPALA-3526: update FE tests to pass on S3
..


Patch Set 2:

For an explanation for the plan difference, pls have a look at this comment I 
posted on the JIRA: 
https://issues.apache.org/jira/browse/IMPALA-3526?focusedCommentId=16296008=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16296008

I think the "all" is there to contrast with other tpch tests (kudu, nested, 
views). I may have misread that one though.

My take is that there is a separate design discussion for this issue (see the 
JIRA comment). This patch is just for enabling FE tests for S3. I'm also 
keeping ADLS testing as separate for now.


--
To view, visit http://gerrit.cloudera.org:8080/8890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c8221949e76b0a0b9192e6b56c4da5eeae04141
Gerrit-Change-Number: 8890
Gerrit-PatchSet: 2
Gerrit-Owner: Vuk Ercegovac 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 08 Jan 2018 19:28:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1767: [DOCS] Document new Boolean operators

2018-01-08 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8942 )

Change subject: IMPALA-1767: [DOCS] Document new Boolean operators
..


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8942/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8942/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-1767: [DOCS] Document new Boolean operators
please make this more informative (just include mention of IS [NOT]  TRUE | 
FALSE | UNKNOWN).


http://gerrit.cloudera.org:8080/#/c/8942/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/8942/2/docs/shared/impala_common.xml@779
PS2, Line 779: IS [NOT] FALSE as equivalents for the 
functions
"builtin functions" instead of "functions" to match the blurb in the detailed 
section.


http://gerrit.cloudera.org:8080/#/c/8942/2/docs/topics/impala_operators.xml
File docs/topics/impala_operators.xml:

http://gerrit.cloudera.org:8080/#/c/8942/2/docs/topics/impala_operators.xml@1251
PS2, Line 1251: lets
nit: let


http://gerrit.cloudera.org:8080/#/c/8942/2/docs/topics/impala_operators.xml@1320
PS2, Line 1320: lets
nit: let


http://gerrit.cloudera.org:8080/#/c/8942/2/docs/topics/impala_operators.xml@1327
PS2, Line 1327: These operators are equivalent to the built-in conditional 
functions
sync this blurb with the one in the index (see comment there).


http://gerrit.cloudera.org:8080/#/c/8942/2/docs/topics/impala_operators.xml@1337
PS2, Line 1337: query error
must be same error as for "IS [NOT] NULL" for complex types? perhaps its less 
surprising if this is pointed out (or linked to)?


http://gerrit.cloudera.org:8080/#/c/8942/2/docs/topics/impala_operators.xml@1345
PS2, Line 1345: select assertion, b, b is true, b is false, b is unknown
  :   from boolean_test;
just for my own info, but why is this lower-case whereas sql inlined in text is 
upper-case?



--
To view, visit http://gerrit.cloudera.org:8080/8942
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iefebf210418ec2d47b154bd37166b76720f085bb
Gerrit-Change-Number: 8942
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 08 Jan 2018 18:57:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3193: Show table's comment on show tables

2018-01-08 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8851 )

Change subject: IMPALA-3193: Show table's comment on show tables
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8851/4/fe/src/main/java/org/apache/impala/service/JniFrontend.java
File fe/src/main/java/org/apache/impala/service/JniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/8851/4/fe/src/main/java/org/apache/impala/service/JniFrontend.java@304
PS4, Line 304: for (Table table: tables) {
 : org.apache.hadoop.hive.metastore.api.Table msTable
 :   = table.getMetaStoreTable(msClient);
 : if (msTable != null) {
 :   String comment = 
msTable.getParameters().get("comment");
 :   comments.add(comment != null ? comment : "");
 : }
 :   }
> Regarding your suggestion, there was no loaded tables when I attached debug
The catalog server is the only entity responsible for loading metadata and 
making calls to HMS (for that purpose). On the other hand, Impalad's catalog 
cache should not be doing anything like that (which is what you do in this 
block). JniFronted::getTableNames() retrieves and returns the names of tables 
that the local catalog cache knows of (no external communication with either 
the catalog server or the HMS).

So, what you should do is the following: If the table is fully loaded and has a 
comment, you can retrieve it directly from the Table object. If the table is 
not fully loaded, you don't do anything (return empty string)  even if the 
table has a comment stored in HMS. If the table is loaded but has no comment, 
return an empty string.

If the concepts of fully loaded metadata, catalog server and impalad catalog 
cache are not clear, I suggest you spend some time at the code before working 
on this change. Impala's metadata management system is a bit complex, so it 
takes some time to grasp all the details and the information flow.



--
To view, visit http://gerrit.cloudera.org:8080/8851
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I44f814af05db6f3c027718ade9f474f8b8153bcd
Gerrit-Change-Number: 8851
Gerrit-PatchSet: 4
Gerrit-Owner: Kim Jin Chul 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Kim Jin Chul 
Gerrit-Comment-Date: Mon, 08 Jan 2018 18:34:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4835: Part 2: Allocate scan range buffers upfront

2018-01-08 Thread Tim Armstrong (Code Review)
Hello Tianyi Wang,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8707

to look at the new patch set (#11).

Change subject: IMPALA-4835: Part 2: Allocate scan range buffers upfront
..

IMPALA-4835: Part 2: Allocate scan range buffers upfront

This change is a step towards reserving memory for buffers from the
buffer pool and constraining per-scanner memory requirements. This
change restructures the DiskIoMgr code so that each ScanRange operates
with a fixed set of buffers that are allocated upfront and recycled as
the I/O mgr works through the ScanRange.

One major change is that ScanRanges get blocked when a buffer is not
available and get unblocked when a client returns a buffer via
ReturnBuffer(). I was able to remove the logic to maintain the
blocked_ranges_ list by instead adding a separate set for all ranges
that are active.

I plan to merge this along with the actual buffer pool switch, but
separated it out to allow review of the DiskIoMgr changes separate from
other aspects of the buffer pool switchover.

Testing:
* Ran core tests.
* TODO: run exhaustive tests.

Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/scanner-context.cc
M be/src/runtime/bufferpool/buffer-pool.h
M be/src/runtime/io/disk-io-mgr-stress-test.cc
M be/src/runtime/io/disk-io-mgr-stress.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/util/bit-util-test.cc
M be/src/util/bit-util.h
M be/src/util/runtime-profile-counters.h
19 files changed, 721 insertions(+), 407 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/8707/11
--
To view, visit http://gerrit.cloudera.org:8080/8707
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia243bf8b62feeea602b122e0503ea4ba7d3ee70f
Gerrit-Change-Number: 8707
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Tianyi Wang 


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-01-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 2:

It's ok if we don't have a reproducible test case for bugs caught by the 
fuzzer, unless the fuzzing revealed a really big hole in test coverage. It's 
good if we have a test case but for cases where we weren't handling corrupt 
data correctly there are so many ways files could be corrupted it wasn't 
feasible to have a standalone test for each case.


--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Mon, 08 Jan 2018 17:41:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6231: Implement decimal v2 fuzz test

2018-01-08 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8898 )

Change subject: IMPALA-6231: Implement decimal_v2 fuzz test
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py
File tests/query_test/test_decimal_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py@46
PS1, Line 46:
> I'm just trying to remove all test dimensions that we normally use (such as
I have seen this done using two methods.

1. Do not include "vector" in the test method's formal parameter list.

2. Use create_single_exec_option_dimension()


http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py@70
PS1, Line 70:
> Yes, I think we want 38. It's ok that 38 shows up in both extreme precision
Done


http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py@230
PS1, Line 230:   expected_result = decimal.Decimal(value1) + 
decimal.Decimal(value2)
 : elif op == '-':
 :   expected_result = decimal.Decimal(value1) - 
decimal.Decimal(value2)
 : elif op == '*':
 :   expected_result = decimal.Decimal(value1) * 
decimal.Decimal(value2)
 : elif op == '/':
 :   expected_result = decimal.Decimal(value1) / 
decimal.Decimal(value2)
 : elif op == '%':
 :   expected_result = decimal.Decimal(value1) % 
decimal.Decimal(value2)
 : else:
> Are you suggesting to create a mapping m that maps, for example, "+" to ope
Done


http://gerrit.cloudera.org:8080/#/c/8898/1/tests/query_test/test_decimal_fuzz.py@248
PS1, Line 248:   for _ in xrange(self.itera
> Cool tip, but I decided against making each iteration a separate test. Too
Done



--
To view, visit http://gerrit.cloudera.org:8080/8898
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4328125de5c583ec8ead1f78d9a08703b18b2d85
Gerrit-Change-Number: 8898
Gerrit-PatchSet: 2
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Mon, 08 Jan 2018 17:22:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-01-08 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 2:

> Can we test all the cases fixed this with 100% reproducible cases?
 > For Ex:
 > I commented out the change in  decompress.cc and ran the fuzz test
 > and it passed.

 > Can we test all the cases fixed this with 100% reproducible cases?
 > For Ex:
 > I commented out the change in  decompress.cc and ran the fuzz test
 > and it passed.

If you run a couple of times without the fix the assertion will be hit. I'll 
see if I can change the test so that we hit the assert every time


--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Mon, 08 Jan 2018 16:59:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6371: Additional check for delimiters

2018-01-08 Thread Adam Holley (Code Review)
Adam Holley has restored this change. ( http://gerrit.cloudera.org:8080/8959 )

Change subject: IMPALA-6371: Additional check for delimiters
..


Restored
--
To view, visit http://gerrit.cloudera.org:8080/8959
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: restore
Gerrit-Change-Id: If8dc335d39dd02f602cf93682bccf84b2c099dde
Gerrit-Change-Number: 8959
Gerrit-PatchSet: 3
Gerrit-Owner: Adam Holley 
Gerrit-Reviewer: Adam Holley 


[Impala-ASF-CR] IMPALA-6371: Additional check for delimiters

2018-01-08 Thread Adam Holley (Code Review)
Adam Holley has abandoned this change. ( http://gerrit.cloudera.org:8080/8959 )

Change subject: IMPALA-6371: Additional check for delimiters
..


Abandoned

Misplaced length check.
--
To view, visit http://gerrit.cloudera.org:8080/8959
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: If8dc335d39dd02f602cf93682bccf84b2c099dde
Gerrit-Change-Number: 8959
Gerrit-PatchSet: 3
Gerrit-Owner: Adam Holley 
Gerrit-Reviewer: Adam Holley 


[Impala-ASF-CR] IMPALA-6371: Additional check for delimiters

2018-01-08 Thread Adam Holley (Code Review)
Adam Holley has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/8959 )

Change subject: IMPALA-6371: Additional check for delimiters
..

IMPALA-6371: Additional check for delimiters

The check validates the codepoint of the Java char.

Testing:
- Added tests for valid/invalid unicode in HdfsStorageDescriptorTest.

Change-Id: If8dc335d39dd02f602cf93682bccf84b2c099dde
---
M fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java
M fe/src/test/java/org/apache/impala/catalog/HdfsStorageDescriptorTest.java
2 files changed, 24 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/8959/3
--
To view, visit http://gerrit.cloudera.org:8080/8959
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If8dc335d39dd02f602cf93682bccf84b2c099dde
Gerrit-Change-Number: 8959
Gerrit-PatchSet: 3
Gerrit-Owner: Adam Holley 
Gerrit-Reviewer: Adam Holley 


[Impala-ASF-CR] IMPALA-6371: Additional check for delimiters

2018-01-08 Thread Adam Holley (Code Review)
Adam Holley has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/8959 )

Change subject: IMPALA-6371: Additional check for delimiters
..

IMPALA-6371: Additional check for delimiters

The check validates the codepoint of the Java char.

Testing:
- Added tests for valid/invalid unicode in HdfsStorageDescriptorTest.

Change-Id: If8dc335d39dd02f602cf93682bccf84b2c099dde
---
M fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java
M fe/src/test/java/org/apache/impala/catalog/HdfsStorageDescriptorTest.java
2 files changed, 21 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/8959/2
--
To view, visit http://gerrit.cloudera.org:8080/8959
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If8dc335d39dd02f602cf93682bccf84b2c099dde
Gerrit-Change-Number: 8959
Gerrit-PatchSet: 2
Gerrit-Owner: Adam Holley 
Gerrit-Reviewer: Adam Holley 


[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error

2018-01-08 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8747 )

Change subject: IMPALA-5993: Fix the file offset in value parsing error
..


Patch Set 6:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@368
PS6, Line 368:   /// This is called from WriteAlignedTuples.
Explain what error_file_offsets does in comment


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@382
PS6, Line 382: column
Wouldn't this report an invalid value (column inside row)?


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@382
PS6, Line 382:   /// Utility function to append an error message for an invalid 
column.
It seems confusing to me that there is now a LogColumnParseError and 
ReportColumnParseError. Do we need both?


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@391
PS6, Line 391:   //   - error_offsets is an out array. error_offsets[i] will be 
set to file offset
Start comments with three /// like the other lines.


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@393
PS6, Line 393:  is an out 8 byte integer
No need to mention its type since it's the same as in the declaration itself.


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@408
PS6, Line 408:   uint8_t* error_in_row, int64_t* error_file_offsets, 
int64_t* curr_file_offset)
This looks like it could have some performance impact. Did you do any perf 
experiments?

Have you considered not setting error_file_offsets inside WriteCompleteTuple 
and computing it from the field lengths during error reporting?


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@233
PS6, Line 233: uint8_t* error_fields, uint8_t* error_in_row,
nit: wrap at 90 characters.


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@254
PS6, Line 254: *curr_offset += len + 1;
Is +1 for the delimiter? If so, can you add a brief comment?


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@578
PS6, Line 578: builder.CreateAdd(builder.CreateZExt(len, 
codegen->GetType(TYPE_BIGINT)),
Please indent function calls 4 characters, here and elsewhere.


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@759
PS6, Line 759:   "Error parsing row: file: $0, (file offset: $1)",
nit: line wrapping.


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@761
PS6, Line 761:   state_->LogError(ErrorMsg(TErrorCode::GENERAL, s));
Why does this not check state_->HasLogSpace() but the method below does?


http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@768
PS6, Line 768: at
I think the message was clearer before the change.


http://gerrit.cloudera.org:8080/#/c/8747/6/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/8747/6/testdata/data/README@136
PS6, Line 136: problematic_rows_impala_5993
Can you think of a name that captures the nature of the problems? Most of the 
files in here are somewhat problematic. It could be "invalid_int.csv" (if it is 
an invalid int column).


http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@30
PS6, Line 30: from os.path import join as join_path
I think elsewhere we just use "os.path.join()" when we need it, so for 
consistency it would be good to do the same here.


http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@841
PS6, Line 841:  Tests that wrong file offset in value parsing error messages 
when
 :   scanning text files.
nit: missing word?


http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@842
PS6, Line 842:   scanning text files. When the text scanner hits an error, it 
always prints the end of
 :   the file as the offset, even if the error occurs in the middle 
of the file.
Rephrase to say what it should do, not what the problem was.


http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@852
PS6, Line 852: if pytest.config.option.namenode_http_address is None:
If you change this to "if p.c.o...: " it will also cover the empty string. We 
often return at the end of the if-branch and put the on the top level. You can 
move the initialization of hdfs_loc up to achieve that.


http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@860
PS6, Line 860:   def prepare(self):
I think it might be cleaner if every test prepared their own test data 
separately.



[Impala-ASF-CR] IMPALA-6371: Additional check for delimiters

2018-01-08 Thread Adam Holley (Code Review)
Adam Holley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8959 )

Change subject: IMPALA-6371: Additional check for delimiters
..


Patch Set 1:

Jenkins build successfully run here: 
https://jenkins.impala.io/job/gerrit-verify-dryrun-external/58/.


--
To view, visit http://gerrit.cloudera.org:8080/8959
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If8dc335d39dd02f602cf93682bccf84b2c099dde
Gerrit-Change-Number: 8959
Gerrit-PatchSet: 1
Gerrit-Owner: Adam Holley 
Gerrit-Reviewer: Adam Holley 
Gerrit-Comment-Date: Mon, 08 Jan 2018 15:07:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4167: Support insert plan hints for CREATE TABLE AS SELECT

2018-01-08 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8400 )

Change subject: IMPALA-4167: Support insert plan hints for CREATE TABLE AS 
SELECT
..


Patch Set 6:

(8 comments)

Apologies for the delayed review. I had a few more questions. Let's discuss the 
parser changes in person if you feel I misunderstood the approach.

http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG@19
PS6, Line 19: more
add "more memory efficient"? That makes it more obvious why it is more 
efficient to do additional work (sorting).


http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG@27
PS6, Line 27: Add exchange node before insert even in case of unpartitioned 
tables.
I don't think this is true, is it? For unpartitioned tables, what would the 
shuffle / exchange do? Instead "shuffle" forces an exchange, "noshuffle" 
prevents adding one, and leaving it out results in the planner making a 
cost-based decision (DistributedPlanner.java:218).


http://gerrit.cloudera.org:8080/#/c/8400/6//COMMIT_MSG@35
PS6, Line 35: was moved from tbl_def_without_col_defs to statement rules.
Mention how you tested it in the commit message.


http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/cup/sql-parser.cup@1055
PS6, Line 1055: plan_hints:hints
Does opt_plan_hints not work here?


http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@69
PS6, Line 69:* Helper class for parsing.
Is this class still needed or a left-over from earlier patch-sets? Can you 
expand the class comment to explain why we need it and what it does?


http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/8400/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@1701
PS6, Line 1701: Multiple non-conflicting
Can you add one that has multiple distinct hints, like shuffle, clustered?


http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@233
PS6, Line 233: shuffle
Is that true? Without specifying a hint there should be cases where the cost 
based heuristic decides against adding an exchange node.


http://gerrit.cloudera.org:8080/#/c/8400/6/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test@344
PS6, Line 344: 01:EXCHANGE [UNPARTITIONED]
At first I didn't understand what this one does (I saw that insert.test:574 has 
the same). On my machine this exchange collects all rows at the coordinator. 
Maybe you want to add that in a comment?



--
To view, visit http://gerrit.cloudera.org:8080/8400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8d74bca999da8ae1bb89427c70841f33e3c56ab0
Gerrit-Change-Number: 8400
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Mon, 08 Jan 2018 14:04:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE

2018-01-08 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8820 )

Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
..


Patch Set 10:

(4 comments)

Thanks Dimitris for the review!

http://gerrit.cloudera.org:8080/#/c/8820/9/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

http://gerrit.cloudera.org:8080/#/c/8820/9/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java@124
PS9, Line 124: if (!Table.isExternalTable(table_.getMetaStoreTable()) && 
!setsToExternal) {
 :   AnalysisUtils.throwI
> nit: check if they fit in a single line (I could be wrong)
Indeed, it fits :) Done


http://gerrit.cloudera.org:8080/#/c/8820/5/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/8820/5/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@315
PS5, Line 315: throw new AnalysisExce
> In principal yes, in this case no. The name of this function doesn't reveal
renamed to analyzeManagedKuduTableName(). Hope that is a bit more suitable here 
:)


http://gerrit.cloudera.org:8080/#/c/8820/9/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/8820/9/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@67
PS9, Line 67: public CreateTableStmt(TableDef tableDef) {
: Preconditions.checkNotNull(tableDef);
> Since all the tblproperties live in TableDef, I believe it makes more sense
Sure, done.


http://gerrit.cloudera.org:8080/#/c/8820/9/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/8820/9/tests/query_test/test_kudu.py@327
PS9, Line 327: try:
 :   cursor.execute("ALTER TABLE %s.foo SET 
TBLPROPERTIES('kudu.table_name'='blah')"
 :   % (unique_database))
 : except Exception as e:
 :   expected_error = "AnalysisException: Not allowed to set 
'kudu.table_name' " + \
 :   "manually for managed Kudu tables"
 :   assert expected_error in str(e)
> That should be an analysis test.
There is also a test for this in AnalyzeDDLTest.java. Do you think this one 
should be dropped then?



--
To view, visit http://gerrit.cloudera.org:8080/8820
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f
Gerrit-Change-Number: 8820
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 08 Jan 2018 10:39:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE

2018-01-08 Thread Gabor Kaszab (Code Review)
Hello Laszlo Gaal, Zoltan Borok-Nagy, Attila Jeges, Dimitris Tsirogiannis, Tim 
Armstrong, Csaba Ringhofer, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8820

to look at the new patch set (#10).

Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
..

IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE

This change disallows explicitly setting the Kudu table name property
for managed Kudu tables in a CREATE TABLE statement. The Kudu table
name property gets a generated value as the following:
'impala::db_name.table_name' where table_name is the one given in
the CREATE TABLE statement.
Providing the Kudu table name property when creating a managed Kudu
table results in an error without creating the table. E.g.:
CREATE TABLE t (i INT) STORED AS KUDU
  TBLPROPERTIES('kudu.table_name'='some_name');

Alongside the CREATE TABLE statement also the ALTER TABLE statement
is changed not to allow the modification of Kudu.table_name of
managed Kudu tables.

Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
M tests/query_test/test_kudu.py
7 files changed, 167 insertions(+), 100 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/8820/10
--
To view, visit http://gerrit.cloudera.org:8080/8820
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f
Gerrit-Change-Number: 8820
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy