[Impala-ASF-CR] IMPALA-4617: remove IsConstant() analysis from be

2017-01-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4617: remove IsConstant() analysis from be
..


Patch Set 10:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/222/

-- 
To view, visit http://gerrit.cloudera.org:8080/5415
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I24b4c9d6e8e57fd73d906a9c36200e1a84497b90
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4617: remove IsConstant() analysis from be

2017-01-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4617: remove IsConstant() analysis from be
..


Patch Set 10: Code-Review+2

rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/5415
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I24b4c9d6e8e57fd73d906a9c36200e1a84497b90
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

2017-01-30 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
..


Patch Set 13: Code-Review+1

(5 comments)

I'll let MJ and/or Todd make a final pass.

http://gerrit.cloudera.org:8080/#/c/5649/13/docs/topics/impala_create_table.xml
File docs/topics/impala_create_table.xml:

PS13, Line 85: [PRIMARY KEY (col_name[, ...]]
This doesn't belong here.


PS13, Line 244: CREATE TABLE [IF NOT EXISTS] 
db_name.]table_name
  :   [COMMENT 'table_comment']
  :   STORED AS KUDU
  :   [TBLPROPERTIES 
('key1'='value1', 
'key2'='value2', ...)]
  : AS
  :   select_statement
This is not correct. You're missing the PRIMARY KEY and PARTITION BY clauses.


PS13, Line 411: impala::username.kudu_t1
nit: I don't think this is a good example. I would be really surprised if 
someone actually created a Kudu table outside of Impala named as 
"impala::username". Pls change the table name.


http://gerrit.cloudera.org:8080/#/c/5649/13/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS13, Line 93: kudu.master_addresses
That's the name of the table property. Can you verify that name of the impalad 
configuration param?


PS13, Line 148: There is a many-to-many relationship
  :   between the tablets and tablet servers, managed 
automatically by Kudu.
Isn't more clear to say that each tablet server can store multiple tablets and 
that tablets are replicated across different tablet servers?


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add .pep8rc for Impala's Python style

2017-01-30 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5829

Change subject: Add .pep8rc for Impala's Python style
..

Add .pep8rc for Impala's Python style

Change-Id: I65a99fc6e364eb1e49e21d0ba97546bc25f65a27
---
A .pep8rc
1 file changed, 8 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/5829/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I65a99fc6e364eb1e49e21d0ba97546bc25f65a27
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 


[Impala-ASF-CR] IMPALA-4705, IMPALA-4779, IMPALA-4780: Fix some Expr bugs with codegen

2017-01-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4705, IMPALA-4779, IMPALA-4780: Fix some Expr bugs with 
codegen
..


Patch Set 4:

I think it may have been accidentally disabled by the single node optimisation

-- 
To view, visit http://gerrit.cloudera.org:8080/5732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40fdb035a565ae2f9c9fbf4db48a548653ef7608
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4820: avoid writing unencrypted data during write cancellation

2017-01-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4820: avoid writing unencrypted data during write 
cancellation
..


Patch Set 3: Code-Review+2

Carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/5788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I956bae0685e863f30be23634b29aa076394db184
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4820: avoid writing unencrypted data during write cancellation

2017-01-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has submitted this change and it was merged.

Change subject: IMPALA-4820: avoid writing unencrypted data during write 
cancellation
..


IMPALA-4820: avoid writing unencrypted data during write cancellation

The bug was that unencrypted data could be written to disk if
the write was cancelled before it completed. This bug was introduced
after Impala 2.8.0 was branched in the commit "IMPALA-3202,IMPALA-2079:
rework scratch file I/O", so does not appear in any released versions
of Impala.

The fix is to only start decrypting data after the write is
complete.

Testing:
Added a regression test that reproduced the problem (after adding a
delay to the write).

Change-Id: I956bae0685e863f30be23634b29aa076394db184
Reviewed-on: http://gerrit.cloudera.org:8080/5788
Tested-by: Impala Public Jenkins
Reviewed-by: Tim Armstrong 
---
M be/src/common/global-flags.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
4 files changed, 73 insertions(+), 5 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I956bae0685e863f30be23634b29aa076394db184
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] Patch references to Cloudera and CDH in Impala tutorial

2017-01-30 Thread Laurel Hale (Code Review)
Laurel Hale has posted comments on this change.

Change subject: Patch references to Cloudera and CDH in Impala tutorial
..


Patch Set 1: Code-Review-1

(2 comments)

The use of keyref to change "CDH" to a Hadoop distribution number didn't seem 
to work (see line 60) and use of parentheses not necessary in line 1700.

http://gerrit.cloudera.org:8080/#/c/5663/1/docs/topics/impala_tutorial.xml
File docs/topics/impala_tutorial.xml:

Line 60: If you already have a  
environment set up and just need to add Impala to it, follow the installation
This doesn't seem to work. It's showing up as "CDH" in the build.


PS1, Line 1700: (Currently, this technique only works for Parquet files.)
Parentheses not necessary.


-- 
To view, visit http://gerrit.cloudera.org:8080/5663
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4820: avoid writing unencrypted data during write cancellation

2017-01-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4820: avoid writing unencrypted data during write 
cancellation
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I956bae0685e863f30be23634b29aa076394db184
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3524: Don't process spilled partitions with 0 probe rows

2017-01-30 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3524: Don't process spilled partitions with 0 probe rows
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5389/6/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 677:   output_unmatched_batch_->Reset();
> So then, if I understand correctly, you mentioned a build partition with >8
Is there a dcheck we could add to detect the incorrect state?


-- 
To view, visit http://gerrit.cloudera.org:8080/5389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I175b32dd9031e51218b38c37693ac3e31dfab47b
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4818: Ensure the same number of tests are run every time

2017-01-30 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4818: Ensure the same number of tests are run every time
..


Patch Set 2: Code-Review+1

(1 comment)

Maybe this is a candidate for upvoting?

http://gerrit.cloudera.org:8080/#/c/5784/2/tests/query_test/test_cancellation.py
File tests/query_test/test_cancellation.py:

PS2, Line 195: if cls.exploration_strategy() != 'exhaustive':
 :   cls.ImpalaTestMatrix.add_constraint(lambda v: 
v.get_value('cancel_delay') in [3])
> I don't quite see what you're getting at here. It seems like line 198-199 i
This ends up falling under IMPALA-3947. Sorry for the noise.


-- 
To view, visit http://gerrit.cloudera.org:8080/5784
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I22cecfbe7c9a102f788d01eb80aa188579ef6d7e
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3524: Don't process spilled partitions with 0 probe rows

2017-01-30 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3524: Don't process spilled partitions with 0 probe rows
..


Patch Set 9:

This passed a core ASAN run:
http://sandbox.jenkins.cloudera.com/job/impala-umbrella-build-and-test/7228/

-- 
To view, visit http://gerrit.cloudera.org:8080/5389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I175b32dd9031e51218b38c37693ac3e31dfab47b
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4818: Ensure the same number of tests are run every time

2017-01-30 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4818: Ensure the same number of tests are run every time
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5784/2/tests/query_test/test_cancellation.py
File tests/query_test/test_cancellation.py:

PS2, Line 195: if cls.exploration_strategy() != 'exhaustive':
 :   cls.ImpalaTestMatrix.add_constraint(lambda v: 
v.get_value('cancel_delay') in [3])
> L196 will rarely if ever get executed, because the workload as derived from
I don't quite see what you're getting at here. It seems like line 198-199 in 
HEAD get executed for every "core" run.

In the history of this patch, line 195 was added because tests were too slow. 
Generally, I'd like to confine this patch to just derandomizing this method and 
worry about which test should be run later if we see a problem.


-- 
To view, visit http://gerrit.cloudera.org:8080/5784
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I22cecfbe7c9a102f788d01eb80aa188579ef6d7e
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Improvements to "Unknown disk-ID" warning

2017-01-30 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5828

Change subject: Improvements to "Unknown disk-ID" warning
..

Improvements to "Unknown disk-ID" warning

This commit,
- Appends the table name to the "Unknown disk-ID" warning when
  disk IDs are missing, for supportability.

- Removes reference to enabling dfs block metadata configuration,
  since it doesn't apply anymore.

- Removes VolumeId terminology from the runtime profile.

Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2
---
M be/src/exec/hdfs-scan-node-base.cc
M common/thrift/generate_error_codes.py
2 files changed, 5 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/5828/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5828
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 


[Impala-ASF-CR] IMPALA-4705, IMPALA-4779, IMPALA-4780: Fix some Expr bugs with codegen

2017-01-30 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4705, IMPALA-4779, IMPALA-4780: Fix some Expr bugs with 
codegen
..


Patch Set 4:

> Quite a bit longer. 5~6 mins in debug builds on my machine. I think
 > it's worth it given that it covers pretty much all the built-in
 > functions.

yeah, I'm guessing this was originally disabled due to the long execution time. 
but i agree we need to get this coverage somehow.  are there anythings we can 
do (maybe in the future) to speed this up?

-- 
To view, visit http://gerrit.cloudera.org:8080/5732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40fdb035a565ae2f9c9fbf4db48a548653ef7608
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer

2017-01-30 Thread Lars Volker (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5611

to look at the new patch set (#10).

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
..

IMPALA-3909: Populate min/max statistics in Parquet writer

Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-parquet-table-writer.h
A be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-common.h
M be/src/runtime/string-value.h
M be/src/util/dict-test.cc
M tests/query_test/test_insert_parquet.py
M tests/util/get_parquet_metadata.py
8 files changed, 754 insertions(+), 73 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/5611/10
-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Ivanfi 


[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer

2017-01-30 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
..


Patch Set 9:

(17 comments)

Thanks for the review. Please see my inline comments and the new PS.

http://gerrit.cloudera.org:8080/#/c/5611/6/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

PS6, Line 178: ProcessValue
> that was just an off-the-cuff suggestion. if 'append' is more specific, the
I tend to favor ProcessValue, since Append (like Encode) leaves out the 
statistics part. While ProcessValue is vague, it implies that the values is 
"consumed" afterwards, and that it may be necessary to look at the comment for 
further details. That being said, how about "ConsumeValue"? Does that imply 
ownership transfer too much?


http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 134:   void EncodeColumnStats(ColumnMetaData* meta_data) {
> find a better name. 'column stats' is not a thrift concept. these are speci
Done


Line 236:   // Created and set by the derived class.
> owner? same for the other pointer members.
Done


Line 339:   int64_t encoded_value_size_;
> this seems to be the plain encoding size. even for dict-encoded cols?
Yes, the dictionary encoders use it as the size of the encoded values. Should 
we rename it to plain_encoded_value_size_?


Line 347:   // Tracks statistics per row group. This gets reset when starting a 
new file.
> hopefully when starting a new row group
Yes, Done.


Line 643:   DCHECK(page_stats_base_ != nullptr);
> how does this handle unsupported types?
Currently the ColumnStats gets created for all supported types, and the 
ColumnStats c'tor ensures that the class can only be instantiated for supported 
types. However, for some types we don't write out the statistics to the 
resulting parquet file by disabling the Update() logic, thus ensuring that the 
stats objects stay empty. This allows us to keep the interface of ColumnStats 
simpler.


Line 1028: 
columns_[i]->EncodeColumnStats(_row_group_->columns[i].meta_data);
> where do the row group stats get reset?
Line 281 of this PS.


http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/hdfs-parquet-table-writer.h
File be/src/exec/hdfs-parquet-table-writer.h:

Line 103:   /// Maximum statistics size. If the combined size of the min and 
max values of
> does this refer to a single thrift Statistics struct? if so, spell that out
Done


http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/parquet-column-stats.h
File be/src/exec/parquet-column-stats.h:

Line 65:   void EncodeToThrift(T* parent) const {
> this feels more convoluted than it needs to be. i think it would be better 
Done


Line 88:   // We explicitly require types to be listed here in order to support 
column statistics.
> i don't understand, i thought those listed types are specifically not suppo
We don't update statistics for these types, but creating ColumnStats for them 
is still supported. This is because we want to support them soon. If we 
disallow creating them, then the caller will have to deal with only creating 
ColumnStats for supported column types.


Line 90:   // follow the ordering semantics of parquet's min/max statistics for 
the new type.
> what are the ordering semantics? (that order as byte sequence == value orde
The values should be ordered according to what parquet-mr says (which is 
hopefully in line with what Hive and Spark do). However, there is currently a 
lack of specification, especially for logical types, that we're working on 
clarifying with the parquet-format project. I added a comment explaining where 
to look for the ordering semantics.


Line 97:   T>::type;
> i find the formatting hard to decipher. please reformat by hand (for instan
Done


Line 127:   // statistics behavior from any implicit behavior of the types?
> but shouldn't the stats reflect the behavior of the underlying types. ie, w
The stats should follow the ordering semantics defined in parquet-format. Those 
are currently underspecified, we're working on getting them fixed so we can 
write stats for other types, too (see my other comments in this review). While 
I agree that they should ideally be the same, I don't think we can guarantee or 
enforce that, since ultimately the parquet-format project serves as the 
reference.


Line 148:   /// Encodes a single value into an output string using parquet's 
plain encoding.
> 'an output string' makes it sound like this gets converted into a string ty
Done


Line 159: return encoded_value_size_ < 0 ? 
ParquetPlainEncoder::ByteSize(v) :
> reformat by hand
Done


http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/parquet-common.h
File be/src/exec/parquet-common.h:

Line 89:   static int ByteSize(const T& v) { return sizeof(T); }
> does this function make sense at all? why not simply 

[Impala-ASF-CR] Release note updates for Impala 2.8

2017-01-30 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: Release note updates for Impala 2.8
..


Patch Set 3:

(5 comments)

Only minor changes (trailing spaces) to come in next patch set. Right now I'm 
stuck at "failed to push some refs", so that might not be today.

http://gerrit.cloudera.org:8080/#/c/5668/3/docs/topics/impala_incompatible_changes.xml
File docs/topics/impala_incompatible_changes.xml:

Line 56:   
> I do not see an answer to this yet.
These are all from JIRAs that are marked 'fixed in 2.8.0' and/or coding work 
completed back in December, before 2.8.0 was branched off. Is there an 
additional round of checking you are thinking of? The current behavior is as 
described elsewhere in the docs. The previous behavior (at least for the 
Kudu-related bullets) is from the impala_kudu fork.


PS3, Line 58: Impala Incompatible Changes Introduced in Impala 2.8.x
> There was something I saw relating to cross-reference links, but I think th
Done


Line 1523:   
> My experience with the CHANGES files in the root directories of open-source
Let's continue this discussion as part of the general cleanup of 
Cloudera-specific info. That is, if we preserve some historical info as-is, 
we'll have to adjust expectations a little bit for how many instances of 
Cloudera / CDH / Cloudera Manager remain in the source files.


http://gerrit.cloudera.org:8080/#/c/5668/6/docs/topics/impala_incompatible_changes.xml
File docs/topics/impala_incompatible_changes.xml:

Line 71: such as COMPRESSION, 
DEFAULT, and ENCODING, that
> new spaces at end of lines
Done. I have a pre-commit hook that suppresses these but haven't adapted it yet 
for the Apache repo.


http://gerrit.cloudera.org:8080/#/c/5668/6/docs/topics/impala_new_features.xml
File docs/topics/impala_new_features.xml:

PS6, Line 105: , and each partition is onl
> Do we describe the behavior in more detail someplace else?
Yes, in the MT_DOP page linked from this bullet point.


-- 
To view, visit http://gerrit.cloudera.org:8080/5668
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Silvius Rus 
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

2017-01-30 Thread John Russell (Code Review)
John Russell has uploaded a new patch set (#13).

Change subject: [DOCS] Major update to Impala + Kudu page
..

[DOCS] Major update to Impala + Kudu page

Upgrade with details of latest syntax.

Fine-tune discussion of PK and other Kudu
notions.

The impala_kudu diff looks larger than actual changes
to the page, because subtopics got moved
around and promoted/demoted (which changes the
indentation). Best to review that page start-to-finish.

CREATE TABLE details for Impala + Kudu.

ALTER TABLE details for Impala + Kudu.

Unhide the Impala partitioning + Kudu topic.
Mainly a brief intro then a link to delegate
details to the main Kudu page, which already
has a partitioning subtopic.

Include changes to reserved words. Entirely
from Kudu integration work.

Add Kudu considerations for misc SQL statements.

Addressed Todd's and Dimitris's comments for certain files.
(Up to the beginning of the "Partitioning" section in
impala_kudu.xml.)

Added Kudu blurbs to data type topics:
- Some aren't supported.
- Others are supported but can't go in the primary key.

Added walkthrough of renaming internal/external tables.

Split out Kudu CREATE TABLE syntax from other file formats.

Correct info about CTAS for Kudu tables.

Add examples of basic Kudu, external Kudu, and Kudu CTAS.

Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
---
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_alter_table.xml
M docs/topics/impala_array.xml
M docs/topics/impala_boolean.xml
M docs/topics/impala_char.xml
M docs/topics/impala_compute_stats.xml
M docs/topics/impala_create_table.xml
M docs/topics/impala_decimal.xml
M docs/topics/impala_describe.xml
M docs/topics/impala_double.xml
M docs/topics/impala_drop_table.xml
M docs/topics/impala_explain.xml
M docs/topics/impala_float.xml
M docs/topics/impala_grant.xml
M docs/topics/impala_invalidate_metadata.xml
M docs/topics/impala_kudu.xml
M docs/topics/impala_literals.xml
M docs/topics/impala_map.xml
M docs/topics/impala_partitioning.xml
M docs/topics/impala_refresh.xml
M docs/topics/impala_reserved_words.xml
M docs/topics/impala_revoke.xml
M docs/topics/impala_show.xml
M docs/topics/impala_struct.xml
M docs/topics/impala_tables.xml
M docs/topics/impala_timestamp.xml
M docs/topics/impala_truncate_table.xml
M docs/topics/impala_varchar.xml
29 files changed, 2,310 insertions(+), 110 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/5649/13
-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

2017-01-30 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
..


Patch Set 7:

(18 comments)

Addressed all comments, esp. splitting CREATE TABLE syntax for HDFS vs. Kudu.

http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_explain.xml
File docs/topics/impala_explain.xml:

PS7, Line 250:  
> "in a SCAN KUDU node"
Done


PS7, Line 251: , and might involve transmitting
 :   non-matching rows that are filtered out on the Impala side.
> remove
Done


http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS7, Line 76: scan performance close to that of Parquet
> Make sure you check with Mostafa about this claim.
Done


PS7, Line 147: The work is parallelized
 :   across units of computing called tablet 
servers.
> I believe the unit of computing is the tablet not the tablet server, unless
The tablet server is the compute resource, the tablet holds the actual data.


PS7, Line 149: between the tablets and tablet servers.
> I think it would be nice to mention that the user is not responsible for ma
Done


PS7, Line 168: (CREATE TABLE and ALTER TABLE)
> It is weird to mention CREATE without DROP. Please remove this and mention 
Since this section is only talking about the delta in syntax, I'll leave as-is.


PS7, Line 397: 
 : For non-Kudu tables, Impala allows any column to 
contain NULL
 : values, because it is not practical to enforce a 
not null constraint on HDFS
 : data files that could be prepared using external 
tools and ETL processes.
 :   
 : 
 :   
 : 
 :   
 : For example, a table containing geographic 
information might require the latitude
 : and longitude coordinates to always be specified. 
Other attributes might be allowed
 : to be NULL. For example, a location 
might not have a designated
 : place name, its altitude might be unimportant, and 
its population might be initially
 : unknown, to be filled in later.
 :   
> You may want to swap these two paragraphs.
Done


PS7, Line 717: PARTITION BY
> Impala still uses "PARTITIONED BY" for HDFS tables.
Done


PS7, Line 727: . By setting
 :   up an effective partitioning scheme for a Kudu table, 
you can ensure that the work for
 :   a query can be parallelized evenly across the hosts in 
a cluster.
> Remove. Sometimes the goal is to scan as little as possible. You can say th
Whole paragraph was hidden per comment on a previous patch set. (Instead, we 
link people to the Kudu white paper for those kinds of details.)


PS7, Line 936: To see the distribution of data in a Kudu table across the 
underlying buckets and
 : partitions, use the SHOW TABLE 
STATS statement.
> This is unfortunately not accurate. SHOW TABLE STATS will only show the dis
Done. Reworded rather than deleting entirely.


PS7, Line 1122: change
> "changes"
Done


PS7, Line 1159: strong consistency for order of operations
> I am not sure I know what that means.
Done


PS7, Line 1159: total
  : success or total failure of a multi-row statement
> This is "atomicity". Maybe just mention atomic multi-row statements.
Done


PS7, Line 1160: or data that is read while a write
  : operation is in progress
> Isolation.
I added the technical terminology into my wording.


PS7, Line 1288: Memory Usage for Operations on Kudu Tables
  : 
  :   
  : 
  : 
  :   The Apache Kudu architecture, topology, and data 
storage techniques result in
  :   different patterns of memory usage for Impala 
statements than with HDFS-backed tables.
  : 
> I don't find this particularly informative and suggest we remove it unless 
Yes, audience="hidden" earlier in the  tag will make it invisible.


http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_literals.xml
File docs/topics/impala_literals.xml:

PS7, Line 408: most Impala tables
> Impala tables backed by HDFS or S3? "most" is kind of vague
Done


http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_revoke.xml
File docs/topics/impala_revoke.xml:

PS7, Line 115: access to a Kudu table is all or nothing.
> "only table-level permissions are enforced in Kudu tables. Column-level per
Done


http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_tables.xml
File docs/topics/impala_tables.xml:

PS7, Line 293: using the Apache Kudu storage system
> "stored in Apache Kudu"
Done


-- 
To view, visit 

[Impala-ASF-CR] IMPALA-4820: avoid writing unencrypted data during write cancellation

2017-01-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4820: avoid writing unencrypted data during write 
cancellation
..


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/221/

-- 
To view, visit http://gerrit.cloudera.org:8080/5788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I956bae0685e863f30be23634b29aa076394db184
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4735: Upgrade pytest in python env to version 2.9.2.

2017-01-30 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4735: Upgrade pytest in python env to version 2.9.2.
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5640/7/tests/run-tests.py
File tests/run-tests.py:

PS7, Line 144: other_tests
Maybe use the name explicit_tests ?


-- 
To view, visit http://gerrit.cloudera.org:8080/5640
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I40d129e0e63ca5bee126bac6ac923abb3c7e0a67
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4818: Ensure the same number of tests are run every time

2017-01-30 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change.

Change subject: IMPALA-4818: Ensure the same number of tests are run every time
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5784/2/tests/query_test/test_cancellation.py
File tests/query_test/test_cancellation.py:

PS2, Line 195: if cls.exploration_strategy() != 'exhaustive':
 :   cls.ImpalaTestMatrix.add_constraint(lambda v: 
v.get_value('cancel_delay') in [3])
L196 will rarely if ever get executed, because the workload as derived from L60 
is "tpch", not "functional-query". Typical exhaustive tests tend only to apply 
to tests whose workload is functional-query, via 
"--workload_exploration_strategy functional-query:exhaustive".

I believe this is so because, unlike the functional database, the tpch database 
will not exist in all the typical exhaustive formats when doing a data load, 
which means typical "exhaustive vectors" aren't possible with tpch. However, 
the queries (L35) here come from tpch, hence the value of get_workload().

Some options:

- Remove L195. If tests are quick, this is OK.
or
- Hack up test dimensions and workload to avoid redundant tests but also truly 
run various cancel delays when workload_exploration_strategy is 
functional-query:exhaustive .
or
- Remove L195-196 if loss of coverage is only negligible


-- 
To view, visit http://gerrit.cloudera.org:8080/5784
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I22cecfbe7c9a102f788d01eb80aa188579ef6d7e
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: Yes