[Impala-ASF-CR] IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links

2017-03-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6338
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8ddb3d7f0a398e70a0f741410bc6b1a309bd1686
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links

2017-03-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links
..


IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links

This change primarily sets up the infrastructure in the
keydefs file to allow shortening and genericizing
cross-doc links within the Impala library.

Many links to CDH-related pages such as cdh_*.xml
and cm_*.xml will be replaced by generic plain text.
Most of that will be in the next phase.

For phase 1, you can verify that things are working by
viewing the HTML output pages:

impala_new_features.html
impala_perf_hdfs_caching.html

and doublechecking that at the spots where formerly was a link to
a CDH page describing HDFS caching, now there is only the plain text
phrase: "see the documentation for HDFS caching".

Blanking out all the Cloudera doc links (in the next
phase) will remove many instances of www.cloudera.com
that appear both in the doc source and in the HTML output.
Change-Id: I8ddb3d7f0a398e70a0f741410bc6b1a309bd1686
Reviewed-on: http://gerrit.cloudera.org:8080/6338
Reviewed-by: Laurel Hale 
Reviewed-by: John Russell 
Tested-by: Impala Public Jenkins
---
M docs/impala_keydefs.ditamap
M docs/topics/impala_new_features.xml
M docs/topics/impala_perf_hdfs_caching.xml
3 files changed, 377 insertions(+), 126 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Laurel Hale: Looks good to me, but someone else must approve
  John Russell: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6338
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I8ddb3d7f0a398e70a0f741410bc6b1a309bd1686
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 


[Impala-ASF-CR] IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links

2017-03-09 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6338
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8ddb3d7f0a398e70a0f741410bc6b1a309bd1686
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links

2017-03-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links
..


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-docs-submit/67/

-- 
To view, visit http://gerrit.cloudera.org:8080/6338
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8ddb3d7f0a398e70a0f741410bc6b1a309bd1686
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4008: Don't bake fields into generated IR functions of OldHashTable

2017-03-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4008: Don't bake fields into generated IR functions of 
OldHashTable
..


Patch Set 2: Code-Review+2

(1 comment)

I feel pretty confident +2ing - we have good test coverage of the in-memory 
joins and this patch is only changing the internal implementation of deprecated 
code.

I assume you did a test run using this code path but if not it would be good to 
do before merging.

http://gerrit.cloudera.org:8080/#/c/6263/2/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

Line 1458:   // Load _expr_ctxs_[0]
Stray comment.


-- 
To view, visit http://gerrit.cloudera.org:8080/6263
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I75500827dff56b1fa9e5296e8e8d8667ab54aef8
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4008: Don't bake fields into generated IR functions of OldHashTable

2017-03-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4008: Don't bake fields into generated IR functions of 
OldHashTable
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6263/1//COMMIT_MSG
Commit Message:

Line 16: 
> I looked at that instance at ScalarFnCall(). It t should be fine for MT as 
I looked and it seemed like the fallback would never be used since the 
GetCodegendComputeFn() should never return OK + NULL. Not a big deal anyway.


-- 
To view, visit http://gerrit.cloudera.org:8080/6263
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I75500827dff56b1fa9e5296e8e8d8667ab54aef8
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3742: partitions DMLs for Kudu tables

2017-03-09 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3742: partitions DMLs for Kudu tables
..


Patch Set 6:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/6037/6/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

PS6, Line 446: TupleRow* current_row = batch->GetRow(i);
 :   uint32_t partition;
 :   RETURN_IF_ERROR(partitioner_->Partition(current_row, 
));
 :   RETURN_IF_ERROR(channels_[partition % 
num_channels]->AddRow(current_row));
As we discussed, I think Henry is right that we need to avoid the v-f-calls per 
row.

I think we can consider two approaches:
1) we can continue branching separately for kudu and for hash partitioning 
here, and calling non virtual fns on the partitioner. It's still good to have 
pulled all the Kudu code into a separate class.
2) the partitioner class you added can take row batches at a time as well as 
well as the channels_, and handle the per-row calls internally.

#2 sounds cleaner but I'm open to suggestions.


http://gerrit.cloudera.org:8080/#/c/6037/6/bin/impala-config.sh
File bin/impala-config.sh:

PS6, Line 124: 4cd7d85
btw this won't be in your change, right? we still need to get the Kudu API in


http://gerrit.cloudera.org:8080/#/c/6037/6/common/thrift/Partitions.thrift
File common/thrift/Partitions.thrift:

PS6, Line 42: TKuduDataPartition
brief comment


PS6, Line 43: target_table_id
mention this is necessary for the kudu Partitioner API


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

Line 683: // Hdfs folder structure correctly.
, or the indexes to construct rows passed to the Kudu partitioning API.

(Or something similar)


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/DataPartition.java
File fe/src/main/java/org/apache/impala/planner/DataPartition.java:

PS6, Line 46: partition
partitioning


PS6, Line 52: number
index


Line 55:   private DataPartition(
nit: comment should be called only by the static factory method for Kudu 
partitioned tables


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

PS6, Line 274: List partitionExprs = 
Lists.newArrayList(modifyStmt.getPartitionExprs());
 : List targetColIdxs = 
Lists.newArrayList(modifyStmt.getTargetColIdxs());
maybe just create these inline as params to the static constructor method 
below, since they're not needed otherwise.


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

Line 153: } else {
a brief comment here may be helpful


http://gerrit.cloudera.org:8080/#/c/6037/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test:

PS6, Line 64: [KUDU(a.id)]
seems to be printing the alias rather than the base table name as we see in the 
line above


http://gerrit.cloudera.org:8080/#/c/6037/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test:

PS6, Line 29: 01:EXCHANGE [KUDU(1)]
hm this looks fishy with unions. i'm not sure if the approach doesn't make 
sense with unions, at least those with constant operands.

can you add a test case which has unions with selecting from a table, e.g.

upsert into foo
select x,y,z from a union all
select x,y,z from b


PS6, Line 117: upsert into functional_kudu.testtbl /*+ clustered */
 : select * from functional_kudu.testtbl
 :  PLAN
 : UPSERT INTO KUDU [functional_kudu.testtbl]
 : |
 : 01:SORT
 : |  order by: id ASC NULLS LAST
 : |
 : 00:SCAN KUDU [functional_kudu.testtbl]
 :  DISTRIBUTEDPLAN
 : UPSERT INTO KUDU [functional_kudu.testtbl]
 : |
 : 02:SORT
 : |  order by: id DESC NULLS LAST
 : |
 : 01:EXCHANGE [KUDU(functional_kudu.testtbl.id)]
 : |
 : 00:SCAN KUDU [functional_kudu.testtbl]
we'll wanna do something similar without the clustered hint later, as we 
discussed


-- 
To view, visit http://gerrit.cloudera.org:8080/6037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 

[Impala-ASF-CR] IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links

2017-03-09 Thread Laurel Hale (Code Review)
Laurel Hale has posted comments on this change.

Change subject: IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links
..


Patch Set 1: Code-Review+1

Builds cleanly and the cdh_ig_hdfs_caching token replaces beautifully in the 
built content. Looks good.

-- 
To view, visit http://gerrit.cloudera.org:8080/6338
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8ddb3d7f0a398e70a0f741410bc6b1a309bd1686
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Laurel Hale 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5039: Fix variability in parquet dictionary filtering test

2017-03-09 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-5039: Fix variability in parquet dictionary filtering 
test
..


Patch Set 3: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6301/2/tests/common/test_result_verifier.py
File tests/common/test_result_verifier.py:

Line 537:   for expected_line in expected_lines:
> You can think of each of these as sparse lists of mutually exclusive things
Thanks, make sense. Works for me.


http://gerrit.cloudera.org:8080/#/c/6301/3/tests/common/test_result_verifier.py
File tests/common/test_result_verifier.py:

Line 571: (function, field, expected_value, actual_value))
I think we should dump the complete actual profile as well for better debugging.


-- 
To view, visit http://gerrit.cloudera.org:8080/6301
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6b7b84d973b3ac678a24e82900f2637d569158bb
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4008: Don't bake fields into generated IR functions of OldHashTable

2017-03-09 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4008: Don't bake fields into generated IR functions of 
OldHashTable
..


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6263/1//COMMIT_MSG
Commit Message:

Line 16: 
> I did a grep for CastPtrToLlvmPtr() to make sure there were no other instan
I looked at that instance at ScalarFnCall(). It t should be fine for MT as Expr 
objects are shared across fragment instances. My understanding of that path is 
that it's a fall back to the interpretation path if codegen fails for a child 
expression so they aren't dead code but I think there are other ways of 
generating those code without cross-compiling the static stub functions in 
expr-ir.cc


http://gerrit.cloudera.org:8080/#/c/6263/1/be/src/exec/hash-table-ir.cc
File be/src/exec/hash-table-ir.cc:

Line 26: ExprContext* const* HashTableCtx::GetBuildExprCtxs() const {
> Long lines
Done


http://gerrit.cloudera.org:8080/#/c/6263/1/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

Line 736: 
> Comment that this is pointer to the first ExprContext* pointer?
Done


Line 1099: BasicBlock* null_block = BasicBlock::Create(context, "null", 
*fn);
> Maybe comment that this is a pointer to the first ExprContext*?
Done


Line 1121: CodegenAnyVal result = CodegenAnyVal::CreateCallWrapped(codegen, 
,
> No longer accurate
It's still accurate in a way but I rephrased it to:

expr_ctx = ctx_vector[i];


http://gerrit.cloudera.org:8080/#/c/6263/1/be/src/exec/old-hash-table.cc
File be/src/exec/old-hash-table.cc:

Line 325:   llvm_loc = builder.CreatePointerCast(llvm_loc,
> Remove TODO if you're not going to do this in this patch?
Done. old-hash-table's day is numbered anyway.


Line 651: 
> What do you think about adding a CallFunction() helper to LlvmCodegen to ma
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6263
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I75500827dff56b1fa9e5296e8e8d8667ab54aef8
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4008: Don't bake fields into generated IR functions of OldHashTable

2017-03-09 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#2).

Change subject: IMPALA-4008: Don't bake fields into generated IR functions of 
OldHashTable
..

IMPALA-4008: Don't bake fields into generated IR functions of OldHashTable

To support sharing generated code across fragment instances,
no fragment instance specific states should be baked into the
IR. All cases were addressed previously except for the old hash
tables used for legacy agg/join. This change fixes the old hash
tables to not bake its fields in the generated IR functions. Similar
to previous patches, some cross-compiled thin wrappers are introduced
to access the fields of interest from an OldHashTable object.

Change-Id: I75500827dff56b1fa9e5296e8e8d8667ab54aef8
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/CMakeLists.txt
M be/src/exec/hash-table-ir.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
A be/src/exec/old-hash-table-ir.cc
M be/src/exec/old-hash-table.cc
M be/src/exec/old-hash-table.h
11 files changed, 185 insertions(+), 80 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/6263/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6263
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I75500827dff56b1fa9e5296e8e8d8667ab54aef8
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph

2017-03-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4929: Safe concurrent access to IR function call graph
..


IMPALA-4929: Safe concurrent access to IR function call graph

Previously, LlvmCodeGen creates a map (fn_refs_map_) which
maps an IR function to its set of callees. That map was initialized
once at Impalad's start time and it is supposed to be read-only
afterwards.

However, the map was unintentionally modified by its user
LlvmCodeGen::MaterializeFunctionHelper() due to the use of
operator []. In particular, that operator may create a new
entry in the map if it doesn't exist already. This is possible
because the map initially only contains entries for functions
with at least one callee function. Using the operator [] with
functions with no callee function may modify the map. Since the
map is shared by all fragment instances, such unsafe concurrent
modification can cause missing or wrong callee functions to be
materialized, leading to failure to resolve symbols in LLVM.

This change fixes the problem above by:
1. Create an entry for all functions even if it has no callee.
   In fact, LlvmCodeGen::LinkModule() assumes all functions
   defined in main module will have entries in the map.

2. Introduce a new class CodegenCallGraph which encapsulates
   the map described above. The map was established once at
   initialization time and there is no interface to modify the
   map afterwards, thus preventing any accidental modification
   to the map at run time.

Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a
Reviewed-on: http://gerrit.cloudera.org:8080/6326
Reviewed-by: Michael Ho 
Tested-by: Impala Public Jenkins
---
M be/src/codegen/CMakeLists.txt
A be/src/codegen/codegen-callgraph.cc
A be/src/codegen/codegen-callgraph.h
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
5 files changed, 188 insertions(+), 81 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Michael Ho: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-5039: Fix variability in parquet dictionary filtering test

2017-03-09 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded a new patch set (#3).

Change subject: IMPALA-5039: Fix variability in parquet dictionary filtering 
test
..

IMPALA-5039: Fix variability in parquet dictionary filtering test

The tests for dictionary filtering look at how many row groups are processed and
how many are filtered by matching text in the profile. However, the number of 
row
groups processed and filtered by any individual fragment depends on how the work
is split and how many impalads are running. This causes variability in the test
output.

To fix this, the test needs a way to aggregate the results across fragments.
This fix introduces the following syntax for specifying these aggregates:
aggregate(function_name, field_name): expected_value
This searches the runtime profile for lines that contain 'field_name: number'.
It skips the averaged fragment, as this is derived from all the other fragments.

Currently, only SUM is implemented, and the expected_value is required to be
an integer. It should be easy to implement other interesting functions like
COUNT and MIN/MAX. It would also be possible to extend it to floats.

Switching the dictionary filtering tests over to this new syntax eliminates the
variability in the tests.

Change-Id: I6b7b84d973b3ac678a24e82900f2637d569158bb
---
D 
testdata/workloads/functional-query/queries/QueryTest/mt-dop-parquet-filtering.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
M tests/common/test_result_verifier.py
M tests/query_test/test_mt_dop.py
M tests/query_test/test_scanners.py
5 files changed, 134 insertions(+), 317 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/6301/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6301
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6b7b84d973b3ac678a24e82900f2637d569158bb
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 


[Impala-ASF-CR] IMPALA-5039: Fix variability in parquet dictionary filtering test

2017-03-09 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change.

Change subject: IMPALA-5039: Fix variability in parquet dictionary filtering 
test
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6301/2/tests/common/test_result_verifier.py
File tests/common/test_result_verifier.py:

Line 495:   print field_regex
> why print? does this go anywhere useful?
Removed. This was just leftover debugging.


Line 537: expected_regexes.append(try_compile_regex(expected_line))
> What's the benefit of having one entry per expected_line in both these list
You can think of each of these as sparse lists of mutually exclusive things. An 
expected line is either a regex, an aggregation or an exact match. None entries 
indicate the absence of a regex or aggregation. Including the None entries 
means the lists are all the same size and have the same indices. So, for a 
given index into the expected lines, the expected_regexes tells you whether it 
is a regex or not. The expected_aggregations tells you whether it is an 
aggregation or not. If it is neither, then it is an exact match.

This also allows you to walk through the expected lines in order and have a 
single "matched" list across all expected lines.

I worked up a version that had separate lists, but it ended up not being much 
cleaner than the current code.

I hope this makes sense.


-- 
To view, visit http://gerrit.cloudera.org:8080/6301
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6b7b84d973b3ac678a24e82900f2637d569158bb
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links

2017-03-09 Thread John Russell (Code Review)
John Russell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6338

Change subject: IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links
..

IMPALA-4643: [DOCS] Phase 1 of genericizing cloudera.com links

This change primarily sets up the infrastructure in the
keydefs file to allow shortening and genericizing
cross-doc links within the Impala library.

Many links to CDH-related pages such as cdh_*.xml
and cm_*.xml will be replaced by generic plain text.
Most of that will be in the next phase.

For phase 1, you can verify that things are working by
viewing the HTML output pages:

impala_new_features.html
impala_perf_hdfs_caching.html

and doublechecking that at the spots where formerly was a link to
a CDH page describing HDFS caching, now there is only the plain text
phrase: "see the documentation for HDFS caching".

Blanking out all the Cloudera doc links (in the next
phase) will remove many instances of www.cloudera.com
that appear both in the doc source and in the HTML output.
Change-Id: I8ddb3d7f0a398e70a0f741410bc6b1a309bd1686
---
M docs/impala_keydefs.ditamap
M docs/topics/impala_new_features.xml
M docs/topics/impala_perf_hdfs_caching.xml
3 files changed, 377 insertions(+), 126 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/6338/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6338
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8ddb3d7f0a398e70a0f741410bc6b1a309bd1686
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 


[Impala-ASF-CR] IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT

2017-03-09 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-4318:  Kudu support for CREATE EXTERNAL TABLE AS SELECT
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6261/4//COMMIT_MSG
Commit Message:

PS4, Line 9: No reason not to allow this.
Not necessary.

Also, I find this whole paragraph confusing - for example what thrift protocol? 
Why would it need to be changed? (not that I'm saying you should explain this, 
it may be an unnecessary detail).

Obviously, you should assume that anyone reading this has no context for the 
work that you've been doing up to this point.


http://gerrit.cloudera.org:8080/#/c/6261/4/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

Line 240:* Analyzes and checks parameters specified for ingested external 
Kudu tables.
Since ingested isn't a term we use anywhere else (AFAIK), you should explain 
what you mean by it, something like:
"Analyzes and checks parameters specified for external tables that already 
exist in Kudu."


-- 
To view, visit http://gerrit.cloudera.org:8080/6261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3401: [DOCS] Physically remove Cloudera Manager info

2017-03-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3401: [DOCS] Physically remove Cloudera Manager info
..


IMPALA-3401: [DOCS] Physically remove Cloudera Manager info

Followup from Laurel's code reviews, to physically
remove references to Cloudera Manager that were hidden.

Remove a few stray instances of Cloudera Manager that I found
still remaining in the source.

Fix up trailing spaces introduced during earlier
Cloudera Manager-related edits.

Also remove stray 'Cloudera' references, or stale/commented
Cloudera-specific info, noticed near other spots being edited.

Change-Id: Ifc4a84527ae42c39b3717190b6cf669e17fff04b
Reviewed-on: http://gerrit.cloudera.org:8080/6325
Reviewed-by: Ambreen Kazi 
Reviewed-by: John Russell 
Tested-by: Impala Public Jenkins
---
M docs/shared/impala_common.xml
M docs/topics/impala_admission.xml
M docs/topics/impala_auditing.xml
M docs/topics/impala_authorization.xml
M docs/topics/impala_breakpad.xml
M docs/topics/impala_config_options.xml
M docs/topics/impala_config_performance.xml
M docs/topics/impala_faq.xml
M docs/topics/impala_fixed_issues.xml
M docs/topics/impala_impala_shell.xml
M docs/topics/impala_incompatible_changes.xml
M docs/topics/impala_isilon.xml
M docs/topics/impala_kerberos.xml
M docs/topics/impala_logging.xml
M docs/topics/impala_new_features.xml
M docs/topics/impala_noncm_installation.xml
M docs/topics/impala_perf_resources.xml
M docs/topics/impala_perf_skew.xml
M docs/topics/impala_perf_testing.xml
M docs/topics/impala_prereqs.xml
M docs/topics/impala_proxy.xml
M docs/topics/impala_resource_management.xml
M docs/topics/impala_s3.xml
M docs/topics/impala_scalability.xml
M docs/topics/impala_schema_design.xml
M docs/topics/impala_timeouts.xml
M docs/topics/impala_troubleshooting.xml
M docs/topics/impala_txtfile.xml
M docs/topics/impala_udf.xml
M docs/topics/impala_webui.xml
30 files changed, 123 insertions(+), 839 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Ambreen Kazi: Looks good to me, but someone else must approve
  John Russell: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6325
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ifc4a84527ae42c39b3717190b6cf669e17fff04b
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Russell 


[native-toolchain-CR] IMPALA-4846: Upgrade Snappy to 1.1.4

2017-03-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4846: Upgrade Snappy to 1.1.4
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6336
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4fe07399e892837b9417f959bdaf967ca995be28
Gerrit-PatchSet: 1
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3748: minimum buffer requirements in planner

2017-03-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#9).

Change subject: IMPALA-3748: minimum buffer requirements in planner
..

IMPALA-3748: minimum buffer requirements in planner

Compute the minimum buffer requirement for spilling nodes and
per-host estimates for the entire plan tree.

This builds on top of the existing resource estimation code, which
computes the sets of plan nodes that can execute concurrently. This is
cleaned up so that the process of producing resource requirements is
clearer. It also removes the unused VCore estimates.

Fixes various bugs and other issues:
* computeCosts() was not called for unpartitioned fragments, so
  the per-operator memory estimate was not visible.
* Nested loop join was not treated as a blocking join.
* The TODO comment about union was misleading
* Fix the computation for mt_dop > 1 by distinguishing per-instance and
  per-host estimates.
* Always generate an estimate instead of unpredictably returning
  -1/"unavailable" in many circumstances - there was little rhyme or
  reason to when this happened.
* Remove the special "trivial plan" estimates. With the rest of the
  cleanup we generate estimates <= 10MB for those trivial plans through
  the normal code path.

I left one bug (IMPALA-4862) unfixed because it is subtle, will affect
estimates for many plans and will be easier to review once we have the
test infra in place.

Testing:
Added basic planner tests for resource requirements in both the MT and
non-MT cases.

Re-enabled the explain_level tests, which appears to be the only
coverage for many of these estimates. Removed the complex and
brittle test cases and replaced with a couple of much simpler
end-to-end tests.

Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder.h
M be/src/runtime/sorter.cc
M be/src/scheduling/query-schedule.cc
M be/src/service/query-exec-state.cc
M common/thrift/Frontend.thrift
M docs/shared/impala_common.xml
M docs/topics/impala_explain.xml
M docs/topics/impala_explain_level.xml
M docs/topics/impala_explain_plan.xml
M docs/topics/impala_hbase.xml
M docs/topics/impala_known_issues.xml
M docs/topics/impala_optimize_partition_key_scans.xml
M docs/topics/impala_perf_joins.xml
M docs/topics/impala_perf_stats.xml
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/common/PrintUtils.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataSink.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
A fe/src/main/java/org/apache/impala/planner/ResourceProfile.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M 

[Impala-ASF-CR] IMPALA-3748: estimate minimum buffers in planner

2017-03-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3748: estimate minimum buffers in planner
..


Patch Set 8:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/5847/8/fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java
File fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java:

Line 48:   public class ResourceRequirement {
> these aren't requirements if they include the existing "estimates". Resourc
Done


Line 115:   node.computeCosts(queryOptions);
> they're not costs
renamed to computeResourceConsumption(). I don't feel strongly about the name - 
please let me know if you have a better idea.


Line 156: hdfsScanMemEstimate + hbaseScanMemEstimate + 
nonScanMemEstimate + dataSinkMemEstimate);
> long line
Done


http://gerrit.cloudera.org:8080/#/c/5847/8/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

Line 209:   public int getNumInstancesPerHost(TQueryOptions queryOptions) {
> passing in the dop directly is preferable, because it's easier to see what'
Done


Line 232:   public long getNumDistinctValues(TQueryOptions queryOptions, 
List exprs) {
> rename to include "per instance" in the name to clarify. you might also wan
Done


http://gerrit.cloudera.org:8080/#/c/5847/8/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

Line 120:   // estimated per-instance memory usage for this plan node in bytes;
> the "estimates" don't really have defined semantics (max? desirable?). we m
I added a comment to point to IMPALA-5013, which tracks that problem. Some of 
the estimates (HDFS scanner, join, agg) are reasonable and we might want to use 
them as a basis for estimating "ideal" memory once this evolves further.

Reworded to avoid "usage".


Line 124:   // Minimum number of buffers required to execute spilling operator.
> why restrict this to spilling? doesn't every operator have a defined requir
That's true. Currently the requirement is 0 for non-spilling operators but that 
will change as IMPALA-4834 progresses.

Updated the comment accordingly to be clearer.


Line 281:   TQueryOptions queryOptions, TExplainLevel detailLevel) {
> what's the role of the query options?
It's needed to get mt_dop. For this particular interface it's at a high-level 
of abstraction so seems a little specific to be passing in mt_dop as an int.


Line 317:   PrintUtils.printInstances(" ", 
fragment_.getNumInstances(queryOptions)));
> why do we want this?
I think it makes it much easier to interpret the plans with mt_dop > 1. 
Otherwise it's not immediately obvious what parallelism each node executes 
with, and particularly how the per-instance estimates relate to the total 
per-host number.

You suggested in a different comment that we should move it to plan fragment, 
so I did that.


Line 322: PrintUtils.printPerInstanceMinBuffers(" ", 
perInstanceMinBufferBytes_));
> why make this conditional?
The output would be confusing if it was user-visible because the backend exec 
nodes often execute with less memory than them minimum buffers (because of the 
small buffers optimisation, etc). It'll be more actionable once we have true 
reservations.


http://gerrit.cloudera.org:8080/#/c/5847/8/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test:

Line 10: |  hosts=1 instances=1 est-per-instance-mem=0B
> how come the number of hosts changed?
Because it's a single-node plan - I adjusted the approach so that # hosts more 
accurately reflects how the planner expects the plan to execute.

It's now based on PlanFragment.getNumNodes() (the # of instances of the  
fragment expected) vs PlanNode.getNumNodes() (an estimate that feeds into the 
decision how to partition a fragment). 

I think this will be even clearer if I move hosts/instances to the fragment 
level.

The previous approach led to various anomalies, e.g. hosts could be different 
for PlanNodes in the same fragment.


http://gerrit.cloudera.org:8080/#/c/5847/5/testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test:

Line 15:  hosts=1 instances=1 est-per-instance-mem=0B
> regarding the number of fragment instances: that's really a per-fragment (r
I ended up making this change at explain levels 2 and 3. I've  updated some of 
affected tests so you can see how the new output looks. If we agree I can go 
through and fix the remaining tests (I don't have a working kudu instance 
locally so it's a bit labour-intensive to update some of the tests).



[Impala-ASF-CR] IMPALA-3401: [DOCS] Physically remove Cloudera Manager info

2017-03-09 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-3401: [DOCS] Physically remove Cloudera Manager info
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6325
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc4a84527ae42c39b3717190b6cf669e17fff04b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3401: [DOCS] Physically remove Cloudera Manager info

2017-03-09 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-3401: [DOCS] Physically remove Cloudera Manager info
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6325/2/docs/topics/impala_logging.xml
File docs/topics/impala_logging.xml:

PS2, Line 453: 
 : See
 : http://www.cloudera.com/documentation/enterprise/latest/topics/sg_redaction.html;
 scope="external" format="html"/>
 : for details about how to enable this feature and set
 : up the regular expressions to detect and redact 
sensitive information within SQL statement text.
 :   
> This should've been set to hidden and removed -- the content at that link i
That'll be an item for the next iteration where we deal with external links.


-- 
To view, visit http://gerrit.cloudera.org:8080/6325
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc4a84527ae42c39b3717190b6cf669e17fff04b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3401: [DOCS] Physically remove Cloudera Manager info

2017-03-09 Thread Ambreen Kazi (Code Review)
Ambreen Kazi has posted comments on this change.

Change subject: IMPALA-3401: [DOCS] Physically remove Cloudera Manager info
..


Patch Set 3: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6325
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc4a84527ae42c39b3717190b6cf669e17fff04b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-4846: Upgrade Snappy to 1.1.4

2017-03-09 Thread Laszlo Gaal (Code Review)
Laszlo Gaal has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6336

Change subject: IMPALA-4846: Upgrade Snappy to 1.1.4
..

IMPALA-4846: Upgrade Snappy to 1.1.4

Snappy 1.1.4 (released Jan 25th, 2017) claims significant performance
improvements:
- improved compression performance by 5%
- improved decompression performance by 20%

Change-Id: I4fe07399e892837b9417f959bdaf967ca995be28
---
M buildall.sh
1 file changed, 2 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/native-toolchain 
refs/changes/36/6336/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6336
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4fe07399e892837b9417f959bdaf967ca995be28
Gerrit-PatchSet: 1
Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal 


[Impala-ASF-CR] IMPALA-3401: [DOCS] Physically remove Cloudera Manager info

2017-03-09 Thread John Russell (Code Review)
John Russell has posted comments on this change.

Change subject: IMPALA-3401: [DOCS] Physically remove Cloudera Manager info
..


Patch Set 3:

(3 comments)

Removed a couple more 'Cloudera' instances.

http://gerrit.cloudera.org:8080/#/c/6325/2/docs/topics/impala_logging.xml
File docs/topics/impala_logging.xml:

PS2, Line 372: 
> Cloudera ref - there's at least one more on this page.
Done


http://gerrit.cloudera.org:8080/#/c/6325/2/docs/topics/impala_txtfile.xml
File docs/topics/impala_txtfile.xml:

PS2, Line 489: 
> will this continue to be hosted on archive.cloudera? Flagging this because 
Done


http://gerrit.cloudera.org:8080/#/c/6325/2/docs/topics/impala_udf.xml
File docs/topics/impala_udf.xml:

PS2, Line 386: /cloudera/
> cloudera repo link to be replaced?
That'll be a discussion topic for the next iteration, not doing this time.


-- 
To view, visit http://gerrit.cloudera.org:8080/6325
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc4a84527ae42c39b3717190b6cf669e17fff04b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3401: [DOCS] Physically remove Cloudera Manager info

2017-03-09 Thread John Russell (Code Review)
John Russell has uploaded a new patch set (#3).

Change subject: IMPALA-3401: [DOCS] Physically remove Cloudera Manager info
..

IMPALA-3401: [DOCS] Physically remove Cloudera Manager info

Followup from Laurel's code reviews, to physically
remove references to Cloudera Manager that were hidden.

Remove a few stray instances of Cloudera Manager that I found
still remaining in the source.

Fix up trailing spaces introduced during earlier
Cloudera Manager-related edits.

Also remove stray 'Cloudera' references, or stale/commented
Cloudera-specific info, noticed near other spots being edited.

Change-Id: Ifc4a84527ae42c39b3717190b6cf669e17fff04b
---
M docs/shared/impala_common.xml
M docs/topics/impala_admission.xml
M docs/topics/impala_auditing.xml
M docs/topics/impala_authorization.xml
M docs/topics/impala_breakpad.xml
M docs/topics/impala_config_options.xml
M docs/topics/impala_config_performance.xml
M docs/topics/impala_faq.xml
M docs/topics/impala_fixed_issues.xml
M docs/topics/impala_impala_shell.xml
M docs/topics/impala_incompatible_changes.xml
M docs/topics/impala_isilon.xml
M docs/topics/impala_kerberos.xml
M docs/topics/impala_logging.xml
M docs/topics/impala_new_features.xml
M docs/topics/impala_noncm_installation.xml
M docs/topics/impala_perf_resources.xml
M docs/topics/impala_perf_skew.xml
M docs/topics/impala_perf_testing.xml
M docs/topics/impala_prereqs.xml
M docs/topics/impala_proxy.xml
M docs/topics/impala_resource_management.xml
M docs/topics/impala_s3.xml
M docs/topics/impala_scalability.xml
M docs/topics/impala_schema_design.xml
M docs/topics/impala_timeouts.xml
M docs/topics/impala_troubleshooting.xml
M docs/topics/impala_txtfile.xml
M docs/topics/impala_udf.xml
M docs/topics/impala_webui.xml
30 files changed, 123 insertions(+), 839 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6325/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6325
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifc4a84527ae42c39b3717190b6cf669e17fff04b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 


[Impala-ASF-CR] IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader()

2017-03-09 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5055: Fix DCHECK in parquet-column-readers.cc 
ReadPageHeader()
..


Patch Set 1:

My proposal is to keep the signature as it is, but initialize the status 
parameter to Status::OK() and do a private perf run to see if it has any 
negative impact on performance. If it does not, then I'd advocate to initialize 
it, since it may help prevent similar bugs in the future.

-- 
To view, visit http://gerrit.cloudera.org:8080/6328
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie22a8cf6b53f507c378c2efe302482409935184e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] Impala-Shell: Add history max option

2017-03-09 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: Impala-Shell: Add history_max option
..


Patch Set 1: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6335/1//COMMIT_MSG
Commit Message:

Line 7: Impala-Shell: Add history_max option
I think it will be useful to create a JIRA and refer that here, so Impala users 
will find it easier to refer back to this.


-- 
To view, visit http://gerrit.cloudera.org:8080/6335
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaf65bbecb8fd7f1105aac62b6745d6125a603d7f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Impala-Shell: Add history max option

2017-03-09 Thread Zach Amsden (Code Review)
Zach Amsden has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6335

Change subject: Impala-Shell: Add history_max option
..

Impala-Shell: Add history_max option

Allow users to keep a longer history of queries if desired.  I
personally find it useful to keep a long history of queries to
reference and want to bump this up to a very large value, but
keep the default reasonable.

Testing: Created .impalarc as follows, now getting more history saved.

[impala]
history_max=1000

Change-Id: Iaf65bbecb8fd7f1105aac62b6745d6125a603d7f
---
M shell/impala_shell.py
M shell/impala_shell_config_defaults.py
2 files changed, 21 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/6335/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6335
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Iaf65bbecb8fd7f1105aac62b6745d6125a603d7f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 


[Impala-ASF-CR] IMPALA-3401: [DOCS] Physically remove Cloudera Manager info

2017-03-09 Thread Ambreen Kazi (Code Review)
Ambreen Kazi has posted comments on this change.

Change subject: IMPALA-3401: [DOCS] Physically remove Cloudera Manager info
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6325/2/docs/topics/impala_logging.xml
File docs/topics/impala_logging.xml:

PS2, Line 372: Cloudera
Cloudera ref - there's at least one more on this page.


PS2, Line 453: 
 : See
 : http://www.cloudera.com/documentation/enterprise/latest/topics/sg_redaction.html;
 scope="external" format="html"/>
 : for details about how to enable this feature and set
 : up the regular expressions to detect and redact 
sensitive information within SQL statement text.
 :   
This should've been set to hidden and removed -- the content at that link is 
very strictly geared towards CM. We need clarification on how redaction works 
with standalone Impala on hadoop. How it can be enabled and what the default 
setting is.


http://gerrit.cloudera.org:8080/#/c/6325/2/docs/topics/impala_txtfile.xml
File docs/topics/impala_txtfile.xml:

PS2, Line 489: Cloudera
will this continue to be hosted on archive.cloudera? Flagging this because 
Cloudera was removed from the previous paragraph that talks about this package, 
but not here.


http://gerrit.cloudera.org:8080/#/c/6325/2/docs/topics/impala_udf.xml
File docs/topics/impala_udf.xml:

PS2, Line 386: /cloudera/
cloudera repo link to be replaced?


-- 
To view, visit http://gerrit.cloudera.org:8080/6325
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc4a84527ae42c39b3717190b6cf669e17fff04b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-09 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3586: Implement union passthrough
..


Patch Set 13:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/5816/13/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:

Line 156: // It is OK to close the child here because all RowBatches 
have already been
... because all tuple data has been copied, and we will not be calling ...


Line 174: child_batch_.reset();
move this before ++child_idx_ to have them clustered?


http://gerrit.cloudera.org:8080/#/c/5816/13/be/src/exec/union-node.h
File be/src/exec/union-node.h:

Line 67:   /// Children that can be passed through, without evaluating and 
materializing their
single line


Line 109:   inline bool AtEos(int per_fragment_instance_idx) const {
Maybe it's clearer to make GetNextPassThrough() not set eos, and instead have 
all the *eos setting be done directly inside GetNext(). That way each place 
different place can do the appropriate checks.


Line 119: DCHECK_LE(child_idx_, children_.size());
Checking of the child_idx_ is dependent on the aller pattern of advancing the 
child_idx_ and calling into this AtEos(), another argument for maybe inlining 
the checks into GetNext() to make this subtle point less disconnected.


-- 
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader()

2017-03-09 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5055: Fix DCHECK in parquet-column-readers.cc 
ReadPageHeader()
..


Patch Set 1:

Since we expect no performance penalty and consumers shouldn't rely on status 
being untouched, can initialize it to OK() in GetBytes and do a private perf 
run to see whether it has a perf impact? I think that would eliminate a 
potential cause for future problems.

-- 
To view, visit http://gerrit.cloudera.org:8080/6328
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie22a8cf6b53f507c378c2efe302482409935184e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader()

2017-03-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5055: Fix DCHECK in parquet-column-readers.cc 
ReadPageHeader()
..


Patch Set 1: Code-Review+2

I spoke to Joe offline and he mentioned that changing the return type to Status 
involved touching many more things. Given that I'm happy with this fix.

-- 
To view, visit http://gerrit.cloudera.org:8080/6328
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie22a8cf6b53f507c378c2efe302482409935184e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph

2017-03-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4929: Safe concurrent access to IR function call graph
..


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/367/

-- 
To view, visit http://gerrit.cloudera.org:8080/6326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph

2017-03-09 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4929: Safe concurrent access to IR function call graph
..


Patch Set 3: Code-Review+2

Carry Tim's +2 forward.

-- 
To view, visit http://gerrit.cloudera.org:8080/6326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph

2017-03-09 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4929: Safe concurrent access to IR function call graph
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6326/2/be/src/codegen/codegen-callgraph.cc
File be/src/codegen/codegen-callgraph.cc:

PS2, Line 71: std::
> std:: not needed.
Done


http://gerrit.cloudera.org:8080/#/c/6326/2/be/src/codegen/codegen-callgraph.h
File be/src/codegen/codegen-callgraph.h:

Line 57:   CallGraph call_graph_;
> Maybe this is a bit overkill, but moving this (and the Init() function) to 
Thanks for the idea. The public interfaces of this class: GetCallees() and 
fns_referenced_by_gv() both have const attribute in their return type so it 
should get most of what we need for preventing mutation outside of the class. 
Of course, one can always do const_cast() but I guess we should catch them in 
code review.

The bool inited_ will catch re-initialization of the map as this should not 
occur.


-- 
To view, visit http://gerrit.cloudera.org:8080/6326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph

2017-03-09 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change.

Change subject: IMPALA-4929: Safe concurrent access to IR function call graph
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6326/2/be/src/codegen/codegen-callgraph.h
File be/src/codegen/codegen-callgraph.h:

Line 57:   bool inited_;
Maybe this is a bit overkill, but moving this (and the Init() function) to a 
nested class that only allows access to the map through a const reference 
(returned by .get()) would make it impossible to mutate the map after 
initialization, and the compiler will catch you at it if you try.


-- 
To view, visit http://gerrit.cloudera.org:8080/6326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph

2017-03-09 Thread Michael Ho (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6326

to look at the new patch set (#3).

Change subject: IMPALA-4929: Safe concurrent access to IR function call graph
..

IMPALA-4929: Safe concurrent access to IR function call graph

Previously, LlvmCodeGen creates a map (fn_refs_map_) which
maps an IR function to its set of callees. That map was initialized
once at Impalad's start time and it is supposed to be read-only
afterwards.

However, the map was unintentionally modified by its user
LlvmCodeGen::MaterializeFunctionHelper() due to the use of
operator []. In particular, that operator may create a new
entry in the map if it doesn't exist already. This is possible
because the map initially only contains entries for functions
with at least one callee function. Using the operator [] with
functions with no callee function may modify the map. Since the
map is shared by all fragment instances, such unsafe concurrent
modification can cause missing or wrong callee functions to be
materialized, leading to failure to resolve symbols in LLVM.

This change fixes the problem above by:
1. Create an entry for all functions even if it has no callee.
   In fact, LlvmCodeGen::LinkModule() assumes all functions
   defined in main module will have entries in the map.

2. Introduce a new class CodegenCallGraph which encapsulates
   the map described above. The map was established once at
   initialization time and there is no interface to modify the
   map afterwards, thus preventing any accidental modification
   to the map at run time.

Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a
---
M be/src/codegen/CMakeLists.txt
A be/src/codegen/codegen-callgraph.cc
A be/src/codegen/codegen-callgraph.h
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
5 files changed, 188 insertions(+), 81 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6326/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zach Amsden 


[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-09 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#13).

Change subject: IMPALA-3586: Implement union passthrough
..

IMPALA-3586: Implement union passthrough

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.

Testing:
- Added new planner and end to end tests that cover the new
  functionality.
- Updated existing tests to reflect the new behavior.

Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:

SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before:
Total Time: 43s164ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  224.721us  224.721us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   24.578us   24.578us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s402ms3s060ms3   1  119.00 KB  
 10.00 MB
00:UNION   3   35s380ms   37s846ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS3  184.197ms  219.931ms   28.80M  28.80M  535.03 MB  
  1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS3  131.956ms  153.401ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS3  178.456ms  247.721ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS3  189.398ms  242.251ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS3  122.786ms  156.528ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS3  147.467ms  183.391ms   28.80M  28.80M  535.13 MB  
  1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS3  147.502ms  186.273ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS3  130.086ms  154.682ms   28.80M  28.80M  535.04 MB  
  1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS3  122.701ms  161.056ms   28.80M  28.80M  534.89 MB  
  1.88 GB  store_sales_unpartitioned
01:SCAN HDFS   3  287.863ms  330.436ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned

After:
Total Time: 20s660ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  187.474us  187.474us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   15.238us   15.238us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s958ms3s749ms3   13.08 MB  
 10.00 MB
00:UNION   3  211.510ms  224.667ms  288.01M 288.01M  0  
0
|--02:SCAN HDFS31s637ms1s734ms   28.80M  28.80M  528.68 MB  
  1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS31s697ms1s708ms   28.80M  28.80M  528.48 MB  
  

[Impala-ASF-CR] IMPALA-3586: Implement union passthrough

2017-03-09 Thread Taras Bobrovytsky (Code Review)
Hello Alex Behm, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5816

to look at the new patch set (#13).

Change subject: IMPALA-3586: Implement union passthrough
..

IMPALA-3586: Implement union passthrough

The union node acts as pass through operator and forwards row batches
from it's children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.

A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.

Testing:
- Added new planner and end to end tests that cover the new
  functionality.
- Updated existing tests to reflect the new behavior.

Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:

SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before:
Total Time: 43s164ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  224.721us  224.721us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   24.578us   24.578us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s402ms3s060ms3   1  119.00 KB  
 10.00 MB
00:UNION   3   35s380ms   37s846ms  288.01M 288.01M3.08 MB  
0
|--02:SCAN HDFS3  184.197ms  219.931ms   28.80M  28.80M  535.03 MB  
  1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS3  131.956ms  153.401ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS3  178.456ms  247.721ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS3  189.398ms  242.251ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS3  122.786ms  156.528ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS3  147.467ms  183.391ms   28.80M  28.80M  535.13 MB  
  1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS3  147.502ms  186.273ms   28.80M  28.80M  535.01 MB  
  1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS3  130.086ms  154.682ms   28.80M  28.80M  535.04 MB  
  1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS3  122.701ms  161.056ms   28.80M  28.80M  534.89 MB  
  1.88 GB  store_sales_unpartitioned
01:SCAN HDFS   3  287.863ms  330.436ms   28.80M  28.80M  534.98 MB  
  1.88 GB  store_sales_unpartitioned

After:
Total Time: 20s660ms

Summary:
Operator  #Hosts   Avg Time   Max Time#Rows  Est. #Rows   Peak Mem  
Est. Peak Mem  Detail
--
13:AGGREGATE   1  187.474us  187.474us1   1   28.00 KB  
  -1.00 B  FINALIZE
12:EXCHANGE1   15.238us   15.238us3   1  0  
  -1.00 B  UNPARTITIONED
11:AGGREGATE   32s958ms3s749ms3   13.08 MB  
 10.00 MB
00:UNION   3  211.510ms  224.667ms  288.01M 288.01M  0  
0
|--02:SCAN HDFS31s637ms1s734ms   28.80M  28.80M  528.68 MB  
  1.88 GB  

[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode

2017-03-09 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-4996: Single-threaded KuduScanNode
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6312/1/be/src/exec/kudu-scan-node-base.h
File be/src/exec/kudu-scan-node-base.h:

Line 35: /// are used to retrieve the rows for this scan.
You should make this comment more specific to KuduScanNodeBase, eg. "Base class 
for single and multi-threaded versions of..." and "TODO: remove this once all 
scanners support multithreaded execution..."


PS1, Line 98: //
///


-- 
To view, visit http://gerrit.cloudera.org:8080/6312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5041: AuthManager::Init() is not idempotent

2017-03-09 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6333

Change subject: IMPALA-5041: AuthManager::Init() is not idempotent
..

IMPALA-5041: AuthManager::Init() is not idempotent

We use a global variable to track the completion of the
kerberos environment setup which shouldn't be the case since
we would want to re-setup the environment if we call
AuthManager::Init() with a different configuration.

Also, the global variable 'env_setup_complete_' was used so that
we don't set the enviornment twice (once for internal transport and
once for external). This patch ensures that it's still the case.

Change-Id: I12cc210aa422f077446ea728ebf76921351417b0
---
M be/src/rpc/auth-provider.h
M be/src/rpc/authentication.cc
M be/src/rpc/authentication.h
3 files changed, 11 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/6333/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I12cc210aa422f077446ea728ebf76921351417b0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph

2017-03-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4929: Safe concurrent access to IR function call graph
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph

2017-03-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4929: Safe concurrent access to IR function call graph
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6326/1/be/src/codegen/codegen-callgraph.cc
File be/src/codegen/codegen-callgraph.cc:

Line 87: call_graph_[key].insert(fn_name);
> The entry may not exist already. Note the entry creation above is for 'fn_n
Ah got it


http://gerrit.cloudera.org:8080/#/c/6326/2/be/src/codegen/codegen-callgraph.cc
File be/src/codegen/codegen-callgraph.cc:

PS2, Line 71: http://gerrit.cloudera.org:8080/#/c/6326/1/be/src/codegen/codegen-callgraph.h
File be/src/codegen/codegen-callgraph.h:

Line 21: #include 
> Good point but it appears that two closely related files llvm-codegen.cc an
I believe the std:: and boost:: versions have different perf characteristics 
because the c++ standard has some restrictions around how iterators behave. So 
they might not be a simple drop-in replacement.

E.g.
https://tinodidriksen.com/2010/04/cpp-set-performance/


-- 
To view, visit http://gerrit.cloudera.org:8080/6326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph

2017-03-09 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: IMPALA-4929: Safe concurrent access to IR function call graph
..


Patch Set 1:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/6326/1/be/src/codegen/codegen-callgraph.cc
File be/src/codegen/codegen-callgraph.cc:

Line 27: using namespace strings;
> You can use common/names.h instead of the string and unordered_* usings.
Done


Line 35:   Status status =
> Can combine this with the below line.
Too long after converting to nullptr.


PS1, Line 71: boost::
> I believe this creates two unordered_sets (the Lvalue and Rvalue), then cop
Done


Line 87: call_graph_[key].insert(fn_name);
> I think we can use find(key)->insert here, since we don't really want the b
The entry may not exist already. Note the entry creation above is for 
'fn_name'. Auto-creating entries here is fine as we are constructing the map. 
We just don't want this to happen after initialization.


http://gerrit.cloudera.org:8080/#/c/6326/1/be/src/codegen/codegen-callgraph.h
File be/src/codegen/codegen-callgraph.h:

Line 21: #include 
> Use std::unordered_set and _map when possible
Good point but it appears that two closely related files llvm-codegen.cc and 
libcache.cc are already using boost::unordered_* so I'd avoid mixing both to 
avoid confusion or to further extend the scope of this change. We should clean 
up all these places in one shot instead.


Line 29: class CodegenCallGraph {
> One line class comment?
Done


PS1, Line 39: inline
> I believe "inline" is implied for all methods defined in the class body: 
Done


Line 43: if (LIKELY(iter != call_graph_.end())) {
> Consider using  ? : to make this a one-liner
Done


PS1, Line 46: NULL
> nullptr?
Done


PS1, Line 51: inline
> unnecessary inline
Done


Line 71:   /// and return them in 'users'.
> "append them to 'users'", to make it clear that it doesn't clear existing c
Done


http://gerrit.cloudera.org:8080/#/c/6326/1/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

PS1, Line 290: NULL
> nit: we've mostly standardised on nullptr
Done


PS1, Line 594: (*callees)
> nit: unnecessary parens
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph

2017-03-09 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#2).

Change subject: IMPALA-4929: Safe concurrent access to IR function call graph
..

IMPALA-4929: Safe concurrent access to IR function call graph

Previously, LlvmCodeGen creates a map (fn_refs_map_) which
maps an IR function to its set of callees. That map was initialized
once at Impalad's start time and it is supposed to be read-only
afterwards.

However, the map was unintentionally modified by its user
LlvmCodeGen::MaterializeFunctionHelper() due to the use of
operator []. In particular, that operator may create a new
entry in the map if it doesn't exist already. This is possible
because the map initially only contains entries for functions
with at least one callee function. Using the operator [] with
functions with no callee function may modify the map. Since the
map is shared by all fragment instances, such unsafe concurrent
modification can cause missing or wrong callee functions to be
materialized, leading to failure to resolve symbols in LLVM.

This change fixes the problem above by:
1. Create an entry for all functions even if it has no callee.
   In fact, LlvmCodeGen::LinkModule() assumes all functions
   defined in main module will have entries in the map.

2. Introduce a new class CodegenCallGraph which encapsulates
   the map described above. The map was established once at
   initialization time and there is no interface to modify the
   map afterwards, thus preventing any accidental modification
   to the map at run time.

Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a
---
M be/src/codegen/CMakeLists.txt
A be/src/codegen/codegen-callgraph.cc
A be/src/codegen/codegen-callgraph.h
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
5 files changed, 188 insertions(+), 81 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6326/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader()

2017-03-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5055: Fix DCHECK in parquet-column-readers.cc 
ReadPageHeader()
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6328/1/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 802: bool success = stream_->GetBytes(
> What would be the benefit of the bool* if we always return a Status?
We use this pattern when things are retryable (e.g. in a lot of spilling code, 
false + OK status means "we didn't succeed, but you should free memory and 
retry).

You make a good point - GetBytes() doesn't actually do that. We could just 
change it to return a Status.

I wonder if this is some vestige of an optimisation from when Status was much 
more expensive.


-- 
To view, visit http://gerrit.cloudera.org:8080/6328
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie22a8cf6b53f507c378c2efe302482409935184e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader()

2017-03-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5055: Fix DCHECK in parquet-column-readers.cc 
ReadPageHeader()
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6328/1/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 802: bool success = stream_->GetBytes(
I feel like this pattern (returning a bool and setting status on error) is 
pretty error-prone. In a couple of perf-critical places we use it because it's 
faster than returning a Status (e.g. partitioned-hash-join-node), but I don't 
see a reason to use it here.

We could consider inverting it for GetBytes() - returning Status and setting a  
bool* to indicate success. What do you think?


-- 
To view, visit http://gerrit.cloudera.org:8080/6328
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie22a8cf6b53f507c378c2efe302482409935184e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph

2017-03-09 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4929: Safe concurrent access to IR function call graph
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6326/1/be/src/codegen/codegen-callgraph.h
File be/src/codegen/codegen-callgraph.h:

Line 21: #include 
Use std::unordered_set and _map when possible


-- 
To view, visit http://gerrit.cloudera.org:8080/6326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader()

2017-03-09 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-5055: Fix DCHECK in parquet-column-readers.cc 
ReadPageHeader()
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6328/1//COMMIT_MSG
Commit Message:

Line 13: I verified that the other uses of status are ok. Most do not check 
status.ok()
I think we should change GetBytes() to always initialize status. The function 
comment even seems vague on this, so we should update it, too.

Does any of the other uses rely on the fact that status does not get changed if 
no error occurs? If so, I think we should revisit that code.


-- 
To view, visit http://gerrit.cloudera.org:8080/6328
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie22a8cf6b53f507c378c2efe302482409935184e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader()

2017-03-09 Thread Joe McDonnell (Code Review)
Joe McDonnell has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6328

Change subject: IMPALA-5055: Fix DCHECK in parquet-column-readers.cc 
ReadPageHeader()
..

IMPALA-5055: Fix DCHECK in parquet-column-readers.cc ReadPageHeader()

GetBytes only sets status in the case of an error. This means that 
ReadPageHeader
needs to initialize the status variable so that the status.ok() check is
accurate after the GetBytes call.

I verified that the other uses of status are ok. Most do not check status.ok()
directly, but rely on the return value of the function setting status.

Change-Id: Ie22a8cf6b53f507c378c2efe302482409935184e
---
M be/src/exec/parquet-column-readers.cc
1 file changed, 1 insertion(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6328/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6328
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie22a8cf6b53f507c378c2efe302482409935184e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 


[Impala-ASF-CR] IMPALA-4929: Safe concurrent access to IR function call graph

2017-03-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4929: Safe concurrent access to IR function call graph
..


Patch Set 1:

(12 comments)

Thanks for the cleanup - I think this makes it easier to understand the 
callgraph

http://gerrit.cloudera.org:8080/#/c/6326/1/be/src/codegen/codegen-callgraph.cc
File be/src/codegen/codegen-callgraph.cc:

Line 27: using namespace strings;
You can use common/names.h instead of the string and unordered_* usings.


Line 35:   Status status =
Can combine this with the below line.


PS1, Line 71: boost::
I believe this creates two unordered_sets (the Lvalue and Rvalue), then copies 
the rvalue to the lvalue then destroys the rvalue. I think if you use 
emplace(fn_name) that just initialises the set inside the map: 
http://en.cppreference.com/w/cpp/container/unordered_map/emplace

The boost:: package name is also not needed.


Line 87: call_graph_[key].insert(fn_name);
I think we can use find(key)->insert here, since we don't really want the 
behaviour of auto-creating entries.


http://gerrit.cloudera.org:8080/#/c/6326/1/be/src/codegen/codegen-callgraph.h
File be/src/codegen/codegen-callgraph.h:

Line 29: class CodegenCallGraph {
One line class comment?


PS1, Line 39: inline
I believe "inline" is implied for all methods defined in the class body: 

https://isocpp.org/wiki/faq/inline-functions#inline-member-fns-more


Line 43: if (LIKELY(iter != call_graph_.end())) {
Consider using  ? : to make this a one-liner


PS1, Line 46: NULL
nullptr?


PS1, Line 51: inline
unnecessary inline


Line 71:   /// and return them in 'users'.
"append them to 'users'", to make it clear that it doesn't clear existing 
contents.


http://gerrit.cloudera.org:8080/#/c/6326/1/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

PS1, Line 290: NULL
nit: we've mostly standardised on nullptr


PS1, Line 594: (*callees)
nit: unnecessary parens


-- 
To view, visit http://gerrit.cloudera.org:8080/6326
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1acd6bad80341121c8189d817e0fe62f2862f28a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: optional 2nd and 3rd arguments for instr().

2017-03-09 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-3973: optional 2nd and 3rd arguments for instr().
..


Patch Set 2:

> > Since I got the +1 from Zoltan, and I'd classify the subsequent
 > > suggestions as nice-to-have but not essential, any objection to
 > my
 > > +2'ing it now and revisiting improvements later?
 > 
 > Yes. The way things work on the code side is that all comments are
 > addressed before submitting.

TO be even more specific: on the code side, all comments are addressed in the 
opinion of reviewer, not the patch author.

-- 
To view, visit http://gerrit.cloudera.org:8080/5589
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I17268bdb480230938f94559fe1eabe34ac2448b7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-Reviewer: zi+z...@cloudera.com
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3973: optional 2nd and 3rd arguments for instr().

2017-03-09 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-3973: optional 2nd and 3rd arguments for instr().
..


Patch Set 2:

> Since I got the +1 from Zoltan, and I'd classify the subsequent
 > suggestions as nice-to-have but not essential, any objection to my
 > +2'ing it now and revisiting improvements later?

Yes. The way things work on the code side is that all comments are addressed 
before submitting.

-- 
To view, visit http://gerrit.cloudera.org:8080/5589
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I17268bdb480230938f94559fe1eabe34ac2448b7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-Reviewer: zi+z...@cloudera.com
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5025: Update binutils to 2.26.1

2017-03-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5025: Update binutils to 2.26.1
..


IMPALA-5025: Update binutils to 2.26.1

This release includes the fix for IMPALA-3507 and the fix for
https://sourceware.org/bugzilla/show_bug.cgi?id=19520, which
made libImpalaUdf.a unusable by older linkers.

Change-Id: Idcc00efc13b1ff4fe8ce3aafcff0b7880908ad5b
Reviewed-on: http://gerrit.cloudera.org:8080/6287
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins
---
M bin/impala-config.sh
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6287
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Idcc00efc13b1ff4fe8ce3aafcff0b7880908ad5b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5025: Update binutils to 2.26.1

2017-03-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5025: Update binutils to 2.26.1
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6287
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idcc00efc13b1ff4fe8ce3aafcff0b7880908ad5b
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No