[Impala-ASF-CR] IMPALA-4437: fix crash in disk-io-mgr

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4437: fix crash in disk-io-mgr
..


IMPALA-4437: fix crash in disk-io-mgr

This fixes another issue where the 'buffer_' field was not set to NULL
on an error, triggering a DCHECK.

Testing:
Added a unit test that triggers the bug on the two different codepaths
that I fixed.

Change-Id: Ib76cf5ba8d368b2b37bdc1d2133b8ddcb39f9e00
Reviewed-on: http://gerrit.cloudera.org:8080/4979
Reviewed-by: Dan Hecht 
Tested-by: Internal Jenkins
---
M be/src/runtime/disk-io-mgr-test.cc
M be/src/runtime/disk-io-mgr.cc
2 files changed, 40 insertions(+), 0 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Dan Hecht: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4979
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ib76cf5ba8d368b2b37bdc1d2133b8ddcb39f9e00
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins


[Impala-ASF-CR] IMPALA-4437: fix crash in disk-io-mgr

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4437: fix crash in disk-io-mgr
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4979
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib76cf5ba8d368b2b37bdc1d2133b8ddcb39f9e00
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input

2016-11-08 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input
..


Patch Set 11:

Passed private S3 run with only query_tests enabled here.

Relevant lines for S3:
02:22:40.508 [gw2] PASSED 
query_test/test_insert_behaviour.py::TestInsertBehaviour::test_clustered_partition_single_file
 
02:25:23.115 [gw2] PASSED 
query_test/test_insert_behaviour.py::TestInsertBehaviour::test_clustered_partition_multiple_files

-- 
To view, visit http://gerrit.cloudera.org:8080/4863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2523: Make HdfsTableSink aware of clustered input

2016-11-08 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input
..


Patch Set 11:

Marcel, can you have a look for a +2? Thanks.

-- 
To view, visit http://gerrit.cloudera.org:8080/4863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-08 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: Preview: IMPALA-4363: Add timestamp validation
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4968/2/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 510: DCHECK(false);
If we ever hit this by accident in a release build we might not want to return 
OK() but some error making it clear what went wrong.


http://gerrit.cloudera.org:8080/#/c/4968/2/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

Line 308:"File '$0' column '$1' contains invalid timestamp."),
> "Invalid" should be qualified a little more. The value is actually out of r
I think if anyhow possible it would be nice to output the invalid value itself, 
too, to make it easier to locate.


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4431: Add audit event log control mechanism to prevent disk overflow.

2016-11-08 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4431: Add audit event log control mechanism to prevent 
disk  overflow.
..


Patch Set 3:

(5 comments)

Please mark comments as done and post replies here when you're done.

http://gerrit.cloudera.org:8080/#/c/4971/3/be/src/common/init.cc
File be/src/common/init.cc:

Line 69:  "to retain.The most recent audit event log files are 
retained. If set to 0, "
Please follow the indentation of the surrounding code


PS3, Line 69: .T
nit: space missing


http://gerrit.cloudera.org:8080/#/c/4971/1/be/src/common/logging.cc
File be/src/common/logging.cc:

Line 58: const string AUDIT_EVENT_LOG_FILE_PREFIX = 
"impala_audit_event_log_1.0-";
> Where in the code do we make sure that event log files are always written w
Please address this comment.


http://gerrit.cloudera.org:8080/#/c/4971/3/be/src/common/logging.cc
File be/src/common/logging.cc:

Line 185:   // e.g. /tmp/impalad.*.INFO.*
This line looks wrong, please update comment.


http://gerrit.cloudera.org:8080/#/c/4971/3/be/src/common/logging.h
File be/src/common/logging.h:

Line 90: /// and removes
line wrapping


-- 
To view, visit http://gerrit.cloudera.org:8080/4971
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8c3229cbdb6275f969c15258c9ccab6efeb24368
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Donghui Xu 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4365: Enabling end-to-end tests on a remote cluster

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4365: Enabling end-to-end tests on a remote cluster
..


Patch Set 14: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Harrison Sheinblatt 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Martin Grund 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4365: Enabling end-to-end tests on a remote cluster

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4365: Enabling end-to-end tests on a remote cluster
..


IMPALA-4365: Enabling end-to-end tests on a remote cluster

This patch lays the groundwork for loading data and running end-to-end
tests on a remote CDH cluster. The requirements for the cluster to run
the tests are:

  - Managed by Cloudera Manager (CM)
  - GPL Extras need to be installed
  - KMS and KeyTrustee installed and available as a service
  - SERDEPROPERTIES in the Hive DB modified to accept wide tables
  - Hive warehouse dir points to /test-warehouse

The actual data loading is done via a new script, remote_data_load.py,
which takes the CM host as an argument. It can be run from a client
machine that is not a node of the cluster, but it needs to have the
Impala repo checked out and Impala built. This insures that all of the
necessary data load scripts are available, as well as setting up the
environment properly (client binaries like beeline and the hbase shell
are available, python libraries like cm_api are installed, necessary
environment variables are defined, etc.)

It should be noted that running remote_data_load.py will overwrite
any local XML config files with the configurations downloaded from
the remote cluster.

Usage: remote_data_load.py [options] 

Options:
  -h, --helpshow this help message and exit
  --snapshot-file=SNAPSHOT_FILE
Path to the test-warehouse archive
  --cm-user=CM_USER Cloudera Manager admin user
  --cm-pass=CM_PASS Cloudera Manager admin user password
  --gateway=GATEWAY Gateway host to upload the data from. If not
set, uses the CM host as gateway.
  --ssh-user=SSH_USER   System user on the remote machine with
passwordless SSH configured.
  --no-load Do not try to load the snapshot
  --exploration-strategy=EXPLORATION_STRATEGY
  --testRun end-to-end tests against cluster

Testing:

This patch is being submitted with the understanding that there are
still clean up issues that need to be addressed in the remote data
load script, for which JIRA's have been filed.

However, since many of the existing build scripts also had to be
modified, it is more important to make sure that no regressions were
inadvertently introduced into the existing data load process. Loading
data to a local mini-cluster was checked repeatedly while this patch
was being developed, as well as running it against the Jenkins job
that provides the test-warehouse snapshot used by the many other
Impala CI builds that run daily.

Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
Reviewed-on: http://gerrit.cloudera.org:8080/4769
Reviewed-by: Taras Bobrovytsky 
Tested-by: Internal Jenkins
---
M bin/load-data.py
A bin/remote_data_load.py
M testdata/bin/compute-table-stats.sh
M testdata/bin/create-load-data.sh
M testdata/bin/create-table-many-blocks.sh
M testdata/bin/generate-schema-statements.py
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/load_nested.py
M testdata/bin/run-step.sh
M testdata/bin/setup-hdfs-env.sh
M testdata/datasets/functional/schema_constraints.csv
11 files changed, 797 insertions(+), 64 deletions(-)

Approvals:
  Taras Bobrovytsky: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Harrison Sheinblatt 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Martin Grund 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4431: Add audit event log control mechanism to prevent disk overflow.

2016-11-08 Thread Donghui Xu (Code Review)
Donghui Xu has uploaded a new patch set (#4).

Change subject: IMPALA-4431: Add audit event log control mechanism to prevent 
disk  overflow.
..

IMPALA-4431: Add audit event log control mechanism to prevent disk 
overflow.

There is no limit to the number of audit event log files. When audit 
event log is enabled,the growing number of log files will fill up the 
disk space.

This patch adds checking and rotation mechanism on audit event log 
files to prevent file number out of control which causes disk overflow.

Change-Id: I8c3229cbdb6275f969c15258c9ccab6efeb24368
---
M be/src/common/init.cc
M be/src/common/logging.cc
M be/src/common/logging.h
3 files changed, 31 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/4971/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4971
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8c3229cbdb6275f969c15258c9ccab6efeb24368
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Donghui Xu 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
..


Patch Set 5:

I understand that it's harmless in that queries that succeeded before the patch 
won't start failing after the patch, but I'm concerned that a) queries that 
succeed with optimisation on will fail or behave differently when you turn 
optimisations off because the analyser picks a wider type and b) the lack of 
distinction between user-visible types and "internal" types will be a source of 
bugs and confusing behaviour.

There may only be a couple of cases where this is user-visible (CTAS and 
typeof() come to mind), so maybe there's a way to fix that there.

The longer-term design concern is if there aren't clear rules for determining 
the types of expressions that can be applied during the initial semantic 
analysis (e.g. BOOL || BOOL => BOOL), it makes it very hard to change the 
implementation of the frontend without changing the user-visible behaviour of 
the frontend.

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3710: Kudu DML should ignore conflicts, pt2

2016-11-08 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3710: Kudu DML should ignore conflicts, pt2
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4985/1/be/src/exec/data-sink.cc
File be/src/exec/data-sink.cc:

Line 119: if (!dst_stats->__isset.kudu_stats) 
dst_stats->__set_kudu_stats(TKuduDmlStats());
> pass 0 in c'tor of TKuduDmlStats?
I can't bc there's no parameter, I think because I use optional fields.


Line 161:   ss << "NumRowErrors: " << stats.kudu_stats.num_row_errors << 
endl;
> General question: Is there an easy way to distinguish the different types o
It is unfortunate and probably not helpful enough, but I'm very worried about 
us presenting incorrect information to the user which I think is worse. The 
issue is that we can't trust the Kudu error codes enough. E.g. they re-use 
error codes for some different cases where I'm not confident enough that we'll 
always get it right. My goal is to work with them for the next release to get a 
more concrete contract. I was on the fence about having different Impala error 
codes at all, but I think it's worthwhile for de-duping the logs. If you think 
this is confusing, I'd prefer to just remove the warning logging w/ different 
error codes.


http://gerrit.cloudera.org:8080/#/c/4985/1/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

Line 214:   COUNTER_ADD(num_row_errors_, 1);
> This is expensive right? How about we update a local counter and COUNTER_AD
Done


Line 305: if (!e.IsNotFound() && !e.IsAlreadyPresent() && status.ok()) {
> Why not move to L325 into an else if?
Done


Line 310:   // Kudu does not have yet have a way to programmatically 
differentiate between 'row
> few words mixed up
Done


http://gerrit.cloudera.org:8080/#/c/4985/1/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

Line 430: // TODO: Refactor to reflect usage by other DML statements.
> Do you intend to do that soon? The structures are a little confusing now.
Yeah I'll go ahead and do this if you think the strategy is otherwise OK.


Line 445:   // The latest observed Kudu timestamp reported by the KuduSession 
at this partition.
> is partition == table here?
Hm yeah I guess it's not great to call this a partition... Table I think might 
be confusing too since there are many of these inserting across the cluster for 
the same table. How about I just remove 'at this partition'?


-- 
To view, visit http://gerrit.cloudera.org:8080/4985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
..


Patch Set 5:

I think we can probably describe the constant folding behaviour as some kind of 
dependent type system that distinguishes between constant expressions and 
non-constant expressions (and where the literal value is part of the logical 
type of the expression). This code essentially already is essentially treating 
"literal decimal" and "decimal" as different types: 
https://github.com/apache/incubator-impala/blob/9755ea63b50d23f323c84edb4307ce603ded4fe1/fe/src/main/java/org/apache/impala/analysis/Expr.java#L453

So there may be a reasonable way out of my second concern, and an argument that 
we've already created that problem.

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles

2016-11-08 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
..


Patch Set 1:

why do the sinks' memtrackers need to be constructed by the PFE rather than the 
sink themselves?

-- 
To view, visit http://gerrit.cloudera.org:8080/4969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
..


Patch Set 1:

The only real reason is to avoid adding an extra layer of MemTrackers in the 
join nodes for the builders. That's also a viable alternative.

-- 
To view, visit http://gerrit.cloudera.org:8080/4969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4438: Serialize test failpoints.py to reduce memory pressure

2016-11-08 Thread Jim Apple (Code Review)
Jim Apple has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4999

Change subject: IMPALA-4438: Serialize test_failpoints.py to reduce memory 
pressure
..

IMPALA-4438: Serialize test_failpoints.py to reduce memory pressure

On EC2 c3.4xlarge instances, with 8cores and 30GB RAM, this test could
trigger the Linux OOM killer by running tests in parallel. This patch
switches to serial execution, which makes the test take four minutes,
rather than one to two minutes.

Change-Id: Iea4a588e1228d38f90387a077cbe530257636b7d
---
M tests/failure/test_failpoints.py
1 file changed, 2 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/4999/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Iea4a588e1228d38f90387a077cbe530257636b7d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 


[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions

2016-11-08 Thread Tim Armstrong (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4655

to look at the new patch set (#14).

Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
..

IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions

This change enables codegen for all builtin aggregate functions,
e.g. timestamp functions and group_concat.

There are several parts to the change:
* Adding support for generic UDAs. Previous the codegen code did not
  handle multiple input arguments or NULL return values.
* Defaulting to using the UDA interface when there is not a special
  codegen path (we have implementations of all builtin aggregate
  functions for the interpreted path).
* Remove all the logic to disable codegen for the special cases that now
  are supported.

Also fix the generation of code to get/set NULL bits since I needed
to add functionality there anyway.

Testing:
Add tests that check that codegen was enabled for builtin aggregate
functions. Also fix some gaps in the preexisting tests.

Also add tests for UDAs that check input/output nulls are handled
correctly, in anticipation of enabling codegen for arbitrary UDAs.
The tests are run with both codegen enabled and disabled. To avoid
flaky tests, we switch the UDF tests to use "unique_database".

Perf:
Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1,
since my original approach regressed it ~5%. In the end the problem was
to do with the ordering of loads/stores to the slot and null bit in the
generated code: the previous version of the code exploited some
properties of the particular aggregate function. I ended up replicating
this behaviour to avoid regressing perf.

Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/codegen-anyval.h
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/aggregation-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/old-hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/text-converter.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/case-expr.cc
M be/src/exprs/compound-predicates.cc
M be/src/exprs/expr-codegen-test.cc
M be/src/exprs/expr.cc
M be/src/exprs/literal.cc
M be/src/exprs/null-literal.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/tuple.cc
M be/src/runtime/types.h
M be/src/testutil/test-udas.cc
M be/src/testutil/test-udfs.cc
M be/src/util/tuple-row-compare.cc
M testdata/workloads/functional-query/queries/QueryTest/java-udf.test
M 
testdata/workloads/functional-query/queries/QueryTest/libs_with_same_filenames.test
M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test
M testdata/workloads/functional-query/queries/QueryTest/uda-mem-limit.test
M testdata/workloads/functional-query/queries/QueryTest/uda.test
A 
testdata/workloads/functional-query/queries/QueryTest/udf-codegen-required.test
M testdata/workloads/functional-query/queries/QueryTest/udf-errors.test
M testdata/workloads/functional-query/queries/QueryTest/udf-mem-limit.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
M tests/common/test_result_verifier.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_udfs.py
47 files changed, 1,037 insertions(+), 800 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/14
-- 
To view, visit http://gerrit.cloudera.org:8080/4655
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles

2016-11-08 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
..


Patch Set 1: Code-Review+2

I see.  Since we're going to have this extra layer eventually anyway (when 
using the builder as a true sink) and it seems it'll be simpler that way, 
consider doing that alternative.  But I don't feel too strongly about it, I'm 
also okay with this fix in the meantime.

-- 
To view, visit http://gerrit.cloudera.org:8080/4969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
..


Patch Set 14: Code-Review+2

Rebase, carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/4655
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] Add -build shared libs for default build for speed.

2016-11-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: Add -build_shared_libs for default build for speed.
..


Patch Set 1:

I'm fine with this change, but I remember Dan had some reservations when this 
change was initially proposed a long time ago. Dan?

-- 
To view, visit http://gerrit.cloudera.org:8080/4970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic83db07e59ff339dcce7362bd296ebcfd60b71d6
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] Improve Kudu UPSERT test coverage

2016-11-08 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: Improve Kudu UPSERT test coverage
..


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4953/2//COMMIT_MSG
Commit Message:

PS2, Line 9: relea
> release
Done


PS2, Line 10: 2.8,
> 2.8
Done


PS2, Line 9: In preparation for the public release of Kudu integration in
   : 2.8, we need to make sure that we've covered as much of the
   : Kudu related functionality with tests as possible.
   : 
> Not really needed in the commit.
Done


http://gerrit.cloudera.org:8080/#/c/4953/1/testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test:

PS1, Line 680: 
 : 
 : 
 : 
> Ah, makes sense. Thanks for looking into that, I think it's fine. We can th
IMPALA-4445


http://gerrit.cloudera.org:8080/#/c/4953/2/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

PS2, Line 356: select * from %s
> might be good to put a limit on here for safety, e.g. the number of rows sp
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Improve Kudu UPSERT test coverage

2016-11-08 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new patch set (#3).

Change subject: Improve Kudu UPSERT test coverage
..

Improve Kudu UPSERT test coverage

In preparation for the public release of Kudu integration in
2.8, we need to make sure that we've covered as much of the
Kudu related functionality with tests as possible.

This patch also introduces a new test section 'DML_RESULTS', which
takes the name of a table as a comment and the contents of the
table as its body and then verifies that the body matches the
actual contents of the table. This makes it easy to check that a
DML operation has the desired effect on the contents of a table,
rather than always having to add another test case that runs a
select on the table.

Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6
---
M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test
A testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
M tests/common/impala_test_suite.py
M tests/common/test_result_verifier.py
M tests/query_test/test_kudu.py
M tests/util/test_file_parser.py
6 files changed, 498 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/4953/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] Add -build shared libs for default build for speed.

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: Add -build_shared_libs for default build for speed.
..


Patch Set 1: Code-Review+1

I'm ok with this as long as we have automated test coverage of this script (in 
case the shared libs build breaks) and we stick with using static linking for 
all automated testing (since that's the configuration of Impala that is 
deployed).

-- 
To view, visit http://gerrit.cloudera.org:8080/4970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic83db07e59ff339dcce7362bd296ebcfd60b71d6
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] Add -build shared libs for default build for speed.

2016-11-08 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Add -build_shared_libs for default build for speed.
..


Patch Set 1:

> I'm ok with this as long as we have automated test coverage of this
 > script (in case the shared libs build breaks) and we stick with
 > using static linking for all automated testing (since that's the
 > configuration of Impala that is deployed).

We do have automated coverage. I would love it if 

I'm sure many people deploy with -release and without -so, but it wouldn't 
surprise me if others ship with -so.

-- 
To view, visit http://gerrit.cloudera.org:8080/4970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic83db07e59ff339dcce7362bd296ebcfd60b71d6
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] Add -build shared libs for default build for speed.

2016-11-08 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Add -build_shared_libs for default build for speed.
..


Patch Set 1:

> > I'm fine with this change, but I remember Dan had some
 > reservations
 > > when this change was initially proposed a long time ago. Dan?
 > 
 > My concern was not with this change, but related. It was with a
 > change that flipped to using shared libs everywhere, and my concern
 > is that we should test something more similar to what we actually
 > ship. i.e. tests and perf should still use static linking.

We don't ship any binaries. The only release artifacts to Apache Impala 
(incubating) are source code tarballs.

-- 
To view, visit http://gerrit.cloudera.org:8080/4970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic83db07e59ff339dcce7362bd296ebcfd60b71d6
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] Add -build shared libs for default build for speed.

2016-11-08 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: Add -build_shared_libs for default build for speed.
..


Patch Set 1:

> I'm fine with this change, but I remember Dan had some reservations
 > when this change was initially proposed a long time ago. Dan?

My concern was not with this change, but related. It was with a change that 
flipped to using shared libs everywhere, and my concern is that we should test 
something more similar to what we actually ship. i.e. tests and perf should 
still use static linking.

For dev bootstrapping, I'm fine with this.

-- 
To view, visit http://gerrit.cloudera.org:8080/4970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic83db07e59ff339dcce7362bd296ebcfd60b71d6
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4438: Serialize test failpoints.py to reduce memory pressure

2016-11-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4438: Serialize test_failpoints.py to reduce memory 
pressure
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iea4a588e1228d38f90387a077cbe530257636b7d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-HasComments: No


[Impala-ASF-CR] Add -build shared libs for default build for speed.

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: Add -build_shared_libs for default build for speed.
..


Patch Set 1:

Good point. I was curious about what others did. It looks like MapR also ships 
a statically-linked binary: 
http://package.mapr.com/releases/ecosystem-5.x/redhat/mapr-impala-2.5.0.20160631-1.noarch.rpm

Not sure what other downstream users do, but I'd guess that deploying a single 
binary is most common (since it's easiest and what our build scripts do by 
default).

-- 
To view, visit http://gerrit.cloudera.org:8080/4970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic83db07e59ff339dcce7362bd296ebcfd60b71d6
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4438: Serialize test failpoints.py to reduce memory pressure

2016-11-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4438: Serialize test_failpoints.py to reduce memory 
pressure
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4999/1/tests/failure/test_failpoints.py
File tests/failure/test_failpoints.py:

Line 118:   @pytest.mark.execute_serially
Please add a comment explaining why this should be run serially, and that it's 
not really worth running in parallel.


-- 
To view, visit http://gerrit.cloudera.org:8080/4999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iea4a588e1228d38f90387a077cbe530257636b7d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add -build shared libs for default build for speed.

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: Add -build_shared_libs for default build for speed.
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic83db07e59ff339dcce7362bd296ebcfd60b71d6
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4446: expr-test fails under ASAN

2016-11-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4446: expr-test fails under ASAN
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0ac10d34dd6463ab52e41de1002ef065cfe63a20
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4446: expr-test fails under ASAN

2016-11-08 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4446: expr-test fails under ASAN
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5000/1/be/src/exprs/like-predicate.cc
File be/src/exprs/like-predicate.cc:

PS1, Line 357: string pattern_str(reinterpret_cast(pattern_value.ptr), pattern_value.len);
long line - clang-format?


-- 
To view, visit http://gerrit.cloudera.org:8080/5000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0ac10d34dd6463ab52e41de1002ef065cfe63a20
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Add -build shared libs for default build for speed.

2016-11-08 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: Add -build_shared_libs for default build for speed.
..


Patch Set 2: Code-Review+2

rebase, carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/4970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic83db07e59ff339dcce7362bd296ebcfd60b71d6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: Preview: IMPALA-4363: Add timestamp validation
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4968/2/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

Line 308:"File '$0' column '$1' contains invalid timestamp."),
> I think if anyhow possible it would be nice to output the invalid value its
We generally avoid dumping data for security reasons. It's hard to redact and 
can lead to very uncomfortable conversations with users.


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4446: expr-test fails under ASAN

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5000

Change subject: IMPALA-4446: expr-test fails under ASAN
..

IMPALA-4446: expr-test fails under ASAN

Various places in the LikePredicate code assumed StringVal is
null-terminated. There is no such guarantee. By coincidence string
literals were sometimes backed by std::string storage that was
null-terminated, so this bug was latent until recently.

Testing:
Was able to reproduce the failure locally under ASAN, now the test
passes. Running the full ASAN tests to verify, but putting this up
for review first to unbreak the build sooner.

Change-Id: I0ac10d34dd6463ab52e41de1002ef065cfe63a20
---
M be/src/exprs/like-predicate.cc
1 file changed, 10 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/5000/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I0ac10d34dd6463ab52e41de1002ef065cfe63a20
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4438: Serialize test failpoints.py to reduce memory pressure

2016-11-08 Thread Jim Apple (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4999

to look at the new patch set (#2).

Change subject: IMPALA-4438: Serialize test_failpoints.py to reduce memory 
pressure
..

IMPALA-4438: Serialize test_failpoints.py to reduce memory pressure

On EC2 c3.4xlarge instances, with 8cores and 30GB RAM, this test could
trigger the Linux OOM killer by running tests in parallel. This patch
switches to serial execution, which makes the test take four minutes,
rather than one to two minutes.

Change-Id: Iea4a588e1228d38f90387a077cbe530257636b7d
---
M tests/failure/test_failpoints.py
1 file changed, 4 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/4999/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iea4a588e1228d38f90387a077cbe530257636b7d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 


[Impala-ASF-CR] IMPALA-4438: Serialize test failpoints.py to reduce memory pressure

2016-11-08 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4438: Serialize test_failpoints.py to reduce memory 
pressure
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4999/1/tests/failure/test_failpoints.py
File tests/failure/test_failpoints.py:

Line 118:   # Run serially because this can create enough memory pressure to 
invoke the Linux OOM
> Please add a comment explaining why this should be run serially, and that i
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iea4a588e1228d38f90387a077cbe530257636b7d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4438: Serialize test failpoints.py to reduce memory pressure

2016-11-08 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4438: Serialize test_failpoints.py to reduce memory 
pressure
..


Patch Set 3: Code-Review+2

rebase, carry Alex's +2

-- 
To view, visit http://gerrit.cloudera.org:8080/4999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iea4a588e1228d38f90387a077cbe530257636b7d
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

2016-11-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
..


Patch Set 5:

I don't agree with the argument that the new behavior with constant folding 
will be less intuitive. I think it will be more user friendly.

In your example, of CTAS I think a reasonable user can expect:

insert into t select 1 + 1

to work if t has a tinyint column.

I think the only way we can satisfy your design goal is to always defer to the 
widest type, i.e. 1 + 1 -> smallint.

For simple cases we can fold in place, but with more complicated examples we 
have to substitute against inline view expressions to even realize something is 
constant, so that doesn't seem feasible to me.

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] Improve Kudu UPSERT test coverage

2016-11-08 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: Improve Kudu UPSERT test coverage
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4953/2/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

PS2, Line 341: 
 : s
Right now I think this would need more work to support both a RESULTS section 
and a DML_RESULTS section, see my comments in test_result_verifier.py.
For now, let's assert in here (i.e. RESULTS is set) that DML_RESULTS isn't also 
set to make sure a test case doesn't try to specify both.


PS2, Line 357: target_impalad_client = choice(target_impalad_clients)
remove; this is already defined above on l310


PS2, Line 359: self.__verify_results_and_errors(vector, test_section, 
dml_result, use_db,
 : 'DML_RESULTS')
1) add a DCHECK that ERRORS isn't specified because we shouldn't be checking 
them against the DML_RESULTS query.

2) Revert __verify_results_and_errors and instead just call the 
verify_raw_results method directly:

verify_raw_results(test_section, result, 
vector.get_value('table_format').file_format,
   pytest.config.option.update_results,
   replace_filenames_with_placeholder, results_section)


http://gerrit.cloudera.org:8080/#/c/4953/2/tests/common/test_result_verifier.py
File tests/common/test_result_verifier.py:

PS2, Line 288: verify_raw_results
Right now this gets confusing when there are DML_RESULTS and other sections, 
e.g. ERRORS, TYPES, etc., anything checked in this method. This code ends up 
assuming that any other section applies to 'results_section'. That happens to 
work right now because none of the test cases have both 'RESULTS' and 
'DML_RESULTS'. If there were an ERRORS section that is intended to apply to the 
actual DML stmt, then checking the table results after in the DML_RESULTS 
section would break since it wouldn't have those errors.

The best thing to do would be to split apart some of this functionality since 
it's doing a bunch of stuff (we already have verify_results() and 
verify_errors()), then have a separate method that can handle verifying dml 
results nicely, e.g. all of the sections including RESULTS _and_ DML_RESULTS, 
where ERRORS applies to RESULTS; TYPES/LABELS applies to DML_RESULTS.

However, in the interest of getting tests in sooner rather than later, we 
should at least ensure that this isn't misused and it's commented well enough.

The comment in the header should:
1) make it clear that this is checking ERRORS, TYPES, LABELS, and 
result_section against the exec_result. Mention that result_section is a 
parameter for DML tests so a follow-up SELECT can verify the final state of the 
table.
2) leave a TODO to split this fn up and have better tailored methods for 
checking regular select query results vs checking DML results.


-- 
To view, visit http://gerrit.cloudera.org:8080/4953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4433: Fix undefined NDV calculations

2016-11-08 Thread Jim Apple (Code Review)
Jim Apple has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5004

Change subject: IMPALA-4433: Fix undefined NDV calculations
..

IMPALA-4433: Fix undefined NDV calculations

GCC's __builtin_ctz[l[l]] functions return undefined results when the
argument is 0. This patch handles that 0 case.

This changes the metadata for the test data, because ctz is used in
HLL++, which is used as the NDV computation in compute stats.

Change-Id: I8460bc3f7e510ce07b8673387c9440accc432abe
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/udf_samples/hyperloglog-uda.cc
M be/src/util/bit-util.h
M 
testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
M testdata/workloads/functional-query/queries/QueryTest/truncate-table.test
8 files changed, 43 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/5004/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8460bc3f7e510ce07b8673387c9440accc432abe
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 


[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
..


Patch Set 5: Code-Review+1

I was less concerned with what the output type is and more about how it is 
obtained. The problem is that we don't have any comprehensible rules about how 
the output type of an expression is obtained or what the input to the type 
derivation algorithm even is. In my mind a query option shouldn't be part of 
the input to the type derivation. 

It's all a consequence of a pre-existing design so I'm not saying you should 
fix it as part of this change.

I think the original sin is that the output type of an operation should be 
determined by the input types, because that gives you a reasonable way to trace 
back and understand how the type of an expression was derived. This isn't the 
case today since it looks at other information and maybe depends in complicate 
way on other rewrites.

My point about dependent types is that you can achieve what you're arguing for 
without making typechecking depend on a query optimisation. E.g. the logical 
types would be "LITERAL(1) + LITERAL(1) => LITERAL(2)" instead of "TINYINT + 
TINYINT => SMALLINT) and they would be lowered at some point into actual 
physical types. The current system of typing literals seems to be trying to 
approximate this.

I'm not saying we shouldn't go ahead, but I think there are design problems 
with the type system and I think this is digging the hole a little deeper. I'm 
ok as long as we have a way to replace it with a more principled approach later 
without breaking things. It seems like it's mostly corner cases that would be 
affected in the meantime and there probably is a way out.

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
..


Patch Set 2:

I switched to the alternative approach - seems to work out a bit simpler.

-- 
To view, visit http://gerrit.cloudera.org:8080/4969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles

2016-11-08 Thread Tim Armstrong (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4969

to look at the new patch set (#2).

Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
..

IMPALA-4392: restore PeakMemoryUsage to DataSink profiles

The join build sink patches refactored the DataSink interface and
inadvertently removed this counter from the profile.
The problem was that the sink MemTracker was not initialized with the
sink's profile.

The fix is for the sink to create the MemTracker itself.

Testing:
Ran core tests. Manually checked profile to make sure the counter
appeared in HdfsTableSink, DataStreamSender, etc.

Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9
---
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hbase-table-sink.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/plan-root-sink.cc
M be/src/exec/row-batch-cache.h
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/runtime/plan-fragment-executor.cc
M be/src/runtime/plan-fragment-executor.h
19 files changed, 66 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/4969/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4446: expr-test fails under ASAN

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4446: expr-test fails under ASAN
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5000/1/be/src/exprs/like-predicate.cc
File be/src/exprs/like-predicate.cc:

PS1, Line 357: string pattern_str(reinterpret_cast(pattern_value.ptr), pattern_value.len);
> long line - clang-format?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0ac10d34dd6463ab52e41de1002ef065cfe63a20
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3710: Kudu DML should ignore conflicts by default

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3710: Kudu DML should ignore conflicts by default
..


IMPALA-3710: Kudu DML should ignore conflicts by default

Removes the non-standard IGNORE syntax that was allowed for
DML into Kudu tables to indicate that certain errors should
be ignored, i.e. not fail the query and continue. However,
because there is no way to 'roll back' mutations that
occurred before an error occurs, tables are left in an
inconsistent state and it's difficult to know what rows were
successfully modified vs which rows were not. Instead, this
change makes it so that we always 'ignore' these conflicts,
i.e. a 'best effort'. In the future, when Kudu will provide
the mechanisms Impala needs to provide a notion of isolation
levels, then Impala will be able to provide options for more
traditional semantics.

After this change, the following errors are ignored:
* INSERT where the PK already exists
* UPDATE/DELETE where the PK doesn't exist

Another follow-up patch will change other violations to be
handled in this way as well, e.g. nulls inserted in
non-nullable cols.

Reporting:
The number of rows inserted is reported to the coordinator,
which makes the aggregate available to the shell and via the
profile.
TODO: Return rows modified for INSERT via HS2 (IMPALA-1789).
TODO: Return rows modified for other CRUD (beeswax+hs2) (IMPALA-3713).
TODO: Return error counts for specific warnings (IMPALA-4416).

Testing:
Updated tests. Ran all functional tests. More tests will be
needed when other conflicts are handled in the same way.

Change-Id: I83b5beaa982d006da4997a2af061ef7c22cad3f1
Reviewed-on: http://gerrit.cloudera.org:8080/4911
Reviewed-by: Alex Behm 
Tested-by: Internal Jenkins
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M common/thrift/DataSinks.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DeleteStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/UpdateStmt.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeModifyStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-delete.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test
17 files changed, 66 insertions(+), 200 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I83b5beaa982d006da4997a2af061ef7c22cad3f1
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-4258: Remove duplicated and unused test macros

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4258: Remove duplicated and unused test macros
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08539d7e46b89d7e0a4338510b65f9867814c275
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3710: Kudu DML should ignore conflicts by default

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3710: Kudu DML should ignore conflicts by default
..


Patch Set 5: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I83b5beaa982d006da4997a2af061ef7c22cad3f1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4446: expr-test fails under ASAN

2016-11-08 Thread Tim Armstrong (Code Review)
Hello Henry Robinson, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5000

to look at the new patch set (#2).

Change subject: IMPALA-4446: expr-test fails under ASAN
..

IMPALA-4446: expr-test fails under ASAN

Various places in the LikePredicate code assumed StringVal is
null-terminated. There is no such guarantee. By coincidence string
literals were sometimes backed by std::string storage that was
null-terminated, so this bug was latent until recently.

Testing:
Was able to reproduce the failure locally under ASAN, now the test
passes. Running the full ASAN tests to verify, but putting this up
for review first to unbreak the build sooner.

Change-Id: I0ac10d34dd6463ab52e41de1002ef065cfe63a20
---
M be/src/exprs/like-predicate.cc
1 file changed, 15 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/5000/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0ac10d34dd6463ab52e41de1002ef065cfe63a20
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4446: expr-test fails under ASAN

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4446: expr-test fails under ASAN
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0ac10d34dd6463ab52e41de1002ef065cfe63a20
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4392: restore PeakMemoryUsage to DataSink profiles

2016-11-08 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4392: restore PeakMemoryUsage to DataSink profiles
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4969
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaa5db623a84c47d5904033ec26aece74f500a2c9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

2016-11-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
..


Patch Set 5:

Thanks for the explanation and your patience. I'm certainly open to learning 
more about a cleaner design.

For the most part, the output type of an Expr is determined by its input types.

Sounds like we need to sit together and flesh out the future direction a little 
more. I'm still not sure I follow how your example would solve the CTAS 
problem. Can a LITERAL(2) be inserted into a column with a TINYINT column?

I think there is actually a more severe problem than the ones you identified. 
Expr optimizations can potentially change the result of a query. I believe this 
is true even with our existing expr substitution logic. Here's one example:

bool_col OR bool_col --> bool_col

The original Expr will return FALSE when bool_col is NULL, but the new Expr 
will return NULL. We can come up with other such examples.
Dealing with NULLs is generally tricky when transforming Exprs.

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

2016-11-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
..


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4447: Rein in overly broad sed that dirties the tree

2016-11-08 Thread Jim Apple (Code Review)
Jim Apple has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5008

Change subject: IMPALA-4447: Rein in overly broad sed that dirties the tree
..

IMPALA-4447: Rein in overly broad sed that dirties the tree

This patch fixes a sed expression to make sure it only laters the code
it is meant to alter, not the comment describing the code.

Tested with tests/run-tests.py query_test/test_udfs.py

Change-Id: I51a0498d24b7fccc05b6183123501766cb36f85e
---
M testdata/bin/copy-udfs-udas.sh
1 file changed, 4 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/5008/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I51a0498d24b7fccc05b6183123501766cb36f85e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 


[Impala-ASF-CR] IMPALA-3710: Kudu DML should ignore conflicts, pt2

2016-11-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3710: Kudu DML should ignore conflicts, pt2
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4985/1/be/src/exec/data-sink.cc
File be/src/exec/data-sink.cc:

Line 161:   ss << "NumRowErrors: " << stats.kudu_stats.num_row_errors << 
endl;
> It is unfortunate and probably not helpful enough, but I'm very worried abo
Agree that misinformation is worse. Can you add a comment to the counter 
similar to what you responded here? Just as an explanation why all errors are 
purposely aggregated.


http://gerrit.cloudera.org:8080/#/c/4985/1/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

Line 430: // TODO: Refactor to reflect usage by other DML statements.
> Yeah I'll go ahead and do this if you think the strategy is otherwise OK.
I think it makes sense to do this non-functional rename/refactoring in a 
separate patch.


Line 445:   // The latest observed Kudu timestamp reported by the KuduSession 
at this partition.
> Hm yeah I guess it's not great to call this a partition... Table I think mi
reported by the local KuduSession?


-- 
To view, visit http://gerrit.cloudera.org:8080/4985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4447: Rein in overly broad sed that dirties the tree

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4447: Rein in overly broad sed that dirties the tree
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I51a0498d24b7fccc05b6183123501766cb36f85e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
..


Patch Set 6:

I think there would have to be some distinction between logical types and 
physical types. E.g. during analysis we'd assign logical types like LITERAL(2) 
to the expression, which mapped to physical types like TINYINT for actual 
execution.

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] Improve Kudu UPSERT test coverage

2016-11-08 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: Improve Kudu UPSERT test coverage
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4953/2/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

PS2, Line 341:  test_section
 : s
> Right now I think this would need more work to support both a RESULTS secti
Done


PS2, Line 357: dml_results_query = "select * from %s limit 1000" % \
> remove; this is already defined above on l310
Done


PS2, Line 359: dml_result = self.__execute_query(target_impalad_client, 
dml_results_query)
 : verify_raw_results
> 1) add a DCHECK that ERRORS isn't specified because we shouldn't be checkin
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Improve Kudu UPSERT test coverage

2016-11-08 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new patch set (#4).

Change subject: Improve Kudu UPSERT test coverage
..

Improve Kudu UPSERT test coverage

In preparation for the public release of Kudu integration in
2.8, we need to make sure that we've covered as much of the
Kudu related functionality with tests as possible.

This patch also introduces a new test section 'DML_RESULTS', which
takes the name of a table as a comment and the contents of the
table as its body and then verifies that the body matches the
actual contents of the table. This makes it easy to check that a
DML operation has the desired effect on the contents of a table,
rather than always having to add another test case that runs a
select on the table. For now, this section cannot be used in a
test along with the RESULTS or ERRORS sections.

Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6
---
M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test
A testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
M tests/common/impala_test_suite.py
M tests/common/test_result_verifier.py
M tests/query_test/test_kudu.py
M tests/util/test_file_parser.py
6 files changed, 502 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/4953/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4433: Fix undefined NDV calculations

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4433: Fix undefined NDV calculations
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5004/1//COMMIT_MSG
Commit Message:

Line 10: argument is 0. This patch handles that 0 case.
Might be good to mention that varies depending on the architecture.


http://gerrit.cloudera.org:8080/#/c/5004/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 1175:   if (hash_value != 0) {
Same comment as the other place - I think we can remove this branch.


http://gerrit.cloudera.org:8080/#/c/5004/1/be/src/udf_samples/hyperloglog-uda.cc
File be/src/udf_samples/hyperloglog-uda.cc:

Line 75:   if (hash_value != 0) {
Let's remove this branch while we're at it. It doesn't make sense to me why 
it's there in the first place and I don't think it's consequential (if you 
think about it, getting a hash value of 0 should be extremely rare and not 
affect the final probabilistic result).

I wonder if the branch was originally an attempt to work around the problem 
you're fixing.


PS1, Line 81: hash_top_bits ? __builtin_ctzll(hash_top_bits) :
:  (sizeof(hash_value) * CHAR_BIT - 
HLL_PRECISION)
Can't we use the builtin you


-- 
To view, visit http://gerrit.cloudera.org:8080/5004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8460bc3f7e510ce07b8673387c9440accc432abe
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Amos Bird 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3200: Implement suballocator for splitting buffers

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#4).

Change subject: IMPALA-3200: Implement suballocator for splitting buffers
..

IMPALA-3200: Implement suballocator for splitting buffers

This is useful for situations like hash tables, where we want to
make multiple non-spillable allocations of variable size from buffer
pool memory and not incur the overhead of interacting with the global
buffer pool. The allocator subdivides buffers to service allocations
and uses a buddy allocation algorithm to merge freed allocations into
larger chunks. This helps avoid fragmentation and is quite effective
at reusing memory given the typical doubling allocation patterns of
hash tables in partitioned aggs and joins.

Testing:
The allocator has fairly robust internal consistency checks via
assertions and unique_ptrs. Includes a unit test that exercises
various allocation patterns.

I tested porting hash tables over to allocate memory using an earlier
version of the suballocator, which worked well (was able to run a
wide range of queries successfully with good performance).

Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788
---
M be/src/bufferpool/CMakeLists.txt
A be/src/bufferpool/suballocator-test.cc
A be/src/bufferpool/suballocator.cc
A be/src/bufferpool/suballocator.h
M be/src/common/names.h
5 files changed, 788 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/4715/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4715
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8bfe0e429f67ad273f7c7d0816703a9e6c3da788
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4433: Fix undefined NDV calculations

2016-11-08 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-4433: Fix undefined NDV calculations
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5004/1//COMMIT_MSG
Commit Message:

Line 10: argument is 0. This patch handles that 0 case, which can depend on the
> Might be good to mention that varies depending on the architecture.
Done


http://gerrit.cloudera.org:8080/#/c/5004/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

Line 1175:   // Use the lower bits to index into the number of streams and then 
find the first 1 bit
> Same comment as the other place - I think we can remove this branch.
Done


http://gerrit.cloudera.org:8080/#/c/5004/1/be/src/udf_samples/hyperloglog-uda.cc
File be/src/udf_samples/hyperloglog-uda.cc:

Line 75:   // Use the lower bits to index into the number of streams and then 
find the first 1 bit
> Let's remove this branch while we're at it. It doesn't make sense to me why
Done


PS1, Line 81:   (sizeof(hash_value) * CHAR_BIT - HLL_PRECISION));
:   dst->ptr[idx] = ::max(dst->ptr[idx], first_one_bit);
> Can't we use the builtin you
It's in Impala proper, while it seems like the non-test files in this directory 
carefully avoid using anything from Impala proper.


-- 
To view, visit http://gerrit.cloudera.org:8080/5004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8460bc3f7e510ce07b8673387c9440accc432abe
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Amos Bird 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4433: Fix undefined NDV calculations

2016-11-08 Thread Jim Apple (Code Review)
Jim Apple has uploaded a new patch set (#2).

Change subject: IMPALA-4433: Fix undefined NDV calculations
..

IMPALA-4433: Fix undefined NDV calculations

GCC's __builtin_ctz[l[l]] functions return undefined results when the
argument is 0. This patch handles that 0 case, which can depend on the
architecture.

This changes the metadata for the test data, because ctz is used in
HLL++, which is used as the NDV computation in compute stats.

Change-Id: I8460bc3f7e510ce07b8673387c9440accc432abe
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/udf_samples/hyperloglog-uda.cc
M be/src/util/bit-util.h
M 
testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
M testdata/workloads/functional-query/queries/QueryTest/truncate-table.test
8 files changed, 51 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/5004/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8460bc3f7e510ce07b8673387c9440accc432abe
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Amos Bird 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4433: Fix undefined NDV calculations

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4433: Fix undefined NDV calculations
..


Patch Set 2: Code-Review+1

(1 comment)

Would be good to have someone else take a look in case there might be other 
consequences for tests.

http://gerrit.cloudera.org:8080/#/c/5004/1/be/src/udf_samples/hyperloglog-uda.cc
File be/src/udf_samples/hyperloglog-uda.cc:

PS1, Line 81:   (sizeof(hash_value) * CHAR_BIT - HLL_PRECISION));
:   dst->ptr[idx] = ::max(dst->ptr[idx], first_one_bit);
> It's in Impala proper, while it seems like the non-test files in this direc
Oops, thought I deleted this comment (I realised that point halfway through 
writing it)


-- 
To view, visit http://gerrit.cloudera.org:8080/5004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8460bc3f7e510ce07b8673387c9440accc432abe
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Amos Bird 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
..


Patch Set 14: Verified-1

Build failed: 
http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/449/

-- 
To view, visit http://gerrit.cloudera.org:8080/4655
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] Improve Kudu UPSERT test coverage

2016-11-08 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: Improve Kudu UPSERT test coverage
..


Patch Set 4: Code-Review+1

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4953/4//COMMIT_MSG
Commit Message:

PS4, Line 9: In preparation for the public release of Kudu integration in
   : 2.8, we need to make sure that we've covered as much of the
   : Kudu related functionality with tests as possible.
let's remove this; sorry if my previous comments were confusing

I think the line above covers this.


http://gerrit.cloudera.org:8080/#/c/4953/4/testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test:

PS4, Line 23: row_regex: .*NumModifiedRows: 1.*
I realized in my own change that all of these regex checks can be simplified: 
we don't need the regex we can just do

NumModifiedRows: 1

Can you clean these up here and below?


http://gerrit.cloudera.org:8080/#/c/4953/4/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

PS4, Line 341: ssert 'DML_RESULTS' not in test_section
add a comment that this isn't supported for now because 
__verify_results_and_errors calls verify_raw_results which  always checks 
ERRORS, TYPES, LABELS, ... and it doesn't make sense to do so for this as well 
as for DML_RESULTS below.


PS4, Line 357: limit 1000
Please add a comment about the limit, e.g .to make sure the queries aren't 
unbounded; the tests shouldn't be checking tables this big anyway


-- 
To view, visit http://gerrit.cloudera.org:8080/4953
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib9e7afbef60186edb00a9d11fbe5a8c64931add6
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4433: Fix undefined NDV calculations

2016-11-08 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4433: Fix undefined NDV calculations
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5004/2/be/src/util/bit-util.h
File be/src/util/bit-util.h:

Line 245: 
missing function comments


-- 
To view, visit http://gerrit.cloudera.org:8080/5004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8460bc3f7e510ce07b8673387c9440accc432abe
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Amos Bird 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3710: Kudu DML should ignore conflicts, pt2

2016-11-08 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3710: Kudu DML should ignore conflicts, pt2
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4985/1/be/src/exec/data-sink.cc
File be/src/exec/data-sink.cc:

Line 161:   ss << "NumRowErrors: " << stats.kudu_stats.num_row_errors << 
endl;
> Agree that misinformation is worse. Can you add a comment to the counter si
I put a comment in the Thrift definition of num_row_errors which I thought 
would be more visible. I can move it here if you think that's better?


http://gerrit.cloudera.org:8080/#/c/4985/1/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

Line 430: // TODO: Refactor to reflect usage by other DML statements.
> I think it makes sense to do this non-functional rename/refactoring in a se
Ok I agree, I'll do it separately


Line 445:   // The latest observed Kudu timestamp reported by the KuduSession 
at this partition.
> reported by the local KuduSession?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3710: Kudu DML should ignore conflicts, pt2

2016-11-08 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3710: Kudu DML should ignore conflicts, pt2
..


Patch Set 1:

BTW, based on a conversation with the Kudu folks I learned my handling of the 
'NotFound' errors wasn't right. I changed the code to not attempt to break them 
apart.

-- 
To view, visit http://gerrit.cloudera.org:8080/4985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
..


Patch Set 15: Code-Review+2

Bad rebase - needed to fix a test

-- 
To view, visit http://gerrit.cloudera.org:8080/4655
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions

2016-11-08 Thread Tim Armstrong (Code Review)
Hello Internal Jenkins, Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4655

to look at the new patch set (#15).

Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
..

IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions

This change enables codegen for all builtin aggregate functions,
e.g. timestamp functions and group_concat.

There are several parts to the change:
* Adding support for generic UDAs. Previous the codegen code did not
  handle multiple input arguments or NULL return values.
* Defaulting to using the UDA interface when there is not a special
  codegen path (we have implementations of all builtin aggregate
  functions for the interpreted path).
* Remove all the logic to disable codegen for the special cases that now
  are supported.

Also fix the generation of code to get/set NULL bits since I needed
to add functionality there anyway.

Testing:
Add tests that check that codegen was enabled for builtin aggregate
functions. Also fix some gaps in the preexisting tests.

Also add tests for UDAs that check input/output nulls are handled
correctly, in anticipation of enabling codegen for arbitrary UDAs.
The tests are run with both codegen enabled and disabled. To avoid
flaky tests, we switch the UDF tests to use "unique_database".

Perf:
Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1,
since my original approach regressed it ~5%. In the end the problem was
to do with the ordering of loads/stores to the slot and null bit in the
generated code: the previous version of the code exploited some
properties of the particular aggregate function. I ended up replicating
this behaviour to avoid regressing perf.

Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/codegen-anyval.h
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/aggregation-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/old-hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/text-converter.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/case-expr.cc
M be/src/exprs/compound-predicates.cc
M be/src/exprs/expr-codegen-test.cc
M be/src/exprs/expr.cc
M be/src/exprs/literal.cc
M be/src/exprs/null-literal.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/tuple.cc
M be/src/runtime/types.h
M be/src/testutil/test-udas.cc
M be/src/testutil/test-udfs.cc
M be/src/util/tuple-row-compare.cc
M testdata/workloads/functional-query/queries/QueryTest/java-udf.test
M 
testdata/workloads/functional-query/queries/QueryTest/libs_with_same_filenames.test
M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test
M testdata/workloads/functional-query/queries/QueryTest/uda-mem-limit.test
M testdata/workloads/functional-query/queries/QueryTest/uda.test
A 
testdata/workloads/functional-query/queries/QueryTest/udf-codegen-required.test
M testdata/workloads/functional-query/queries/QueryTest/udf-errors.test
M testdata/workloads/functional-query/queries/QueryTest/udf-mem-limit.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
M tests/common/test_result_verifier.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_udfs.py
47 files changed, 1,039 insertions(+), 802 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/4655/15
-- 
To view, visit http://gerrit.cloudera.org:8080/4655
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3710: Kudu DML should ignore conflicts, pt2

2016-11-08 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#2).

Change subject: IMPALA-3710: Kudu DML should ignore conflicts, pt2
..

IMPALA-3710: Kudu DML should ignore conflicts, pt2

Second part of IMPALA-3710, which removed the IGNORE DML
option and changed the following errors on Kudu DML
operations to be ignored:
1) INSERT where the PK already exists
2) UPDATE/DELETE where the PK doesn't exist

This changes other data-related errors to be ignored as
well:
3) NULLs in non-nullable columns, i.e. null constraint
  violoations.
4) Rows with PKs that are in an 'uncovered range'.

It became clear that we can't differentiate between (3) and
(4) because both return a Kudu 'NotFound' error code. The
Impala error codes have been simplified as well: we just
report a generic KUDU_NOT_FOUND error in these cases.

This also adds some metadata to the thrift report sent to
the coordinator from sinks so the total number of rows with
errors can be added to the profile. Note that this does not
include a breakdown of error counts by type/code because we
cannot differentiate between all of these cases yet.

An upcoming change will add this new info to the beeswax
interface and show it in the shell output (IMPALA-3713).

Testing: Updated kudu_crud tests to check the number of rows
with errors.

Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c
---
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/runtime/coordinator.cc
M common/thrift/ImpalaInternalService.thrift
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test
9 files changed, 179 insertions(+), 101 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/4985/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR](asf-site) Add "Effective Coding Practices" doc to site

2016-11-08 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: Add "Effective Coding Practices" doc to site
..


Patch Set 2:

I'm happy with this content going up, but I feel it should be a wiki page - 
there's no reason this should be considered immutable, and may evolve as time 
goes on.

-- 
To view, visit http://gerrit.cloudera.org:8080/4961
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I21fdce898a71be836b658e0c914e05a6868d6263
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3710: Kudu DML should ignore conflicts, pt2

2016-11-08 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3710: Kudu DML should ignore conflicts, pt2
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4985/2/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

PS2, Line 188: violoations
spelling


http://gerrit.cloudera.org:8080/#/c/4985/2/be/src/exec/kudu-table-sink.h
File be/src/exec/kudu-table-sink.h:

PS2, Line 119: rows
maybe "rows consumed"? or "rows processed"?


-- 
To view, visit http://gerrit.cloudera.org:8080/4985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3710: Kudu DML should ignore conflicts, pt2

2016-11-08 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3710: Kudu DML should ignore conflicts, pt2
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4985/2/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

PS2, Line 188: violoations
> spelling
Done


http://gerrit.cloudera.org:8080/#/c/4985/2/be/src/exec/kudu-table-sink.h
File be/src/exec/kudu-table-sink.h:

PS2, Line 119: rows
> maybe "rows consumed"? or "rows processed"?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3710: Kudu DML should ignore conflicts, pt2

2016-11-08 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#3).

Change subject: IMPALA-3710: Kudu DML should ignore conflicts, pt2
..

IMPALA-3710: Kudu DML should ignore conflicts, pt2

Second part of IMPALA-3710, which removed the IGNORE DML
option and changed the following errors on Kudu DML
operations to be ignored:
1) INSERT where the PK already exists
2) UPDATE/DELETE where the PK doesn't exist

This changes other data-related errors to be ignored as
well:
3) NULLs in non-nullable columns, i.e. null constraint
  violoations.
4) Rows with PKs that are in an 'uncovered range'.

It became clear that we can't differentiate between (3) and
(4) because both return a Kudu 'NotFound' error code. The
Impala error codes have been simplified as well: we just
report a generic KUDU_NOT_FOUND error in these cases.

This also adds some metadata to the thrift report sent to
the coordinator from sinks so the total number of rows with
errors can be added to the profile. Note that this does not
include a breakdown of error counts by type/code because we
cannot differentiate between all of these cases yet.

An upcoming change will add this new info to the beeswax
interface and show it in the shell output (IMPALA-3713).

Testing: Updated kudu_crud tests to check the number of rows
with errors.

Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c
---
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/runtime/coordinator.cc
M common/thrift/ImpalaInternalService.thrift
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test
9 files changed, 180 insertions(+), 101 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/4985/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-3710: Kudu DML should ignore conflicts, pt2

2016-11-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-3710: Kudu DML should ignore conflicts, pt2
..


Patch Set 3: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] Add -build shared libs for default build for speed.

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: Add -build_shared_libs for default build for speed.
..


Add -build_shared_libs for default build for speed.

This is already recommended by the wiki:

https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala

Change-Id: Ic83db07e59ff339dcce7362bd296ebcfd60b71d6
Reviewed-on: http://gerrit.cloudera.org:8080/4970
Reviewed-by: Jim Apple 
Tested-by: Internal Jenkins
---
M bin/bootstrap_development.sh
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Jim Apple: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ic83db07e59ff339dcce7362bd296ebcfd60b71d6
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] Add -build shared libs for default build for speed.

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: Add -build_shared_libs for default build for speed.
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4970
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic83db07e59ff339dcce7362bd296ebcfd60b71d6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3398: Add docs to main Impala branch.

2016-11-08 Thread Jim Apple (Code Review)
Jim Apple has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5014

Change subject: IMPALA-3398: Add docs to main Impala branch.
..

IMPALA-3398: Add docs to main Impala branch.

These are refugees from doc_prototype. They can be rendered with the
DITA Open Toolkit version 2.3.3 by:

/tmp/dita-ot-2.3.3/bin/dita \
  -i impala.ditamap \
  -f html5 \
  -o $(mktemp -d) \
  -filter impala_html.ditaval

Change-Id: I8861e99adc446f659a04463ca78c79200669484f
---
A docs/Cloudera-Impala-Release-Notes.ditamap
A docs/generatingImpalaDoc.md
A docs/images/howto_access_control.png
A docs/images/howto_per_node_peak_memory_usage.png
A docs/images/howto_show_histogram.png
A docs/images/howto_static_server_pools_config.png
A docs/images/impala_arch.jpeg
A docs/images/support_send_diagnostic_data.png
A docs/impala.ditamap
A docs/impala_html.ditaval
A docs/impala_pdf.ditaval
A docs/impala_sqlref.ditamap
A docs/shared/ImpalaVariables.xml
A docs/shared/impala_common.xml
A docs/topics/impala.xml
A docs/topics/impala_abort_on_default_limit_exceeded.xml
A docs/topics/impala_abort_on_error.xml
A docs/topics/impala_admin.xml
A docs/topics/impala_admission.xml
A docs/topics/impala_aggregate_functions.xml
A docs/topics/impala_aliases.xml
A docs/topics/impala_allow_unsupported_formats.xml
A docs/topics/impala_alter_function.xml
A docs/topics/impala_alter_table.xml
A docs/topics/impala_alter_view.xml
A docs/topics/impala_analytic_functions.xml
A docs/topics/impala_appx_count_distinct.xml
A docs/topics/impala_appx_median.xml
A docs/topics/impala_array.xml
A docs/topics/impala_auditing.xml
A docs/topics/impala_authentication.xml
A docs/topics/impala_authorization.xml
A docs/topics/impala_avg.xml
A docs/topics/impala_avro.xml
A docs/topics/impala_batch_size.xml
A docs/topics/impala_bigint.xml
A docs/topics/impala_bit_functions.xml
A docs/topics/impala_boolean.xml
A docs/topics/impala_breakpad.xml
A docs/topics/impala_cdh.xml
A docs/topics/impala_char.xml
A docs/topics/impala_cluster_sizing.xml
A docs/topics/impala_cm_installation.xml
A docs/topics/impala_comments.xml
A docs/topics/impala_complex_types.xml
A docs/topics/impala_components.xml
A docs/topics/impala_compression_codec.xml
A docs/topics/impala_compute_stats.xml
A docs/topics/impala_concepts.xml
A docs/topics/impala_conditional_functions.xml
A docs/topics/impala_config.xml
A docs/topics/impala_config_options.xml
A docs/topics/impala_config_performance.xml
A docs/topics/impala_connecting.xml
A docs/topics/impala_conversion_functions.xml
A docs/topics/impala_count.xml
A docs/topics/impala_create_data_source.xml
A docs/topics/impala_create_database.xml
A docs/topics/impala_create_function.xml
A docs/topics/impala_create_role.xml
A docs/topics/impala_create_table.xml
A docs/topics/impala_create_view.xml
A docs/topics/impala_data_sources.xml
A docs/topics/impala_databases.xml
A docs/topics/impala_datatypes.xml
A docs/topics/impala_date.xml
A docs/topics/impala_datetime_functions.xml
A docs/topics/impala_ddl.xml
A docs/topics/impala_debug_action.xml
A docs/topics/impala_decimal.xml
A docs/topics/impala_default_order_by_limit.xml
A docs/topics/impala_delegation.xml
A docs/topics/impala_delete.xml
A docs/topics/impala_describe.xml
A docs/topics/impala_development.xml
A docs/topics/impala_disable_cached_reads.xml
A docs/topics/impala_disable_codegen.xml
A docs/topics/impala_disable_outermost_topn.xml
A docs/topics/impala_disable_row_runtime_filtering.xml
A docs/topics/impala_disable_streaming_preaggregations.xml
A docs/topics/impala_disable_unsafe_spills.xml
A docs/topics/impala_disk_space.xml
A docs/topics/impala_distinct.xml
A docs/topics/impala_dml.xml
A docs/topics/impala_double.xml
A docs/topics/impala_drop_data_source.xml
A docs/topics/impala_drop_database.xml
A docs/topics/impala_drop_function.xml
A docs/topics/impala_drop_role.xml
A docs/topics/impala_drop_stats.xml
A docs/topics/impala_drop_table.xml
A docs/topics/impala_drop_view.xml
A docs/topics/impala_errata.xml
A docs/topics/impala_exec_single_node_rows_threshold.xml
A docs/topics/impala_explain.xml
A docs/topics/impala_explain_level.xml
A docs/topics/impala_explain_plan.xml
A docs/topics/impala_faq.xml
A docs/topics/impala_faq_base.xml
A docs/topics/impala_features.xml
A docs/topics/impala_file_formats.xml
A docs/topics/impala_fixed_issues.xml
A docs/topics/impala_float.xml
A docs/topics/impala_functions.xml
A docs/topics/impala_functions_overview.xml
A docs/topics/impala_glossary.xml
A docs/topics/impala_grant.xml
A docs/topics/impala_group_by.xml
A docs/topics/impala_group_concat.xml
A docs/topics/impala_hadoop.xml
A docs/topics/impala_having.xml
A docs/topics/impala_hbase.xml
A docs/topics/impala_hbase_cache_blocks.xml
A docs/topics/impala_hbase_caching.xml
A docs/topics/impala_hints.xml
A docs/topics/impala_howto_rm.xml
A docs/topics/impala_identifiers.xml
A docs/topics/impala_impala_shell.xml
A docs/topics/impala_incompatible_changes.

[Impala-ASF-CR] IMPALA-3398: Add docs to main Impala branch.

2016-11-08 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-3398: Add docs to main Impala branch.
..


Patch Set 1:

Not really ready for review yet

-- 
To view, visit http://gerrit.cloudera.org:8080/5014
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8861e99adc446f659a04463ca78c79200669484f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#3).

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
..

IMPALA-4397,IMPALA-3259: reduce codegen time and memory

A handful of fixes to codegen memory usage:
* Delete the IR module when we're done with it (it can be fairly large)
* Track the compiled code size (typically not that large, but it can add
  up if there are many fragments).
* Estimate optimisation memory requirements and track it in the memory
  tracker. This is very crude but much better than not tracking it.

A handful of fixes to improve codegen time/cost, particularly targeted
at compute stats workloads:
* Avoid over-inlining when there are many aggregate functions,
  conjuncts, etc by adding "NoInline" attributes.
* Bail of out text scanner codegen for wide tables.
* Don't codegen non-grouping merge aggregations. They will only process
  one row per Impala daemon, so codegen is not worth it.
* Make the Hll algorithm more efficient by removing a branch that does
  not affect the final probabilistic result substantially, and by
  specialising the hash function based on decimal type.

Limitations:
* This doesn't tackle over-inlining of large expr trees, but a similar
  approach will be used there in a follow-on patch.

Perf:
Compute stats on functional_parquet.widetable_1000_cols goes from 1min+
of codegen to < 1s codegen on my machine. Local perf runs of tpc-h
and targeted perf showed no regressions and some moderate improvements
(1-2%).

Also did an experiment to understand the perf consequences of disabling
inlining. I manually set CODEGEN_INLINE_EXPRS_THRESHOLD to 0, and ran:

  drop stats tpch_20_parquet.lineitem
  compute stats tpch_20_parquet.lineitem;

There was no difference in time spent in the agg node: 30.7s with
inlining, 30.5s without.

Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/codegen/mcjit-mem-mgr.h
M be/src/exec/aggregation-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/anyval-util.h
M be/src/exprs/expr-codegen-test.cc
M be/src/exprs/expr.h
M be/src/runtime/lib-cache.cc
M be/src/runtime/runtime-state.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M tests/query_test/test_aggregation.py
29 files changed, 1,444 insertions(+), 152 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/4956/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4397,IMPALA-3259: reduce codegen time and memory

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#4).

Change subject: IMPALA-4397,IMPALA-3259: reduce codegen time and memory
..

IMPALA-4397,IMPALA-3259: reduce codegen time and memory

A handful of fixes to codegen memory usage:
* Delete the IR module when we're done with it (it can be fairly large)
* Track the compiled code size (typically not that large, but it can add
  up if there are many fragments).
* Estimate optimisation memory requirements and track it in the memory
  tracker. This is very crude but much better than not tracking it.

A handful of fixes to improve codegen time/cost, particularly targeted
at compute stats workloads:
* Avoid over-inlining when there are many aggregate functions,
  conjuncts, etc by adding "NoInline" attributes.
* Bail of out text scanner codegen for wide tables.
* Don't codegen non-grouping merge aggregations. They will only process
  one row per Impala daemon, so codegen is not worth it.
* Make the Hll algorithm more efficient by specialising the hash function
  based on decimal width.

Limitations:
* This doesn't tackle over-inlining of large expr trees, but a similar
  approach will be used there in a follow-on patch.

Perf:
Compute stats on functional_parquet.widetable_1000_cols goes from 1min+
of codegen to < 1s codegen on my machine. Local perf runs of tpc-h
and targeted perf showed no regressions and some moderate improvements
(1-2%).

Also did an experiment to understand the perf consequences of disabling
inlining. I manually set CODEGEN_INLINE_EXPRS_THRESHOLD to 0, and ran:

  drop stats tpch_20_parquet.lineitem
  compute stats tpch_20_parquet.lineitem;

There was no difference in time spent in the agg node: 30.7s with
inlining, 30.5s without.

Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/codegen/mcjit-mem-mgr.h
M be/src/exec/aggregation-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/anyval-util.h
M be/src/exprs/expr-codegen-test.cc
M be/src/exprs/expr.h
M be/src/runtime/lib-cache.cc
M be/src/runtime/runtime-state.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M tests/query_test/test_aggregation.py
29 files changed, 1,444 insertions(+), 152 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/4956/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4956
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id10015b49da182cb181a653ac8464b4a18b71091
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4438: Serialize test failpoints.py to reduce memory pressure

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4438: Serialize test_failpoints.py to reduce memory 
pressure
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iea4a588e1228d38f90387a077cbe530257636b7d
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4438: Serialize test failpoints.py to reduce memory pressure

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4438: Serialize test_failpoints.py to reduce memory 
pressure
..


IMPALA-4438: Serialize test_failpoints.py to reduce memory pressure

On EC2 c3.4xlarge instances, with 8cores and 30GB RAM, this test could
trigger the Linux OOM killer by running tests in parallel. This patch
switches to serial execution, which makes the test take four minutes,
rather than one to two minutes.

Change-Id: Iea4a588e1228d38f90387a077cbe530257636b7d
Reviewed-on: http://gerrit.cloudera.org:8080/4999
Reviewed-by: Jim Apple 
Tested-by: Internal Jenkins
---
M tests/failure/test_failpoints.py
1 file changed, 4 insertions(+), 0 deletions(-)

Approvals:
  Jim Apple: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Iea4a588e1228d38f90387a077cbe530257636b7d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Jim Apple 


[Impala-ASF-CR] IMPALA-4410: Safer tear-down of RuntimeState

2016-11-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4410: Safer tear-down of RuntimeState
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4893/2/be/src/runtime/test-env.cc
File be/src/runtime/test-env.cc:

Line 109:   for (shared_ptr& runtime_state : runtime_states_) {
This calculation is wrong unless you only created one RuntimeState per query 
(which was the original intent of the interface).


http://gerrit.cloudera.org:8080/#/c/4893/2/be/src/runtime/test-env.h
File be/src/runtime/test-env.h:

PS2, Line 42: CreateRuntimeState
The old name was deliberate, since it's meant to provide a simple interface to 
set up the minimal set of query-wide objects for a test.

I.e. once there is actually a query-wide state, this should create it.

The new name seems ok, but there doesn't seem to be much point in changing it 
if it's going to get changed back after IMPALA-4014


-- 
To view, visit http://gerrit.cloudera.org:8080/4893
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie416e4d57240142bf685385299b749c3a6792c45
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4258: Remove duplicated and unused test macros

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4258: Remove duplicated and unused test macros
..


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08539d7e46b89d7e0a4338510b65f9867814c275
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4258: Remove duplicated and unused test macros

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4258: Remove duplicated and unused test macros
..


IMPALA-4258: Remove duplicated and unused test macros

Macros defined in test-macros.h are either duplicated in gtest-util.h
or are unused anywhere in the code. This change deletes test-macros.h

Change-Id: I08539d7e46b89d7e0a4338510b65f9867814c275
Reviewed-on: http://gerrit.cloudera.org:8080/4917
Reviewed-by: Tim Armstrong 
Tested-by: Internal Jenkins
---
M be/src/runtime/tmp-file-mgr-test.cc
D be/src/testutil/test-macros.h
2 files changed, 1 insertion(+), 56 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I08539d7e46b89d7e0a4338510b65f9867814c275
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3710: Kudu DML should ignore conflicts, pt2

2016-11-08 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3710: Kudu DML should ignore conflicts, pt2
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4446: expr-test fails under ASAN

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4446: expr-test fails under ASAN
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0ac10d34dd6463ab52e41de1002ef065cfe63a20
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4446: expr-test fails under ASAN

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4446: expr-test fails under ASAN
..


IMPALA-4446: expr-test fails under ASAN

Various places in the LikePredicate code assumed StringVal is
null-terminated. There is no such guarantee. By coincidence string
literals were sometimes backed by std::string storage that was
null-terminated, so this bug was latent until recently.

Testing:
Was able to reproduce the failure locally under ASAN, now the test
passes. Running the full ASAN tests to verify, but putting this up
for review first to unbreak the build sooner.

Change-Id: I0ac10d34dd6463ab52e41de1002ef065cfe63a20
Reviewed-on: http://gerrit.cloudera.org:8080/5000
Reviewed-by: Tim Armstrong 
Tested-by: Internal Jenkins
---
M be/src/exprs/like-predicate.cc
1 file changed, 15 insertions(+), 15 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I0ac10d34dd6463ab52e41de1002ef065cfe63a20
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
..


Patch Set 6: Verified-1

Build failed: 
http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/454/

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4447: Rein in overly broad sed that dirties the tree

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4447: Rein in overly broad sed that dirties the tree
..


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I51a0498d24b7fccc05b6183123501766cb36f85e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4447: Rein in overly broad sed that dirties the tree

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4447: Rein in overly broad sed that dirties the tree
..


IMPALA-4447: Rein in overly broad sed that dirties the tree

This patch fixes a sed expression to make sure it only laters the code
it is meant to alter, not the comment describing the code.

Tested with tests/run-tests.py query_test/test_udfs.py

Change-Id: I51a0498d24b7fccc05b6183123501766cb36f85e
Reviewed-on: http://gerrit.cloudera.org:8080/5008
Reviewed-by: Tim Armstrong 
Tested-by: Internal Jenkins
---
M testdata/bin/copy-udfs-udas.sh
1 file changed, 4 insertions(+), 4 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I51a0498d24b7fccc05b6183123501766cb36f85e
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
..


IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions

This change enables codegen for all builtin aggregate functions,
e.g. timestamp functions and group_concat.

There are several parts to the change:
* Adding support for generic UDAs. Previous the codegen code did not
  handle multiple input arguments or NULL return values.
* Defaulting to using the UDA interface when there is not a special
  codegen path (we have implementations of all builtin aggregate
  functions for the interpreted path).
* Remove all the logic to disable codegen for the special cases that now
  are supported.

Also fix the generation of code to get/set NULL bits since I needed
to add functionality there anyway.

Testing:
Add tests that check that codegen was enabled for builtin aggregate
functions. Also fix some gaps in the preexisting tests.

Also add tests for UDAs that check input/output nulls are handled
correctly, in anticipation of enabling codegen for arbitrary UDAs.
The tests are run with both codegen enabled and disabled. To avoid
flaky tests, we switch the UDF tests to use "unique_database".

Perf:
Ran local TPC-H and targeted perf. Spent a lot of time on TPC-H Q1,
since my original approach regressed it ~5%. In the end the problem was
to do with the ordering of loads/stores to the slot and null bit in the
generated code: the previous version of the code exploited some
properties of the particular aggregate function. I ended up replicating
this behaviour to avoid regressing perf.

Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
Reviewed-on: http://gerrit.cloudera.org:8080/4655
Reviewed-by: Tim Armstrong 
Tested-by: Internal Jenkins
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/codegen-anyval.h
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/aggregation-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/hash-join-node.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/old-hash-table.cc
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/text-converter.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/case-expr.cc
M be/src/exprs/compound-predicates.cc
M be/src/exprs/expr-codegen-test.cc
M be/src/exprs/expr.cc
M be/src/exprs/literal.cc
M be/src/exprs/null-literal.cc
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/slot-ref.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/tuple.cc
M be/src/runtime/types.h
M be/src/testutil/test-udas.cc
M be/src/testutil/test-udfs.cc
M be/src/util/tuple-row-compare.cc
M testdata/workloads/functional-query/queries/QueryTest/java-udf.test
M 
testdata/workloads/functional-query/queries/QueryTest/libs_with_same_filenames.test
M testdata/workloads/functional-query/queries/QueryTest/load-java-udfs.test
M testdata/workloads/functional-query/queries/QueryTest/uda-mem-limit.test
M testdata/workloads/functional-query/queries/QueryTest/uda.test
A 
testdata/workloads/functional-query/queries/QueryTest/udf-codegen-required.test
M testdata/workloads/functional-query/queries/QueryTest/udf-errors.test
M testdata/workloads/functional-query/queries/QueryTest/udf-mem-limit.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
M tests/common/test_result_verifier.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_udfs.py
47 files changed, 1,039 insertions(+), 802 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4655
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-1430,IMPALA-4108: codegen all builtin aggregate functions
..


Patch Set 15: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4655
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id9dc21d1d676505d3617e1e4f37557397c4fb260
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4410: Safer tear-down of RuntimeState

2016-11-08 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4410: Safer tear-down of RuntimeState
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4893/2/be/src/runtime/test-env.cc
File be/src/runtime/test-env.cc:

Line 109:   for (shared_ptr& runtime_state : runtime_states_) {
> This calculation is wrong unless you only created one RuntimeState per quer
What do you propose - summing over the distinct query IDs? Or checking in 
CreateState() that the query_id hasn't been used before? It looks 
like at most if not all callsites it's supposed to be called with a unique id, 
so perhaps that's an invariant we should enforce in the Create...() method.


http://gerrit.cloudera.org:8080/#/c/4893/2/be/src/runtime/test-env.h
File be/src/runtime/test-env.h:

PS2, Line 42: CreateRuntimeState
> The old name was deliberate, since it's meant to provide a simple interface
I can change it back; it seemed almost obfuscatory here because all it did was 
create a RuntimeState with some associated structures and QueryState has 
another meaning already. Maybe CreatePerQueryState() will be clearer.


-- 
To view, visit http://gerrit.cloudera.org:8080/4893
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie416e4d57240142bf685385299b749c3a6792c45
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

2016-11-08 Thread Alex Behm (Code Review)
Hello Marcel Kornacker, Internal Jenkins, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4877

to look at the new patch set (#7).

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
..

IMPALA-1286: Extract common conjuncts from disjunctions.

Adds a new ExprRewriteRule to extract common conjuncts from
disjunctions.

Examples:
(a AND b AND c) OR (b AND d) ==> b AND ((a AND c) OR (d))
(a AND b) OR (a AND b) ==> a AND b
(a AND b AND c) OR (c) ==> c

Adds a new query option ENABLE_EXPR_REWRITES to enable/disable
non-essential expr rewrites in the FE. Note that some rewrites
are required, e.g., BetweenToCompoundRule. Disabling the rewrites
is useful for testing, in particular, to make sure that the exprs
specified in expr-test.cc are executed as written.

Testing: Added a new unit test in ExprRewriteRulesTest.

Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
---
M be/src/exprs/expr-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriteRule.java
A fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
18 files changed, 282 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/4877/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

2016-11-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
..


Patch Set 7: Code-Review+2

Trivial update in expected plan to an equivalent one.

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3710: Kudu DML should ignore conflicts, pt2

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3710: Kudu DML should ignore conflicts, pt2
..


IMPALA-3710: Kudu DML should ignore conflicts, pt2

Second part of IMPALA-3710, which removed the IGNORE DML
option and changed the following errors on Kudu DML
operations to be ignored:
1) INSERT where the PK already exists
2) UPDATE/DELETE where the PK doesn't exist

This changes other data-related errors to be ignored as
well:
3) NULLs in non-nullable columns, i.e. null constraint
  violoations.
4) Rows with PKs that are in an 'uncovered range'.

It became clear that we can't differentiate between (3) and
(4) because both return a Kudu 'NotFound' error code. The
Impala error codes have been simplified as well: we just
report a generic KUDU_NOT_FOUND error in these cases.

This also adds some metadata to the thrift report sent to
the coordinator from sinks so the total number of rows with
errors can be added to the profile. Note that this does not
include a breakdown of error counts by type/code because we
cannot differentiate between all of these cases yet.

An upcoming change will add this new info to the beeswax
interface and show it in the shell output (IMPALA-3713).

Testing: Updated kudu_crud tests to check the number of rows
with errors.

Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c
Reviewed-on: http://gerrit.cloudera.org:8080/4985
Reviewed-by: Alex Behm 
Reviewed-by: Dan Hecht 
Tested-by: Internal Jenkins
---
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/runtime/coordinator.cc
M common/thrift/ImpalaInternalService.thrift
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test
9 files changed, 180 insertions(+), 101 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Alex Behm: Looks good to me, but someone else must approve
  Dan Hecht: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-3710: Kudu DML should ignore conflicts, pt2

2016-11-08 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3710: Kudu DML should ignore conflicts, pt2
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


  1   2   >