[Impala-ASF-CR] IMPALA-4856: Port data stream service to KRPC

2017-09-25 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8023 )

Change subject: IMPALA-4856: Port data stream service to KRPC
..


Patch Set 1:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-mgr.h
File be/src/runtime/krpc-data-stream-mgr.h:

http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-mgr.h@110
PS1, Line 110: sent
nit: received


http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-mgr.h@110
PS1, Line 110: no TransmitData() RPCs will successfully deliver their
 : /// payload.
Why would there be a TransmitData() RPC if EndDataStream() has already been 
sent? Doesn't the sender send it only if it knows all its TransmitData() RPCs 
have been processed?


http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-mgr.h@133
PS1, Line 133: /// period expires.
As per Tim's comment above, I would also reference IMPALA-3990 as a TODO here 
for later fixing.


http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-mgr.h@159
PS1, Line 159: Consider tracking, on the sender, whether a batch has been 
successfully sent or
 : ///   not. That's enough state to realise that a receiver has 
failed (rather than not
 : ///   prepared yet), and the data stream mgr can use that to 
fail an RPC fast, rather than
 : ///   having the closed-stream list.
It would be nice to have a JIRA for this and reference it here.


http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-mgr.h@353
PS1, Line 353: waiting_senders
This is a little confusing to follow in the .cc file, since when I see 
"waiting_senders", I expect it to be a set of some unique identifiers for a 
Sender ID.

Although this is unique to a specific sender, it would be a little clearer to 
call this 'waiting_senders_ctxs'.

Let me know what you think.


http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-mgr.h@356
PS1, Line 356: closed_senders
Similarly, we could call this 'closed_senders_ctxs'.


http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-mgr.cc
File be/src/runtime/krpc-data-stream-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-mgr.cc@168
PS1, Line 168:   num_senders_waiting_->Increment(1);
 :   total_senders_waited_->Increment(1);
 :   RecvrId recvr_id = make_pair(fragment_instance_id, 
request->dest_node_id());
 :   auto payload =
 :   make_unique(proto_batch, context, 
request, response);
 :   
early_senders_map_[recvr_id].waiting_senders.push_back(move(payload));
I'm wondering if it makes sense to add simple inline functions that encapsulate 
this functionality; for the sake of readability.

Eg: AddEarlyWaitingSender(), AddEarlyClosedSender()


http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-mgr.cc@213
PS1, Line 213: If no receiver found, but not in the closed stream cache
nit: If no receiver is found, and the receiver is not in the closed stream 
cache as well, we still need...


http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-mgr.cc@218
PS1, Line 218:   RecvrId recvr_id = make_pair(fragment_instance_id, 
dest_node_id);
 :   auto payload = make_unique(context, 
request, response);
 :   
early_senders_map_[recvr_id].closed_senders.emplace_back(move(payload));
 :   num_senders_waiting_->Increment(1);
 :   total_senders_waited_->Increment(1);
AddEarlyClosedSender() as per comment above, if you agree.


http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-mgr.cc@227
PS1, Line 227:   if (LIKELY(recvr != nullptr)) 
recvr->RemoveSender(request->sender_id());
 :   Status::OK().ToProto(response->mutable_status());
 :   context->RespondSuccess();
This may need some modification based on the recent commit for IMPALA-5199:
https://github.com/apache/incubator-impala/commit/5119ced50c0e0c4001621c9d4da598c187bdb580


http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-recvr.cc@75
PS1, Line 75: ("new data")
I'm having some trouble understanding what this means. Could you please clarify?


http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-recvr.cc@271
PS1, Line 271: data_arrival_cv_.notify_all();
Shouldn't this notify be done while holding the lock_ ?


http://gerrit.cloudera.org:8080/#/c/8023/1/be/src/runtime/krpc-data-stream-recvr.cc@285
PS1, Line 285:   while (!blocked_senders_.empty()) {
nit: Add comment: Respond to blocked senders' RPCs


http://gerrit.cloudera.org:808

[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

2017-09-25 Thread Thomas Tauber-Marshall (Code Review)
Hello Michael Ho, Matthew Jacobs, Jim Apple, Philip Zeyliger, Tim Armstrong, 
Dan Hecht, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8029

to look at the new patch set (#7).

Change subject: IMPALA-3360: Codegen inserting into runtime filters
..

IMPALA-3360: Codegen inserting into runtime filters

This patch codegens PhjBuilder::InsertRuntimeFilters() and
FilterContext::Insert().

This allows us to unroll the loop over all the filters in
PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that
happens in RawValue::GetHashValue(), and eliminate the AVX check
that happens in BloomFilter::Insert().

Testing:
- Ran existing runtime filter tests.
- Ran perf tests locally (all avg. over three runs):
  - Four way self join on tpch_parquet.lineitem. Should be a good case
for this as there's several large hash join build sides that will
benefit from the codegen. Total query running time improved ~7%
(from 16.07s to 14.91s).
  - Single join of tpch_parquet.lineitem against a selectively
filtered tpch_parquet.lineitem. Should be a bad case for this
patch, as the build side of the join is very small. Total query
running time regressed by about ~2% (from 0.73s to 0.75s) due to
an increase in codegen time (from 295ms to 309ms for the fragment
containing the hash join).

Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/util/CMakeLists.txt
A be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.h
10 files changed, 311 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8029/7
--
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

2017-09-25 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8029 )

Change subject: IMPALA-3360: Codegen inserting into runtime filters
..


Patch Set 7: Code-Review+2

GVO failed because of WARN_UNUSED_RESULT not being followed. Added a 
RETURN_IF_ERROR


--
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Sep 2017 15:26:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

2017-09-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8029 )

Change subject: IMPALA-3360: Codegen inserting into runtime filters
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1264/


--
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Sep 2017 15:38:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5870: Improve runtime profile for partial sort

2017-09-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8123 )

Change subject: IMPALA-5870: Improve runtime profile for partial sort
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1265/


--
To view, visit http://gerrit.cloudera.org:8080/8123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
Gerrit-Change-Number: 8123
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Sep 2017 15:42:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5376: Implement all TPCDS test cases or alternates for Impala.

2017-09-25 Thread Michael Brown (Code Review)
Michael Brown has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8102 )

Change subject: IMPALA-5376: Implement all TPCDS test cases or alternates for 
Impala.
..


Patch Set 6:

(1 comment)

> Job result output for PS 8 on jenkins.impala.io:
 > https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/329/consoleText

This job is failing, yet I can't see why. Any ideas?

http://gerrit.cloudera.org:8080/#/c/8102/6/testdata/workloads/tpcds/queries/tpcds-q47.test
File testdata/workloads/tpcds/queries/tpcds-q47.test:

http://gerrit.cloudera.org:8080/#/c/8102/6/testdata/workloads/tpcds/queries/tpcds-q47.test@10
PS6, Line 10: truncate(avg(sum(ss_sales_price)) over
> Done
Is there a bug tracking the flap?

Also, if there's a known flap, I don't think we want to knowingly commit flaky 
tests. Seems like the flapping tests should be skipped.



--
To view, visit http://gerrit.cloudera.org:8080/8102
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6e284888600a7a69d1f23fcb7dac21cbb13b7d66
Gerrit-Change-Number: 8102
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Wood 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 
Gerrit-Comment-Date: Mon, 25 Sep 2017 17:12:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.

2017-09-25 Thread Alex Behm (Code Review)
Alex Behm has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8136


Change subject: IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.
..

IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.

Adds the TABLESAMPLE clause for COMPUTE STATS.

Syntax:
COMPUTE STATS  TABLESAMPLE SYSTEM() [REPEATABLE()]

Computes and replaces the table-level row count and total file size,
as well as all table-level column statistics. Existing partition-level
row counts are not modified.
The TABLESAMPLE clause can be used to limit the scanned data volume to
a desired percentage. When sampling, the unmodified results of the
COMPUTE STATS queries are sent to the CatalogServer. There, the stats
are extrapolated before storing them into the HMS so as not to confuse
other engines like Hive/SparkSQL which may rely on the shared HMS
fields being accurate.

Limitations
- TABLESAMPLE requires --enable_stats_extrapolation=true
- TABLESAMPLE is not supported for COMPUTE INCREMENTAL STATS

Changes to EXPLAIN
The stored statistics from the HMS are more clearly displayed under
a 'stored statistics' section. Example:

00:SCAN HDFS [functional.alltypes, RANDOM]
   partitions=24/24 files=24 size=478.45KB
   stored statistics:
 table: rows=7300 size=478.45KB
 partitions: 24/24 rows=7300
 columns: all

Testing:
- added new functional tests
- core/hdfs run passed

Change-Id: I7f3e72471ac563adada4a4156033a85852b7c8b7
---
M be/src/exec/catalog-op-executor.cc
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableSampleClause.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/partition-pruning.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/spillable-buffer-sizing.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M 
testdata/workloads/functional-query/queries/QueryTest/alter-table-set-column-stats.test
M 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
A 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
M 
testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats-incremental.test
M testdata/workloads/functional-query/queries/QueryTest/hbase-compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
M testdata/workloads/functional-query/queries/QueryTest/truncate-table.test
M tests/custom_cluster/test_stats_extrapolation.py
M tests/metadata/test_explain.py
38 files changed, 1,882 insertions(+), 1,193 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8136/1
--
To view, visit http://gerrit.cloudera.org:8080/8136
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7f3e72471ac563adada4a4156033a85852b7c8b7
Gerrit-Change-Number: 8136
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm 


[Impala-ASF-CR] IMPALA-5376: Implement all TPCDS test cases or alternates for Impala.

2017-09-25 Thread Tim Wood (Code Review)
Tim Wood has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8102 )

Change subject: IMPALA-5376: Implement all TPCDS test cases or alternates for 
Impala.
..


Patch Set 8:

(1 comment)

> Patch Set 6:
>
> (1 comment)
>
> > Job result output for PS 8 on jenkins.impala.io:
>  > https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/329/consoleText
> 
> This job is failing, yet I can't see why. Any ideas?

I've determined that tests/run-tests.py exits non-zero when an expected-fail 
test is executed.  This seems like wrong behavior from the framework.  Instead, 
it should exit nonzero unless every test case behaves as expected.  So when 
bugs behind xfail tests are fixed, those tests will pass, and fail the test 
run.  That's a reminder to fix the expectation for that case in the test.

I hestitate to include a framework fix like this in this ticket, because it 
will change the behavior of many tests.  I can open a ticket on this though.

http://gerrit.cloudera.org:8080/#/c/8102/6/testdata/workloads/tpcds/queries/tpcds-q47.test
File testdata/workloads/tpcds/queries/tpcds-q47.test:

http://gerrit.cloudera.org:8080/#/c/8102/6/testdata/workloads/tpcds/queries/tpcds-q47.test@10
PS6, Line 10: truncate(avg(sum(ss_sales_price)) over
> Is there a bug tracking the flap?
Done - Yes, IMPALA-5956. I've tagged all flapping tests with it to this point, 
and skipped them.



--
To view, visit http://gerrit.cloudera.org:8080/8102
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6e284888600a7a69d1f23fcb7dac21cbb13b7d66
Gerrit-Change-Number: 8102
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Wood 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 
Gerrit-Comment-Date: Mon, 25 Sep 2017 17:55:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-25 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7731 )

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.h
File be/src/catalog/catalog-server.h:

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.h@131
PS6, Line 131: Also determines any
 :   /// deletions of catalog objects by looking at the
 :   /// difference of the last set of topic entry keys that were 
sent and the current set
 :   /// of topic entry keys
> Remove?
Done


http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.cc@308
PS6, Line 308: bool topi
> should that be DCHECK_GT?  The old conditional to continue on was <= ...
Correct. Done


http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog.h
File be/src/catalog/catalog.h:

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog.h@59
PS6, Line 59: Gets all Catalog objects and the metadata that is applicable for 
the given request.
:   /// Always returns all object names that exist in the Catalog, 
but allows for extended
:   /// metadata for objects that were modified after the specified 
version.
> is that accurate?  from our discussion, my understanding was that only obje
Yeah, wording is not clear. Changed the comment.


http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc@177
PS5, Line 177: entry_it->second.SetDeleted(true);
 : entry_it->second.SetVersion(last_version_);
> What I mean is that topics that are transient do not get values in deleted
Hm, it's still not clear to me why a transient topic cannot get values in 
deleted topics, in the sense that I see the topic being transient and how to 
process deletions two orthogonal issues and I don't understand why one should 
influence the other.


http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/statestore/statestore.cc@170
PS6, Line 170: topic_update_log_.erase(version);
> also, does that still make sense (now that we no longer do the SetValue(NUL
Done


http://gerrit.cloudera.org:8080/#/c/7731/6/common/thrift/CatalogInternalService.thrift
File common/thrift/CatalogInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/7731/6/common/thrift/CatalogInternalService.thrift@32
PS6, Line 32:
> does it include objects with changes in version > or >= the from_version?
Updated the comment (>). Done


http://gerrit.cloudera.org:8080/#/c/7731/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/7731/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@310
PS6, Line 310:  Views, Databases, and
 :* Functions)
> datasources, cachepools, sentry stuff ... I think the list is too big, may
Done



--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 5
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 25 Sep 2017 17:55:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-25 Thread Dimitris Tsirogiannis (Code Review)
Hello Bharath Vissapragada, Alex Behm, Vuk Ercegovac, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/7731

to look at the new patch set (#7).

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..

IMPALA-5538: Use explicit catalog versions for deleted objects

This commit changes the way deletions are handled in the catalog and
disseminated to the impalad nodes through the statestore. Previously,
deletions of catalog objects were not explicitly annotated with the
catalog version in which the deletion occured and the impalads were
using the max catalog version in a catalog update in order to decide
whether a deletion should be applied to the local catalog cache or not.
This works correctly under the assumption that
all the changes that occurred in the catalog between an update's min
and max catalog version are included in that update, i.e. no gaps or
missing updates. With the upcoming fix for IMPALA-5058, that constraint
will be relaxed, thus allowing for gaps in the catalog updates.

To avoid breaking the existing behavior, this patch introduced the
following changes:
* Deletions in the catalog are explicitly recorded in a log with
the catalog version in which they occurred. As before, added and deleted
catalog objects are sent to the statestore.
* Topic entries associated with deleted catalog objects have non-empty
values (besided keys) that contain minimal object metadata including the
catalog version.
* Statestore is no longer using the existence or not of
topic entry values in order to identify deleted topic entries. Deleted
topic entries should be explicitly marked as such by the statestore
subscribers that produce them.
* Statestore subscribers now use the 'deleted' flag to determine if a
topic entry corresponds to a deleted item.
* Impalads use the deleted objects' catalog versions when updating the
local catalog cache from a catalog update and not the update's maximum
catalog version.

Testing:
- No new tests were added as these paths are already exercised by
existing tests.
- Run all core tests.

Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/catalog/catalog-util.cc
M be/src/catalog/catalog-util.h
M be/src/catalog/catalog.cc
M be/src/catalog/catalog.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/statestore/statestore.cc
M be/src/statestore/statestore.h
M common/thrift/CatalogInternalService.thrift
M common/thrift/StatestoreService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M tests/statestore/test_statestore.py
21 files changed, 480 insertions(+), 441 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/7
--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 7
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-5376: Implement all TPCDS test cases or alternates for Impala.

2017-09-25 Thread David Knupp (Code Review)
David Knupp has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8102 )

Change subject: IMPALA-5376: Implement all TPCDS test cases or alternates for 
Impala.
..


Patch Set 6:

Tim, can you provide more context for the observation that "run-tests.py exits 
non-zero when an expected-fail test is executed?" We commonly have XFAIL'ed 
tests that do not cause jobs to fail, e.g. from 
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/320/consoleFull

  03:28:03 XFAIL 
failure/test_failpoints.py::TestFailpoints::()::test_failpoints[table_format: 
hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 4 | 
location: PREPARE | action: FAIL | query: select 1 from alltypessmall a join 
alltypessmall b on a.id = b.id]
  03:28:03   reason: MT_DOP not supported.
  03:28:03 XFAIL 
query_test/test_insert_behaviour.py::TestInsertBehaviour::()::test_insert_inherit_acls
  03:28:03   reason: [NOTRUN] Fails intermittently on test clusters
  03:28:03 === 1774 tests deselected by "-m 'execute_serially'" 
===
  03:28:03 = 178 passed, 8 skipped, 1774 deselected, 4 xfailed in 1966.41 
seconds =


--
To view, visit http://gerrit.cloudera.org:8080/8102
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6e284888600a7a69d1f23fcb7dac21cbb13b7d66
Gerrit-Change-Number: 8102
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Wood 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 
Gerrit-Comment-Date: Mon, 25 Sep 2017 18:08:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5376: Implement all TPCDS test cases or alternates for Impala.

2017-09-25 Thread Tim Wood (Code Review)
Tim Wood has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8102 )

Change subject: IMPALA-5376: Implement all TPCDS test cases or alternates for 
Impala.
..


Patch Set 8:

> Patch Set 8:
>
> (1 comment)
>
> > Patch Set 6:
> >
> > (1 comment)
> >
> > > Job result output for PS 8 on jenkins.impala.io:
> >  > https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/329/consoleText
> >
> > This job is failing, yet I can't see why. Any ideas?
>
> I've determined that tests/run-tests.py exits non-zero when an expected-fail 
> test is executed.  This seems like wrong behavior from the framework.  
> Instead, it should exit nonzero unless every test case behaves as expected.  
> So when bugs behind xfail tests are fixed, those tests will pass, and fail 
> the test run.  That's a reminder to fix the expectation for that case in the 
> test.
>
> I hestitate to include a framework fix like this in this ticket, because it 
> will change the behavior of many tests.  I can open a ticket on this though.

Filed IMPALA-5979 on the test driver behavior.


--
To view, visit http://gerrit.cloudera.org:8080/8102
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6e284888600a7a69d1f23fcb7dac21cbb13b7d66
Gerrit-Change-Number: 8102
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Wood 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 
Gerrit-Comment-Date: Mon, 25 Sep 2017 18:17:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5376: Implement all TPCDS test cases or alternates for Impala.

2017-09-25 Thread Tim Wood (Code Review)
Tim Wood has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8102 )

Change subject: IMPALA-5376: Implement all TPCDS test cases or alternates for 
Impala.
..


Patch Set 8:

> Patch Set 6:
>
> Tim, can you provide more context for the observation that "run-tests.py 
> exits non-zero when an expected-fail test is executed?" We commonly have 
> XFAIL'ed tests that do not cause jobs to fail, e.g. from 
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/320/consoleFull
>
>   03:28:03 XFAIL 
> failure/test_failpoints.py::TestFailpoints::()::test_failpoints[table_format: 
> hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 4 | 
> location: PREPARE | action: FAIL | query: select 1 from alltypessmall a join 
> alltypessmall b on a.id = b.id]
>   03:28:03   reason: MT_DOP not supported.
>   03:28:03 XFAIL 
> query_test/test_insert_behaviour.py::TestInsertBehaviour::()::test_insert_inherit_acls
>   03:28:03   reason: [NOTRUN] Fails intermittently on test clusters
>   03:28:03 === 1774 tests deselected by "-m 'execute_serially'" 
> ===
>   03:28:03 = 178 passed, 8 skipped, 1774 deselected, 4 xfailed in 1966.41 
> seconds =

I filed that based on my observation of running a single xfail test and getting 
a nonzero exit.  Now I see the driver exits nonzero for a passing test as well, 
so time to poke around with the debugger. :|


--
To view, visit http://gerrit.cloudera.org:8080/8102
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6e284888600a7a69d1f23fcb7dac21cbb13b7d66
Gerrit-Change-Number: 8102
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Wood 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 
Gerrit-Comment-Date: Mon, 25 Sep 2017 18:24:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects

2017-09-25 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7731 )

Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects
..


Patch Set 7: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc
File be/src/statestore/statestore.cc:

http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc@177
PS5, Line 177: Statestore::Subscriber::Subscriber(const SubscriberId& 
subscriber_id,
 : const TUniqueId& registration_id, const TNe
> Hm, it's still not clear to me why a transient topic cannot get values in d
okay. it wasn't clear to me if we can just take the value that was produced for 
an add/update and use it for delete, and expect that value to be the one that 
the subscriber expects for delete. But i guess assuming that is okay.



--
To view, visit http://gerrit.cloudera.org:8080/7731
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c
Gerrit-Change-Number: 7731
Gerrit-PatchSet: 7
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 25 Sep 2017 18:50:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

2017-09-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8029 )

Change subject: IMPALA-3360: Codegen inserting into runtime filters
..


Patch Set 7: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 7
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Sep 2017 19:37:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3360: Codegen inserting into runtime filters

2017-09-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8029 )

Change subject: IMPALA-3360: Codegen inserting into runtime filters
..

IMPALA-3360: Codegen inserting into runtime filters

This patch codegens PhjBuilder::InsertRuntimeFilters() and
FilterContext::Insert().

This allows us to unroll the loop over all the filters in
PhjBuilder::ProcessBuildBatch(), eliminate the branch on type that
happens in RawValue::GetHashValue(), and eliminate the AVX check
that happens in BloomFilter::Insert().

Testing:
- Ran existing runtime filter tests.
- Ran perf tests locally (all avg. over three runs):
  - Four way self join on tpch_parquet.lineitem. Should be a good case
for this as there's several large hash join build sides that will
benefit from the codegen. Total query running time improved ~7%
(from 16.07s to 14.91s).
  - Single join of tpch_parquet.lineitem against a selectively
filtered tpch_parquet.lineitem. Should be a bad case for this
patch, as the build side of the join is very small. Total query
running time regressed by about ~2% (from 0.73s to 0.75s) due to
an increase in codegen time (from 295ms to 309ms for the fragment
containing the hash join).

Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Reviewed-on: http://gerrit.cloudera.org:8080/8029
Reviewed-by: Thomas Tauber-Marshall 
Tested-by: Impala Public Jenkins
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/partitioned-hash-join-builder-ir.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/util/CMakeLists.txt
A be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.h
10 files changed, 311 insertions(+), 9 deletions(-)

Approvals:
  Thomas Tauber-Marshall: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I79cf23ad92dadaab996a50a2ca07ef9ebe8639bb
Gerrit-Change-Number: 8029
Gerrit-PatchSet: 8
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5870: Improve runtime profile for partial sort

2017-09-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8123 )

Change subject: IMPALA-5870: Improve runtime profile for partial sort
..


Patch Set 2: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1265/


--
To view, visit http://gerrit.cloudera.org:8080/8123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b
Gerrit-Change-Number: 8123
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Sep 2017 19:45:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5975: Work around broken beeline clients

2017-09-25 Thread Zach Amsden (Code Review)
Zach Amsden has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8132 )

Change subject: IMPALA-5975: Work around broken beeline clients
..


Patch Set 1:

I tested this with a local data load, but we need to kick off a GVO run as well.


--
To view, visit http://gerrit.cloudera.org:8080/8132
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8b9f3dde4445513f1f389785a002c6cc6b3dada
Gerrit-Change-Number: 8132
Gerrit-PatchSet: 1
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zach Amsden 
Gerrit-Comment-Date: Mon, 25 Sep 2017 19:56:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5376: Implement all TPCDS test cases or alternates for Impala.

2017-09-25 Thread Tim Wood (Code Review)
Tim Wood has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8102 )

Change subject: IMPALA-5376: Implement all TPCDS test cases or alternates for 
Impala.
..


Patch Set 8:

> Patch Set 8:
>
> > Patch Set 6:
> >
> > Tim, can you provide more context for the observation that "run-tests.py 
> > exits non-zero when an expected-fail test is executed?" We commonly have 
> > XFAIL'ed tests that do not cause jobs to fail, e.g. from 
> > https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/320/consoleFull
> >
> >   03:28:03 XFAIL 
> > failure/test_failpoints.py::TestFailpoints::()::test_failpoints[table_format:
> >  hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 4 | 
> > location: PREPARE | action: FAIL | query: select 1 from alltypessmall a 
> > join alltypessmall b on a.id = b.id]
> >   03:28:03   reason: MT_DOP not supported.
> >   03:28:03 XFAIL 
> > query_test/test_insert_behaviour.py::TestInsertBehaviour::()::test_insert_inherit_acls
> >   03:28:03   reason: [NOTRUN] Fails intermittently on test clusters
> >   03:28:03 === 1774 tests deselected by "-m 'execute_serially'" 
> > ===
> >   03:28:03 = 178 passed, 8 skipped, 1774 deselected, 4 xfailed in 
> > 1966.41 seconds =
>
> I filed that based on my observation of running a single xfail test and 
> getting a nonzero exit.  Now I see the driver exits nonzero for a passing 
> test as well, so time to poke around with the debugger. :|

Bug IMPALA-5979 has details of that investigation & one from Mike.


--
To view, visit http://gerrit.cloudera.org:8080/8102
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6e284888600a7a69d1f23fcb7dac21cbb13b7d66
Gerrit-Change-Number: 8102
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Wood 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Matthew Mulder 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Tim Wood 
Gerrit-Comment-Date: Mon, 25 Sep 2017 19:58:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5844: use a MemPool for expr local allocations

2017-09-25 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8025 )

Change subject: IMPALA-5844: use a MemPool for expr local allocations
..


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8025/7/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

PS7:
Generally, I think it's hard to understand the two different pools when it 
comes to aggregates since we don't document which pool different allocations 
come from. Maybe there is a simple high level rule that can be described?


http://gerrit.cloudera.org:8080/#/c/8025/7/be/src/exprs/agg-fn-evaluator.h@156
PS7, Line 156:  static void Serialize(const std::vector& 
evals, Tuple* dst);
 :   static void GetValue(const std::vector& 
evals, Tuple* src, Tuple* dst);
 :   static void Finalize(const std::vector& 
evals, Tuple* src, Tuple* dst);
which mem pool can dst varlen data live in? i think we need to document this 
all more clearly to make sense of it.


http://gerrit.cloudera.org:8080/#/c/8025/7/be/src/exprs/agg-fn-evaluator.h@162
PS7, Line 162: Serialize(), Finalize() and GetValue()
I guess that implies the answer to my previous question, but I think those 
should be documented explicitly.


http://gerrit.cloudera.org:8080/#/c/8025/7/be/src/exprs/agg-fn-evaluator.h@220
PS7, Line 220: Tuple* dst
which mem pool?


http://gerrit.cloudera.org:8080/#/c/8025/7/be/src/exprs/agg-fn-evaluator.h@225
PS7, Line 225: Note that StringVal result is
 :   /// from local allocation (which will be freed in the next 
QueryMaintenance()) so it
 :   /// needs to be copied out if it needs to survive beyond 
QueryMaintenance() (e.g. if
 :   /// 'dst' lives in a row batch).
maybe this should be reworded now that you're introducing a mechanism that 
allows the "local allocation" to be against the right mem-pool.



--
To view, visit http://gerrit.cloudera.org:8080/8025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61
Gerrit-Change-Number: 8025
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Sep 2017 22:33:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

2017-09-25 Thread Tianyi Wang (Code Review)
Tianyi Wang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/6535 )

Change subject: IMPALA-2550: Switch to per-query exec rpc
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6535/15/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/6535/15/be/src/runtime/coordinator-backend-state.cc@363
PS15, Line 363:   local_params.__set_bloom_filter(rpc_params->bloom_filter);
Sorry for commenting on an old CR. I'm reading this part of code and have 
several questions:
1. Why does rpc_params need to be a shared_ptr? I think it is only referenced 
within the scope of this function.
2. Why is a copy made here?
2. Why is __set_bloom_filter called, given that local_params is 
copy-constructed from rpc_params?



--
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-Change-Number: 6535
Gerrit-PatchSet: 15
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Sep 2017 22:33:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

2017-09-25 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/6535 )

Change subject: IMPALA-2550: Switch to per-query exec rpc
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6535/15/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/6535/15/be/src/runtime/coordinator-backend-state.cc@363
PS15, Line 363:   local_params.__set_bloom_filter(rpc_params->bloom_filter);
> Sorry for commenting on an old CR. I'm reading this part of code and have s
Note that this change only moved the code around (into coordinator.cc), it 
didn't introduce this code. You'll have to go further back in time to see the 
change that introduced that code, and maybe there is insight there.

That said, I agree it doesn't seem like shared_ptr makes sense here (and 
usually, we should avoid use of shared_ptr in impala because it makes reasoning 
about lifetimes difficult).

>From the comment in Coordinator::UpdateFilter() it sounds like there was some 
>idea to parallelize the RPCs, which maybe is where the thought of shared_ptr 
>came in. I'm not sure if the code ever did that, though. Going back further in 
>git history may answer it.

// Make a 'master' copy that will be shared by all concurrent delivery RPC 
attempts.



--
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-Change-Number: 6535
Gerrit-PatchSet: 15
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Sep 2017 22:43:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

2017-09-25 Thread Tianyi Wang (Code Review)
Tianyi Wang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/6535 )

Change subject: IMPALA-2550: Switch to per-query exec rpc
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6535/15/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/6535/15/be/src/runtime/coordinator-backend-state.cc@363
PS15, Line 363:   local_params.__set_bloom_filter(rpc_params->bloom_filter);
> Note that this change only moved the code around (into coordinator.cc), it
Thanks for the reply! The attempt to parallelize makes sense. I'm working on 
surrounding code and may remove the unnecessary copy/shared_ptr along the way.



--
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-Change-Number: 6535
Gerrit-PatchSet: 15
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Sep 2017 22:59:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet

2017-09-25 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8056 )

Change subject: IMPALA-5525 Extend TestScannersFuzzing to test uncompressed 
parquet
..


Patch Set 4:

(4 comments)

I like the overall approach. I have a few small naming/style issues, but I 
think this is getting closer.

http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@101
PS4, Line 101: """ Clone an existing parquet table with codec as none in the
 : unique database. This cloned table is passed to 
run_fuzz_test
 : which clones the table and corrupts the table. The test 
later
 : checks that there is no crash while performing SQL 
queries on
 : a corrupt table.
 : """
I think this comment should focus on why this test is different from the 
others. For example, it should explain that the parquet tables in the default 
schema are always compressed. So, in order to test uncompressed parquet, we 
need to create a new source table. I think you can skip the last two sentences.


http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@111
PS4, Line 111: db_name = unique_database
I would prefer to emphasize that the source and destination are the 
unique_database. To make that clearer, I think I would get rid of this variable 
and just use unique_database directly in each location.


http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@117
PS4, Line 117: functional_parquet.alltypes
Can we extend this to do fuzzing on decimal_tbl as well? I was thinking this 
could be a loop that runs fuzzing over a list of tables (that happens to have 
two entries).


http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@118
PS4, Line 118: .format(fq_tbl_name))
This indentation is a bit awkward. I don't think .format should be on its own 
line. One way to get around this is to use only 4 space indentation for the 
second line (" select"...) and then put the .format on that line.



--
To view, visit http://gerrit.cloudera.org:8080/8056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I760de7203a51cf82b16016fa8043cadc7c8325bc
Gerrit-Change-Number: 8056
Gerrit-PatchSet: 4
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Mon, 25 Sep 2017 23:25:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5307: part 1: don't transfer disk I/O buffers out of parquet

2017-09-25 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8085 )

Change subject: IMPALA-5307: part 1: don't transfer disk I/O buffers out of 
parquet
..


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8085/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8085/5//COMMIT_MSG@7
PS5, Line 7: IMPALA-5307
Can you add a line at the bottom what the other part(s) would look like?


http://gerrit.cloudera.org:8080/#/c/8085/5//COMMIT_MSG@56
PS5, Line 56: 
+++---++-++++-+---+
Nit: You could make the second column smaller to make this more readable, and 
add a bottom delimiter line to indicate it was truncated on purposed and not by 
mistake.


http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/hdfs-parquet-scanner.cc@245
PS5, Line 245:   context_->ReleaseCompletedResources(nullptr, true);
I think it's best to change the whole file at once, or only change occurrences 
where necessary. This looks like it may be left from a previous patchset.


http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h@476
PS5, Line 476:   Status AllocateUncompressedDataPage(
Should we call this "AllocateUncompressedDataBuffer"? Otherwise it sounds to me 
like it'll only be needed for uncompressed pages.


http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h@477
PS5, Line 477:   int64_t size, const std::string& desc, uint8_t** buffer);
Maybe err_desc, err_detail, or detail? "desc" reminds me of descriptors.


http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.h@485
PS5, Line 485: IsStringType
This does not say "VarLenStringType" but above in a comment you refer to 
var-len data. Can you clarify one of them?


http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8085/5/be/src/exec/parquet-column-readers.cc@1075
PS5, Line 1075:   uncompressed_size, "uncompressed variable-length 
data", ©_buffer));
DCHECK(copy_buffer != nullptr); And maybe initialize it to nullptr, so that 
it's explicit what the allocation will do.



--
To view, visit http://gerrit.cloudera.org:8080/8085
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I767c1e2dabde7d5bd7a4d5c1ec6d14801b8260d2
Gerrit-Change-Number: 8085
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Mon, 25 Sep 2017 23:30:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

2017-09-25 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/6535 )

Change subject: IMPALA-2550: Switch to per-query exec rpc
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6535/15/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/6535/15/be/src/runtime/coordinator-backend-state.cc@363
PS15, Line 363:   local_params.__set_bloom_filter(rpc_params->bloom_filter);
> Thanks for the reply! The attempt to parallelize makes sense. I'm working o
Cleanup is definitely welcome.



--
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-Change-Number: 6535
Gerrit-PatchSet: 15
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Sep 2017 23:34:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5844: use a MemPool for expr result allocations

2017-09-25 Thread Tim Armstrong (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8025

to look at the new patch set (#8).

Change subject: IMPALA-5844: use a MemPool for expr result allocations
..

IMPALA-5844: use a MemPool for expr result allocations

"local" allocations containing expression results (either intermediate
or final results) have the following properties:
* They are usually small allocations
* They can be made frequently (e.g. every function call)
* They are owned and managed by the Impala runtime
* They are freed in bulk at various points in query execution.

A MemPool (i.e. bump allocator) is the right mechanism to manage
allocations with the above properties. Before this patch
FunctionContext's used a FreePool + vector of allocations to emulate the
above behaviour. This patch switches to using a MemPool to bring these
allocations in line with the rest of the codebase.

The steps required to do this conversion.
* Use a MemPool for FunctionContext local allocations.
* Identify appropriate MemPools for all of the local allocations from
  function contexts so that the memory lifetime is correct.
* Various cleanup and documentation of existing MemPools.
* Replaces calls to FreeLocalAllocations() with calls to
  MemPool::Clear()

More involved surgery was required in a few places:
* Made the Sorter own its comparator, exprs and MemPool.
* Remove FunctionContextImpl::ReallocateLocal() and just have
  StringFunctions::Replace() do the doubling itself to avoid
  the need for a special interface. Worst-case this doubles
  the memory requirements for Replace() since n / 2 + n / 4
  + n / 8 +  bytes of memory could be wasted instead of recycled
  for an n-byte output string.
* Provide a way redirect agg fn Serialize()/Finalize() allocations
  to come directly from the output RowBatch's MemPool. This is
  also potentially applicable to other places where we currently
  copy out strings from local allocations, e.g.
  AnalyticEvalNode::AddResultTuple() and Tuple::MaterializeExprs().
* --stress_free_pool_alloc was changed to instead intercept at the
  FunctionContext layer so that it retains the old behaviour even
  though allocations do not all come from FreePools.

The "local" allocation concept was not exposed directly in udf.h so this
patch also renames them to better reflect that they're used for expr
results.

Testing:
* ran exhaustive and ASAN

Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/common/global-flags.cc
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/plan-root-sink.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M be/src/exec/union-node.cc
M be/src/exec/unnest-node.cc
M be/src/exprs/agg-fn-evaluator.cc
M be/src/exprs/agg-fn-evaluator.h
M be/src/exprs/case-expr.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/string-functions-ir.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/descriptors.cc
D be/src/runtime/free-pool.cc
M be/src/runtime/free-pool.h
M be/src/runtime/sorted-run-merger.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/runtime/tuple.cc
M be/src/service/fe-support.cc
M be/src/udf/udf-internal.h
M be/src/udf/udf-test-harness.cc
M be/src/udf/udf.cc
M be/src/udf/udf.h
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M testdata/workloads/functional-query/queries/QueryTest/alloc-fai

[Impala-ASF-CR] IMPALA-5844: use a MemPool for expr result allocations

2017-09-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8025 )

Change subject: IMPALA-5844: use a MemPool for expr result allocations
..


Patch Set 7:

(5 comments)

I think I addressed all the comments - I was in the process of renaming and 
updating comments so some of the comments commented on changed substantially.

http://gerrit.cloudera.org:8080/#/c/8025/6/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

http://gerrit.cloudera.org:8080/#/c/8025/6/be/src/exec/exec-node.h@324
PS6, Line 324: expr-managed
> Okay, keeping this specific to exprs, at least for now is fine, so that we
Done


http://gerrit.cloudera.org:8080/#/c/8025/6/be/src/exec/exec-node.h@333
PS6, Line 333: expr_tmp_mem_pool_
> let's go with expr_results_mem_pool_. And then I think we should explain it
Done. I started doing the rename and ended up realising that we could go the 
whole way and remove the "Local" terminology entirely because it is not exposed 
in udf.h


http://gerrit.cloudera.org:8080/#/c/8025/7/be/src/exprs/agg-fn-evaluator.h
File be/src/exprs/agg-fn-evaluator.h:

http://gerrit.cloudera.org:8080/#/c/8025/7/be/src/exprs/agg-fn-evaluator.h@156
PS7, Line 156:  static void Serialize(const std::vector& 
evals, Tuple* dst);
 :   static void GetValue(const std::vector& 
evals, Tuple* src, Tuple* dst);
 :   static void Finalize(const std::vector& 
evals, Tuple* src, Tuple* dst);
> which mem pool can dst varlen data live in? i think we need to document thi
I documented each of the functions in this header and tweaked the description 
in udf.h, which was vague.


http://gerrit.cloudera.org:8080/#/c/8025/7/be/src/exprs/agg-fn-evaluator.h@220
PS7, Line 220: Tuple* dst
> which mem pool?
Done


http://gerrit.cloudera.org:8080/#/c/8025/7/be/src/exprs/agg-fn-evaluator.h@225
PS7, Line 225: Note that StringVal result is
 :   /// from local allocation (which will be freed in the next 
QueryMaintenance()) so it
 :   /// needs to be copied out if it needs to survive beyond 
QueryMaintenance() (e.g. if
 :   /// 'dst' lives in a row batch).
> maybe this should be reworded now that you're introducing a mechanism that
Done



--
To view, visit http://gerrit.cloudera.org:8080/8025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61
Gerrit-Change-Number: 8025
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Sep 2017 23:52:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5844: use a MemPool for expr result allocations

2017-09-25 Thread Tim Armstrong (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8025

to look at the new patch set (#9).

Change subject: IMPALA-5844: use a MemPool for expr result allocations
..

IMPALA-5844: use a MemPool for expr result allocations

"local" allocations containing expression results (either intermediate
or final results) have the following properties:
* They are usually small allocations
* They can be made frequently (e.g. every function call)
* They are owned and managed by the Impala runtime
* They are freed in bulk at various points in query execution.

A MemPool (i.e. bump allocator) is the right mechanism to manage
allocations with the above properties. Before this patch
FunctionContext's used a FreePool + vector of allocations to emulate the
above behaviour. This patch switches to using a MemPool to bring these
allocations in line with the rest of the codebase.

The steps required to do this conversion.
* Use a MemPool for FunctionContext local allocations.
* Identify appropriate MemPools for all of the local allocations from
  function contexts so that the memory lifetime is correct.
* Various cleanup and documentation of existing MemPools.
* Replaces calls to FreeLocalAllocations() with calls to
  MemPool::Clear()

More involved surgery was required in a few places:
* Made the Sorter own its comparator, exprs and MemPool.
* Remove FunctionContextImpl::ReallocateLocal() and just have
  StringFunctions::Replace() do the doubling itself to avoid
  the need for a special interface. Worst-case this doubles
  the memory requirements for Replace() since n / 2 + n / 4
  + n / 8 +  bytes of memory could be wasted instead of recycled
  for an n-byte output string.
* Provide a way redirect agg fn Serialize()/Finalize() allocations
  to come directly from the output RowBatch's MemPool. This is
  also potentially applicable to other places where we currently
  copy out strings from local allocations, e.g.
  AnalyticEvalNode::AddResultTuple() and Tuple::MaterializeExprs().
* --stress_free_pool_alloc was changed to instead intercept at the
  FunctionContext layer so that it retains the old behaviour even
  though allocations do not all come from FreePools.

The "local" allocation concept was not exposed directly in udf.h so this
patch also renames them to better reflect that they're used for expr
results.

Testing:
* ran exhaustive and ASAN

Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/common/global-flags.cc
M be/src/exec/analytic-eval-node.cc
M be/src/exec/analytic-eval-node.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exchange-node.cc
M be/src/exec/exchange-node.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partial-sort-node.h
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/plan-root-sink.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M be/src/exec/union-node.cc
M be/src/exec/unnest-node.cc
M be/src/exprs/agg-fn-evaluator.cc
M be/src/exprs/agg-fn-evaluator.h
M be/src/exprs/case-expr.cc
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/expr-test.cc
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/string-functions-ir.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-test.cc
M be/src/runtime/descriptors.cc
D be/src/runtime/free-pool.cc
M be/src/runtime/free-pool.h
M be/src/runtime/sorted-run-merger.cc
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/runtime/tuple.cc
M be/src/service/fe-support.cc
M be/src/udf/udf-internal.h
M be/src/udf/udf-test-harness.cc
M be/src/udf/udf.cc
M be/src/udf/udf.h
M be/src/util/tuple-row-compare.cc
M be/src/util/tuple-row-compare.h
M testdata/workloads/functional-query/queries/QueryTest/alloc-fai

[Impala-ASF-CR] IMPALA-5844: use a MemPool for expr result allocations

2017-09-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8025 )

Change subject: IMPALA-5844: use a MemPool for expr result allocations
..


Patch Set 9:

Rebased


--
To view, visit http://gerrit.cloudera.org:8080/8025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4ba5a7542ed90a49a4b5586c040b5985a7d45b61
Gerrit-Change-Number: 8025
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 25 Sep 2017 23:54:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5908: Allow SET to unset modified query options.

2017-09-25 Thread Philip Zeyliger (Code Review)
Hello Bharath Vissapragada, Matthew Jacobs, Tim Armstrong, Dan Hecht, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8070

to look at the new patch set (#10).

Change subject: IMPALA-5908: Allow SET to unset modified query options.
..

IMPALA-5908: Allow SET to unset modified query options.

The query 'SET =""' will now unset an option within the session,
reverting it to its default state.

This change became necessary when "SET" started returning an empty
string for unset options which don't have a default. The test
infrastructure (impala_test_suite.py) resets options to what it thinks
is its defaults, and, when this broke, some ASAN builds started to fail,
presumably due to a timing issue with how we re-use connections between
tests.

Previously, SessionState copied over the default options from the server
when the session was created and then mutated that. To support unsetting
options at the session layer, this change keeps a pointer to the default
server settings, keeps separately the mutations, and overlays the
options each time they're requested. Similarly, for configuration
overlays that happen per-query, the overlay is now done explicitly,
because empty per-query overlay values (key=..., value="") now have no effect.

Because "set key=''" is ambiguous between "set to the empty string" and
"unset", it's now impossible to set to the empty string, at the session
layer, an option that is configured at a previous layer. In practice,
this is just debug_action and request_pool. debug_action is essentially
an internal tool. For request_pool, this means that setting the default
request_pool via impalad command line is now a bad idea, as it can't
be cleared at a per-session level. For request_pool, the correct
course of action for users is to use placement rules, and to have a
default placement rule.

Testing:
* Added a simple test that triggered this side-effect without this code.
  Specifically, "impala-python infra/python/env/bin/py.test 
tests/metadata/test_set.py -s"
  with the modified set.test triggers.
* Amended tests/custom_cluster/test_admission_controller.py; it was
  useful for testing these code paths.
* Added cases to query-options-test to check behavior for both
  defaulted and non-defaulted values.
* Added a custom cluster test that checks that overlays are
  working against
* Ran an ASAN build where this was triggering previously.

Change-Id: Ia8c383e68064f839cb5000118901dff77b4e5cb9
---
M be/src/service/client-request-state.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M testdata/workloads/functional-query/queries/QueryTest/set.test
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_admission_controller.py
A tests/custom_cluster/test_set_and_unset.py
M tests/hs2/hs2_test_suite.py
14 files changed, 266 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/70/8070/10
--
To view, visit http://gerrit.cloudera.org:8080/8070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8c383e68064f839cb5000118901dff77b4e5cb9
Gerrit-Change-Number: 8070
Gerrit-PatchSet: 10
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5908: Allow SET to unset modified query options.

2017-09-25 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8070 )

Change subject: IMPALA-5908: Allow SET to unset modified query options.
..


Patch Set 10:

(8 comments)

I filed https://issues.apache.org/jira/browse/IMPALA-5981 for the documentation.

http://gerrit.cloudera.org:8080/#/c/8070/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8070/9//COMMIT_MSG@29
PS9, Line 29: hat is configured at a previous layer. In practice,
: this is just debug_action and request_pool. debug_actio
> okay
I did some analysis on cluster configurations in the wild, and very few 
clusters had request_pool set in default_query_options.


http://gerrit.cloudera.org:8080/#/c/8070/9//COMMIT_MSG@31
PS9, Line 31: an internal tool. For request_pool, this means that setting the 
default
> what do you mean by "before"? what was the before behavior you are comparin
I removed this; too ambiguous. I was trying to refer to the behavior of SET 
before and after this change.


http://gerrit.cloudera.org:8080/#/c/8070/9/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/8070/9/be/src/service/impala-server.h@346
PS9, Line 346: Impala
> let's be super explicit and say ImpalaServer's
Done


http://gerrit.cloudera.org:8080/#/c/8070/9/be/src/service/impala-server.h@352
PS9, Line 352: set_query_options hav
> update
Done


http://gerrit.cloudera.org:8080/#/c/8070/9/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/8070/9/be/src/service/query-options.cc@a61
PS9, Line 61:
> we had never set the dst isset bits? how did that not cause trouble before?
By accident. Many settings have isset true at creation time, so they would have 
worked to begin with. The ones that don't are below. I suspect that buffer pool 
settings were not overriding the following configs. Of these, request_pool has 
somewhat special handling (it's set in scheduler.cc). v_cpu_cores is not 
currently used, nor is reservation_request_timeout.

   compression_codec(false)
   request_pool(false)
   v_cpu_cores(false)
   reservation_request_timeout(false)
   buffer_pool_limit(false)
   seq_compression_mode(false)
   mt_dop(false)


http://gerrit.cloudera.org:8080/#/c/8070/9/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/8070/9/common/thrift/ImpalaInternalService.thrift@75
PS9, Line 75:  to override the preceding layers; these
: // overrides ar
> maybe that could use some rewording (depending on what "defaults" is referr
Done.


http://gerrit.cloudera.org:8080/#/c/8070/9/common/thrift/ImpalaInternalService.thrift@79
PS9, Line 79: >=
: // or in the
> unset will also affect options set by OpenSession() RPC, right?  Let's clar
I tried to clarify this. I'm still not happy with it, but see what you think.

Writing this, I'm further convinced that implementing UNSET foo; (vs SET 
FOO="") is unhelpful: the ambiguity will just come around in other places 
because we parse strings for these values.


http://gerrit.cloudera.org:8080/#/c/8070/9/tests/custom_cluster/test_set_and_unset.py
File tests/custom_cluster/test_set_and_unset.py:

http://gerrit.cloudera.org:8080/#/c/8070/9/tests/custom_cluster/test_set_and_unset.py@52
PS9, Line 52: # Use a "request overlay" to change the option for a specific
: # request within a session. We run a r
> i'm not following that comment given you had just used SET to see the query
I updated the comment. And added another test case. Which failed, so I updated 
some more code... This time, setting the per-request conf overlay to "" would 
cause the option to be reset to its default, and it makes more logical sense 
for that to be a no-op and the session setting to take hold.

I'm still pondering the fragility here and don't have any simple solutions. 
Clearest would be to carry all four layers of state around, and merge them as 
late as possible. You'd still need the mask type because Thrift's "isset" 
doesn't capture everything.



--
To view, visit http://gerrit.cloudera.org:8080/8070
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8c383e68064f839cb5000118901dff77b4e5cb9
Gerrit-Change-Number: 8070
Gerrit-PatchSet: 10
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 26 Sep 2017 04:39:08 +
Gerrit-HasComments: Yes