[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

2017-04-15 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has uploaded a new patch set (#5).

Change subject: IMPALA-2550: Switch to per-query exec rpc
..

IMPALA-2550: Switch to per-query exec rpc

Coordinator:
- FragmentInstanceState -> BackendState, which in turn records
  FragmentInstanceStats

QueryState
- does query-wide setup in a separate thread (which also launches
  the instance exec threads)
- has a query-wide 'prepared' state at which point all static setup
  is done and all FragmentInstanceStates are accessible

Also renamed QueryExecState to ClientRequestState.

Simplified handling of execution status (in FragmentInstanceState):
- status only transmitted via ReportExecStatus rpc
- in particular, it's not returned anymore from the Cancel rpc

FIS: Fixed bugs related to partially-prepared state (in Close() and 
ReleaseThreadToken())

Change-Id: I20769e420711737b6b385c744cef4851cee3facd
---
M be/src/benchmarks/expr-benchmark.cc
M be/src/codegen/codegen-anyval.h
M be/src/common/status.cc
M be/src/common/status.h
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-avro-table-writer.h
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
M be/src/exprs/expr-test.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-block-mgr-test.cc
A be/src/runtime/coordinator-backend-state.cc
A be/src/runtime/coordinator-backend-state.h
A be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/data-stream-test.cc
A be/src/runtime/debug-options.cc
A be/src/runtime/debug-options.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/mem-tracker.cc
D be/src/runtime/plan-fragment-executor.cc
D be/src/runtime/plan-fragment-executor.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/runtime/test-env.cc
M be/src/scheduling/query-schedule.cc
M be/src/scheduling/query-schedule.h
M be/src/service/CMakeLists.txt
M be/src/service/child-query.cc
M be/src/service/child-query.h
R be/src/service/client-request-state.cc
R be/src/service/client-request-state.h
M be/src/service/fe-support.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-http-handler.cc
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/testutil/desc-tbl-builder.cc
M be/src/testutil/fault-injection-util.h
M be/src/util/error-util-test.cc
M be/src/util/error-util.cc
M be/src/util/error-util.h
M be/src/util/uid-util.h
M common/thrift/ExecStats.thrift
M common/thrift/ImpalaInternalService.thrift
M tests/common/test_result_verifier.py
69 files changed, 3,743 insertions(+), 3,654 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/6535/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I20769e420711737b6b385c744cef4851cee3facd
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

2017-04-15 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2550: Switch to per-query exec rpc
..


Patch Set 4:

(50 comments)

http://gerrit.cloudera.org:8080/#/c/6535/4/be/src/exec/data-sink.cc
File be/src/exec/data-sink.cc:

PS4, Line 66:   
> nit: wrong indentation
how so?


http://gerrit.cloudera.org:8080/#/c/6535/4/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

PS4, Line 58: //done_(false) {
> delete
Done


Line 64:   VLOG_QUERY << "BS::Init(): query_id=" << PrintId(query_id_) << " 
state_idx=" << state_idx_ << " host=" << host_;
> long line
i'll remove all of the extra debug logging at the end.


PS4, Line 104: rpc_params->fragment_ctxs.back().fragment.idx
 :  != params.fragment_exec_params.fragment.idx) {
> The assumption is that all fragment instances belonging to the same fragmen
Done


PS4, Line 112: params.fragment_exec_params.fragment.idx
> May as well factor this out as a local variable as it's used above too. The
Done


PS4, Line 170: lock_guard l(lock_);
> Is it necessary to hold the lock while creating BackendConnection below ?
the lock is low-contention, and it protects status_.


PS4, Line 173: client_connect_status;
> Is this not used or did you intend to use this instead of status_ ?
Done


PS4, Line 184:   const string ERR_TEMPLATE =
 :   "ExecQueryFInstances rpc query_id=$0 failed: $1";
> Shouldn't this live in common/thrift/generate_error_codes.py ?
out of scope for this patch.


PS4, Line 332:  lock_guard l2(lock_);
> How heavily contended is this lock ? Will this become a bottleneck if we ha
no, it'll (eventually) be the backend doing the reporting, not the individual 
instances.


PS4, Line 346:   
> nit: indentation is off.
Done


Line 365: if (status_.ok()) {
> Is this block empty now ? It only contains comment, right ?
i left that in there as a reminder to myself to think this through again.


PS4, Line 391: num_remaining_instances_ == 0 || !status_.ok();
> Why not call IsDone() here ?
IsDone() simply returns a bool, not sure what you mean.


PS4, Line 393: //if (*done) stopwatch_.Stop();
> delete
Done


PS4, Line 434:   if (!status.ok()) return false;
> Does this warrant a log message ? It's not entirely clear what's the status
it wasn't there before, and it has the potential to spam the log.


PS4, Line 513:  NULL
> nullptr
Done


PS4, Line 549:
//DCHECK_EQ(fragment_profiles_[instance_state.fragment_idx()].num_instances,
 : //node_exec_summary.exec_stats.size());
> ?
Done


http://gerrit.cloudera.org:8080/#/c/6535/4/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

PS4, Line 45: .
> Can you please also add a comment stating that multiple fragments run a sam
the hierarchy fragment -> fragment instance is already well-documented 
elsewhere, so i'm not going to point that out again.


PS4, Line 100:   //const vector& instance_stats() const { return 
instance_stats_; }
 :   //MonotonicStopWatch* stopwatch() { return &stopwatch_; }
> delete
Done


PS4, Line 109: //InstanceStats* GetInstanceStats(const TUniqueId& instance_id);
> delete
Done


Line 111: return num_remaining_instances_ == 0 || !status_.ok();
> nit: one line ?
only for getters and setters.


PS4, Line 113:  &error_log_;
> How does this work if multiple callers modify the map at the same time ?
they don't, the comment above stipulates that you need to hold lock_


PS4, Line 114:   //const std::vector& instance_profiles() 
const {
 : //return instance_profiles_;
 :   //}
> delete
Done


PS4, Line 133: #if 0
> delete
Done


PS4, Line 143: /
> delete
Done


PS4, Line 223: //bool done_;
> Still needed ?
Done


http://gerrit.cloudera.org:8080/#/c/6535/4/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

Line 113: torn_down_(false) {}
> Should the constructor also initialize coord_state_idx_ to -1 in case there
Done


PS4, Line 187: FinishBackendStartup
> Would a better name be VerifyBackendStartup(); ?
Not particularly, because this really blocks until startup finished.


PS4, Line 234: new FragmentStats(avg_profile_name, root_profile_name, 
num_instances, obj_pool())
> Should this be added to obj_pool() too ? fragment_stats_ only contains poin
i think that was the intention.


Line 340:   // Set the 'pending_count_' to zero to indicate that for a 
filter with local-only
> long line
Done


Line 397:   //CountingBarrier* exec_complete_barrier = 
exec_complete_barrier_.get();
> Delete ?
Done


PS4, Line 400: [backend_state, this, &debug_options]() {
 :   backend_state->Exec(query_ctx_, debug_options, 
filter_routing_table_,
 : exec_complete_barrier_.get());
 : })
> IMHO, it seems easier to read if this is defined as a function like the exi
wha

[Impala-ASF-CR] [DOCS] Remove references to DSSD storage appliance

2017-04-15 Thread Jim Apple (Code Review)
Jim Apple has abandoned this change.

Change subject: [DOCS] Remove references to DSSD storage appliance
..


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/6412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I9e47fae9b098c6be85a769fdbac8ea2b87d6c167
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Michael Brown 


[Impala-ASF-CR] [DOCS] Remove references to DSSD storage appliance

2017-04-15 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: [DOCS] Remove references to DSSD storage appliance
..


Patch Set 1:

> This is more of a marketing-related change, so I'll make it in
 > downstream docs but not upstream.

OK. When you decide to no longer pursue a patch, you can press "Abandon". I'll 
do that.

-- 
To view, visit http://gerrit.cloudera.org:8080/6412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9e47fae9b098c6be85a769fdbac8ea2b87d6c167
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5031: Apply UBSan options to catalogd

2017-04-15 Thread Jim Apple (Code Review)
Jim Apple has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6652

Change subject: IMPALA-5031:  Apply UBSan options to catalogd
..

IMPALA-5031:  Apply UBSan options to catalogd

catalogd runs some C++ code, and that code can have undefined
behavior. This patch sets up the catalogd environment to give stack
traces and it excludes the libstdc++ undefined behavior from causing a
warning to be printed, as described in commit b41c2114f42f92441e96b4.

Change-Id: I520f8620d9b9f516ca5c55da5294de619e8e2d8f
---
M bin/start-catalogd.sh
1 file changed, 3 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/6652/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6652
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I520f8620d9b9f516ca5c55da5294de619e8e2d8f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 


[Impala-ASF-CR] IMPALA-3224: De-Cloudera non-docs JIRA URLs

2017-04-15 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-3224: De-Cloudera non-docs JIRA URLs
..


Patch Set 5:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6487/2/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

Line 38: # TODO: We need a better way of managing how these get set. See 
IMPALA-4346
> nit: single line?
Done


Line 185:   # Force load the dataset if we detect a schema change.
> nit: single line?
Done


Line 345: ${IMPALA_HOME}/testdata/bin/lzo_indexer.sh /test-warehouse
> nit: single line?
Done


Line 418: # the local filesystem requires the x permission (while on HDFS 
it requires the r
> nit: single line?
Done


http://gerrit.cloudera.org:8080/#/c/6487/2/tests/comparison/db_connection.py
File tests/comparison/db_connection.py:

Line 837:   # This can be remove if 
https://github.com/cloudera/impyla/pull/142 is merged.
> nit: single line?
Done


http://gerrit.cloudera.org:8080/#/c/6487/2/tests/comparison/discrepancy_searcher.py
File tests/comparison/discrepancy_searcher.py:

Line 193: elif isinstance(test_val, float) \
> nit: single line?
Done


Line 198: comparison_result.test_row = test_row
> nit: single line?
Done


http://gerrit.cloudera.org:8080/#/c/6487/2/tests/custom_cluster/test_kudu_not_available.py
File tests/custom_cluster/test_kudu_not_available.py:

Line 47:   self.assert_failure("SELECT * FROM tinytable", cursor)
> nit: single line?
Done


http://gerrit.cloudera.org:8080/#/c/6487/2/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

Line 904: if "memory limit exceeded" in caught_msg or \
> nit: single line?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6487
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28ea06e89341de234f9005fdc72a2e43f0ab8182
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3224: De-Cloudera non-docs JIRA URLs

2017-04-15 Thread Jim Apple (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6487

to look at the new patch set (#5).

Change subject: IMPALA-3224: De-Cloudera non-docs JIRA URLs
..

IMPALA-3224: De-Cloudera non-docs JIRA URLs

John Russell is planning to fix the URLS in docs in a separate commit.

Fixed using:

(git ls-files | xargs replace \
'https://issues.cloudera.org/browse/IMPALA' 'IMPALA' --) && \
git checkout HEAD docs

Change-Id: I28ea06e89341de234f9005fdc72a2e43f0ab8182
---
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M shell/shell_output.py
M testdata/bin/compute-table-stats.sh
M testdata/bin/create-load-data.sh
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/setup-hdfs-env.sh
M tests/comparison/db_connection.py
M tests/comparison/discrepancy_searcher.py
M tests/comparison/query_generator.py
M tests/custom_cluster/test_kudu_not_available.py
M tests/stress/concurrent_select.py
11 files changed, 24 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/6487/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6487
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I28ea06e89341de234f9005fdc72a2e43f0ab8182
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables

2017-04-15 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables
..


Patch Set 4:

> Perf results from running on the 10 node cluster:
 > 
 > For smaller queries that we were able to handle previously, there's
 > a regression of about 10% in overall query running time (all
 > averaged over 3 runs):
 > 240.13s with the patch vs. 224.58s previously for 200m rows
 > inserted
 > 472.97s with the patch vs. 433.05s previously for 400m rows
 > inserted
 > That's unfortunate, but can be improved in the future, e.g. by
 > codegen-ing the partition function.
 > 
 > But, we can now handle significantly larger inserts - I was seeing
 > timeouts regularly at > 400m rows previously, but with the patch
 > I've tested up to 6b row inserts without any timeouts.

Those are good results, and the perf regression is fairly negligible, so let's 
not worry about that for now.

-- 
To view, visit http://gerrit.cloudera.org:8080/6559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No