[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-10-12 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4519/9/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

Line 116: "to a thread pool to finish setup, reducing the chances that 
connections time out "
> what's the rationale for making this optional and leaving the old logic in 
While we're confident in this implementation, it's not a bad idea to have a 
get-out-of-jail flag until we've seen it be successful in real-world 
deployments. 

This only controls which kind of object gets created, so keeping it doesn't add 
much complexity over removing it.


-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-10-12 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4519/9/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

Line 116: "to a thread pool to finish setup, reducing the chances that 
connections time out "
what's the rationale for making this optional and leaving the old logic in 
place?


http://gerrit.cloudera.org:8080/#/c/4519/9/be/src/rpc/TAcceptQueueServer.h
File be/src/rpc/TAcceptQueueServer.h:

Line 22: #ifndef _THRIFT_SERVER_TACCEPTQUEUESERVER_H_
change file name to reflect our naming conventions


-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-10-07 Thread Henry Robinson (Code Review)
Henry Robinson has submitted this change and it was merged.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


IMPALA-4135: Thrift threaded server times-out connections during high load

During times of high load, Thrift's TThreadedServer can't keep up with the
rate of new socket connections, causing some to time out.

This patch creates TAcceptQueueServer, which is a modified version of
TThreadedServer that calls accept() and then hands the returned TTransport
off to a thread pool to handle setting up the connection. This ensures that
accept() is called as quickly as possible, preventing connections from timing
out while waiting.

It also adds a metric, connection-setup-queue-size, to monitor the number
of accepted connections waiting to be processed.

A flag, --accepted_cnxn_queue_depth, controls the size of the accepted
connection buffer.

Testing:
- New test added to thrift-server-test. (Disabled by default, due to
  high ulimit -n requirement)
- Locally with the repro shown in IMPALA-4135.
- On the 16-node with a real repro query.
- Ran the stress test for a while.

Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Reviewed-on: http://gerrit.cloudera.org:8080/4519
Tested-by: Internal Jenkins
Reviewed-by: Henry Robinson 
---
M be/src/common/global-flags.cc
M be/src/rpc/CMakeLists.txt
A be/src/rpc/TAcceptQueueServer.cpp
A be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M common/thrift/metrics.json
8 files changed, 545 insertions(+), 14 deletions(-)

Approvals:
  Henry Robinson: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-10-07 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 8: Code-Review+2

Rebase (plus add a flag and disable the test), carry +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-10-07 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 8: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-10-07 Thread Henry Robinson (Code Review)
Henry Robinson has uploaded a new patch set (#8).

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..

IMPALA-4135: Thrift threaded server times-out connections during high load

During times of high load, Thrift's TThreadedServer can't keep up with the
rate of new socket connections, causing some to time out.

This patch creates TAcceptQueueServer, which is a modified version of
TThreadedServer that calls accept() and then hands the returned TTransport
off to a thread pool to handle setting up the connection. This ensures that
accept() is called as quickly as possible, preventing connections from timing
out while waiting.

It also adds a metric, connection-setup-queue-size, to monitor the number
of accepted connections waiting to be processed.

A flag, --accepted_cnxn_queue_depth, controls the size of the accepted
connection buffer.

Testing:
- New test added to thrift-server-test. (Disabled by default, due to
  high ulimit -n requirement)
- Locally with the repro shown in IMPALA-4135.
- On the 16-node with a real repro query.
- Ran the stress test for a while.

Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
---
M be/src/common/global-flags.cc
M be/src/rpc/CMakeLists.txt
A be/src/rpc/TAcceptQueueServer.cpp
A be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M common/thrift/metrics.json
8 files changed, 545 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/4519/8
-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-10-05 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4519/6//COMMIT_MSG
Commit Message:

PS6, Line 25: a whil
> a while
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-10-05 Thread Thomas Tauber-Marshall (Code Review)
Hello Henry Robinson,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4519

to look at the new patch set (#7).

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..

IMPALA-4135: Thrift threaded server times-out connections during high load

During times of high load, Thrift's TThreadedServer can't keep up with the
rate of new socket connections, causing some to time out.

This patch creates TAcceptQueueServer, which is a modified version of
TThreadedServer that calls accept() and then hands the returned TTransport
off to a thread pool to handle setting up the connection. This ensures that
accept() is called as quickly as possible, preventing connections from timing
out while waiting.

It also adds a metric, connection-setup-queue-size, to monitor the number
of accepted connections waiting to be processed.

Testing:
- New test added to thrift-server-test.
- Locally with the repro shown in IMPALA-4135.
- On the 16-node with a real repro query.
- Ran the stress test for a while.

Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
---
M be/src/common/global-flags.cc
M be/src/rpc/CMakeLists.txt
A be/src/rpc/TAcceptQueueServer.cpp
A be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M common/thrift/metrics.json
8 files changed, 525 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/4519/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-10-04 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 6: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4519/6//COMMIT_MSG
Commit Message:

PS6, Line 25: awhile
a while


-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-10-04 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4519/5//COMMIT_MSG
Commit Message:

PS5, Line 18: It also adds a metric, connection-setup-queue-size, to monitor 
the number
: of accepted connections waiting to be processe
> no longer true.
Done


http://gerrit.cloudera.org:8080/#/c/4519/5/be/src/rpc/TAcceptQueueServer.h
File be/src/rpc/TAcceptQueueServer.h:

PS5, Line 20: // significant changes noted inline below.
: 
> strange line break in comment again
Done


PS5, Line 45: d then imme
> "separate thread, asynchronously. "
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-10-04 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new patch set (#6).

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..

IMPALA-4135: Thrift threaded server times-out connections during high load

During times of high load, Thrift's TThreadedServer can't keep up with the
rate of new socket connections, causing some to time out.

This patch creates TAcceptQueueServer, which is a modified version of
TThreadedServer that calls accept() and then hands the returned TTransport
off to a thread pool to handle setting up the connection. This ensures that
accept() is called as quickly as possible, preventing connections from timing
out while waiting.

It also adds a metric, connection-setup-queue-size, to monitor the number
of accepted connections waiting to be processed.

Testing:
- New test added to thrift-server-test.
- Locally with the repro shown in IMPALA-4135.
- On the 16-node with a real repro query.
- Ran the stress test for awhile.

Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
---
M be/src/common/global-flags.cc
M be/src/rpc/CMakeLists.txt
A be/src/rpc/TAcceptQueueServer.cpp
A be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M common/thrift/metrics.json
8 files changed, 525 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/4519/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-10-03 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 5:

(3 comments)

This looks good to me. 

One last thing: we should think about how we're going to monitor this. Could 
you add metrics to this server to measure the queue size? You can add a 
"InitMetrics(MetricGroup*)" call to TAcceptQueueServer, called from 
ThriftServer. The most important metric would be a queue length one, 
incremented when the accept thread adds a connection to the queue, and 
decremented on the other side.

http://gerrit.cloudera.org:8080/#/c/4519/5//COMMIT_MSG
Commit Message:

PS5, Line 18: This patch has been tested locally with the repro shown in 
IMPALA-4135, but
: it still needs to be tested in a real cluster.
no longer true.

Add a comment about what testing you did.


http://gerrit.cloudera.org:8080/#/c/4519/5/be/src/rpc/TAcceptQueueServer.h
File be/src/rpc/TAcceptQueueServer.h:

PS5, Line 20: // with the
: // significant changes noted inline below.
strange line break in comment again


PS5, Line 45: thread pool
"separate thread, asynchronously. "


-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-10-03 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 5:

The patch passed an exhaustive run on Jenkins:
http://sandbox.jenkins.cloudera.com/job/impala-umbrella-build-and-test/4911/

I have also now, with Mostafa's help, come up with a query that reliably 
reproduces the issue on the 16 node cluster and succeeds on the 16 node with 
this patch deployed:

 select /* +straight_join */ count(*) from
 tpch_300_parquet.orders A join /* +shuffle */  tpch_300_parquet.orders B on 
A.o_orderkey = B.o_orderkey 
 join /* +shuffle */   tpch_300_parquet.orders C on c.o_orderkey = B.o_orderkey
 join /* +shuffle */   tpch_300_parquet.orders D on d.o_orderkey = B.o_orderkey
 join /* +shuffle */   tpch_300_parquet.orders E on e.o_orderkey = B.o_orderkey
 join /* +shuffle */   tpch_300_parquet.orders F on f.o_orderkey = B.o_orderkey
 join /* +shuffle */   tpch_300_parquet.orders G on g.o_orderkey = B.o_orderkey
 join /* +shuffle */   tpch_300_parquet.orders H on h.o_orderkey = B.o_orderkey
 join /* +shuffle */   tpch_300_parquet.orders I on i.o_orderkey = B.o_orderkey
 join /* +shuffle */   tpch_300_parquet.orders J on j.o_orderkey = B.o_orderkey
 join /* +shuffle */   tpch_300_parquet.orders K on k.o_orderkey = B.o_orderkey
 join /* +shuffle */   tpch_300_parquet.orders L on l.o_orderkey = B.o_orderkey
 join /* +shuffle */   tpch_300_parquet.orders M on m.o_orderkey = B.o_orderkey
 join /* +shuffle */   tpch_300_parquet.orders N on n.o_orderkey = B.o_orderkey
 join /* +shuffle */   tpch_300_parquet.orders O on o.o_orderkey = B.o_orderkey
 join /* +shuffle */   tpch_300_parquet.orders P on p.o_orderkey = B.o_orderkey
 join /* +shuffle */   tpch_300_parquet.orders Q on q.o_orderkey = B.o_orderkey
 join /* +shuffle */   tpch_300_parquet.orders R on r.o_orderkey = B.o_orderkey
 join /* +shuffle */   tpch_300_parquet.orders S on s.o_orderkey = B.o_orderkey
 join /* +shuffle */   tpch_300_parquet.orders T on t.o_orderkey = B.o_orderkey
 join /* +shuffle */   tpch_300_parquet.orders U on u.o_orderkey = B.o_orderkey
 join /* +shuffle */   tpch_300_parquet.orders V on v.o_orderkey = B.o_orderkey
 where a.o_orderkey < 10;

Currently working on testing it with Kerberos on.

-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-09-30 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4519/4/be/src/rpc/TAcceptQueueServer.cpp
File be/src/rpc/TAcceptQueueServer.cpp:

PS4, Line 136: SetupCon
> doAccept() isn't a great name (I know it was from my patch!) because this i
Done


PS4, Line 204: rift code 
> connection_handler_pool, or something similar.
Done


PS4, Line 205: texpr int CONNECTION_SE
> This can be defined in this method as a constexpr.
Done


http://gerrit.cloudera.org:8080/#/c/4519/4/be/src/rpc/TAcceptQueueServer.h
File be/src/rpc/TAcceptQueueServer.h:

PS4, Line 96: New - 
> strange line breaks in this comment.
Done


PS4, Line 97: // up the connection and starting a thread to
> move into .cc file
Done


http://gerrit.cloudera.org:8080/#/c/4519/4/be/src/rpc/thrift-server-test.cc
File be/src/rpc/thrift-server-test.cc:

PS4, Line 173: ManyConcurrent
> Call this ManyConcurrentConnection or something.
Done


Line 175:   // waiting to be accepted. (IMPALA-4135)
> Mention that this does not always fail. How long does it take to run?
It takes about 2 seconds on my machine.


-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-09-30 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new patch set (#5).

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..

IMPALA-4135: Thrift threaded server times-out connections during high load

During times of high load, Thrift's TThreadedServer can't keep up with the
rate of new socket connections, causing some to time out.

This patch creates TAcceptQueueServer, which is a modified version of
TThreadedServer that calls accept() and then hands the returned TTransport
off to a thread pool to handle setting up the connection. This ensures that
accept() is called as quickly as possible, preventing connections from timing
out while waiting.

This patch has been tested locally with the repro shown in IMPALA-4135, but
it still needs to be tested in a real cluster.

Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
---
M be/src/common/global-flags.cc
M be/src/rpc/CMakeLists.txt
A be/src/rpc/TAcceptQueueServer.cpp
A be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
6 files changed, 459 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/4519/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-09-29 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 4:

(7 comments)

Looking pretty good.

http://gerrit.cloudera.org:8080/#/c/4519/4/be/src/rpc/TAcceptQueueServer.cpp
File be/src/rpc/TAcceptQueueServer.cpp:

PS4, Line 136: doAccept
doAccept() isn't a great name (I know it was from my patch!) because this isn't 
actually doing the work of accept, but instead establishing a connection. 
Perhaps call this SetupConnection() ?

You can use Impala's convention for method capitalisation; again makes it 
easier to see what was added.


PS4, Line 204: acceptPool
connection_handler_pool, or something similar.


PS4, Line 205: ACCEPT_THREAD_POOL_SIZE
This can be defined in this method as a constexpr.


http://gerrit.cloudera.org:8080/#/c/4519/4/be/src/rpc/TAcceptQueueServer.h
File be/src/rpc/TAcceptQueueServer.h:

PS4, Line 96: thread
strange line breaks in this comment.


PS4, Line 97: static const int ACCEPT_THREAD_POOL_SIZE = 1;
move into .cc file


http://gerrit.cloudera.org:8080/#/c/4519/4/be/src/rpc/thrift-server-test.cc
File be/src/rpc/thrift-server-test.cc:

PS4, Line 173: ConnectTimeout
Call this ManyConcurrentConnection or something.


Line 175:   // waiting to be accepted.(IMPALA-4135)
Mention that this does not always fail. How long does it take to run?


-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-09-29 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 4:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/4519/3/be/src/rpc/TAcceptQueueServer.cpp
File be/src/rpc/TAcceptQueueServer.cpp:

PS3, Line 204: 
> this is an important enough parameter that I'd make a constant for it, and 
Done


PS3, Line 216: 
> If this returns false it's because the queue is shut down. Better to say th
Done


PS3, Line 218: break;
> set stop_ to true to cleanup?
Done


http://gerrit.cloudera.org:8080/#/c/4519/3/be/src/rpc/TAcceptQueueServer.h
File be/src/rpc/TAcceptQueueServer.h:

Line 46:  *
> add a reference to IMPALA-4135 here.
Done


PS3, Line 105: 
> Is this used anywhere?
Done


http://gerrit.cloudera.org:8080/#/c/4519/3/be/src/rpc/thrift-server-test.cc
File be/src/rpc/thrift-server-test.cc:

Line 174:   // Test that a large number of concurrent connections will all 
succeed and not time out
> Add a comment about what you're testing and the associated JIRA.
Done


PS3, Line 178: _OK(server->Start(
> Does this work in a threaded context - does the test fail if the thread hit
Yes, although the test runs to completion before reporting the error.


PS3, Line 180: ThreadPool pool(
> I'm a bit concerned about failing with ulimit errors, particularly because 
Agreed, this test has a lot of potential for flakiness or other pain. Not sure 
how else to do an automatic test of this issue, though.

As written, it doesn't even actually fail on my machine without the fix, unless 
I add an artificial slowdown.


PS3, Line 173: TEST(ConcurrencyTest, ConnectTimeout) {
 :   // Test that a large number of concurrent connections will all 
succeed and not time out
 :   // waiting to be accepted.(IMPALA-4135)
 :   int port = GetServerPort();
 :   ThriftServer* server = new ThriftServer("DummyServer", 
MakeProcessor(), port);
 :   ASSERT_OK(server->Start());
 : 
 :   ThreadPool pool(
 :   "group", "test", 256, 1, [port](int tid, const 
int64_t& item) {
 :  
> This relies on a running impala internal service - you probably had an Impa
Done


http://gerrit.cloudera.org:8080/#/c/4519/1/be/src/server/TAcceptQueueThreadedServer.cpp
File be/src/server/TAcceptQueueThreadedServer.cpp:

PS1, Line 233: 
> Oops, sorry about that.
No problem.


-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-09-29 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new patch set (#4).

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..

IMPALA-4135: Thrift threaded server times-out connections during high load

During times of high load, Thrift's TThreadedServer can't keep up with the
rate of new socket connections, causing some to time out.

This patch creates TAcceptQueueServer, which is a modified version of
TThreadedServer that calls accept() and then hands the returned TTransport
off to a thread pool to handle setting up the connection. This ensures that
accept() is called as quickly as possible, preventing connections from timing
out while waiting.

This patch has been tested locally with the repro shown in IMPALA-4135, but
it still needs to be tested in a real cluster.

Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
---
M be/src/common/global-flags.cc
M be/src/rpc/CMakeLists.txt
A be/src/rpc/TAcceptQueueServer.cpp
A be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
6 files changed, 457 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/4519/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-09-29 Thread Juan Yu (Code Review)
Juan Yu has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4519/3/be/src/rpc/TAcceptQueueServer.cpp
File be/src/rpc/TAcceptQueueServer.cpp:

PS3, Line 218: 
set stop_ to true to cleanup?


http://gerrit.cloudera.org:8080/#/c/4519/1/be/src/server/TAcceptQueueThreadedServer.cpp
File be/src/server/TAcceptQueueThreadedServer.cpp:

PS1, Line 233: if (stop_) {
> Its not in the while loop.
Oops, sorry about that.


-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-09-29 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 3:

(8 comments)

Some comments before I head off to Strata. I would run clang-format over the 
new thrift files. That way they'll be more readable to Impala developers, while 
keeping the structure and idioms of the thrift code.

http://gerrit.cloudera.org:8080/#/c/4519/3/be/src/rpc/TAcceptQueueServer.cpp
File be/src/rpc/TAcceptQueueServer.cpp:

PS3, Line 204: 1
this is an important enough parameter that I'd make a constant for it, and put 
your comments about only using thread on that. We don't want someone changing 
the number of threads without understanding the thread-safety implications.


PS3, Line 216: acceptPool.Offer returned false.
If this returns false it's because the queue is shut down. Better to say that 
clearly in the error message so users have an idea what this might mean. Also 
you can say that it's unexpected - we never shut down our servers so the queue 
should never be shut down.


http://gerrit.cloudera.org:8080/#/c/4519/3/be/src/rpc/TAcceptQueueServer.h
File be/src/rpc/TAcceptQueueServer.h:

Line 46:  * connections will time out while in the OS accept queue.
add a reference to IMPALA-4135 here.


PS3, Line 105: std::mutex big_lock_;
Is this used anywhere?


http://gerrit.cloudera.org:8080/#/c/4519/3/be/src/rpc/thrift-server-test.cc
File be/src/rpc/thrift-server-test.cc:

Line 174:   ThreadPool pool("group", "test", 256, 1, [](int tid, 
const int64_t& item) {
Add a comment about what you're testing and the associated JIRA.


PS3, Line 178: ASSERT_OK(status);
Does this work in a threaded context - does the test fail if the thread hits an 
ASSERT?


PS3, Line 180: for (int i = 0; i < 1024 * 16; ++i) pool.Offer(i);
I'm a bit concerned about failing with ulimit errors, particularly because both 
client and server should be in the same process. Have you ever seen this repro 
with fewer concurrent connections - say 8K?


PS3, Line 173: TEST(ConcurrencyTest, ConnectTimeout) {
 :   ThreadPool pool("group", "test", 256, 1, [](int 
tid, const int64_t& item) {
 : using Client = ThriftClient;
 : Client* client = new Client("127.0.0.1", 22000, "", 
NULL, false);
 : Status status = client->Open();
 : ASSERT_OK(status);
 :   });
 :   for (int i = 0; i < 1024 * 16; ++i) pool.Offer(i);
 :   pool.DrainAndShutdown();
 : }
This relies on a running impala internal service - you probably had an Impala 
process running when you ran this? That won't be true in our builds. Use the 
same server code that all the other tests use to start a server in this process.


-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-09-28 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new patch set (#3).

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..

IMPALA-4135: Thrift threaded server times-out connections during high load

During times of high load, Thrift's TThreadedServer can't keep up with the
rate of new socket connections, causing some to time out.

This patch creates TAcceptQueueServer, which is a modified version of
TThreadedServer that calls accept() and then hands the returned TTransport
off to a thread pool to handle setting up the connection. This ensures that
accept() is called as quickly as possible, preventing connections from timing
out while waiting.

This patch has been tested locally with the repro shown in IMPALA-4135, but
it still needs to be tested in a real cluster.

Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
---
M be/src/common/global-flags.cc
M be/src/rpc/CMakeLists.txt
A be/src/rpc/TAcceptQueueServer.cpp
A be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
6 files changed, 449 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/4519/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-09-28 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 1:

(12 comments)

The main problem with subclassing TThreadedServer is that TThreadedServer::Task 
is defined in the thrift cpp file, so we can't access it. We could just copy 
that part, but between that, the copy of serve() that we have to make, and 
boiler plate around the constructors that would be necessary, subclasses 
doesn't really save any copied code other than init() and stop(), so it seems 
cleaner to me to keep the relationships between the classes simple by just 
copying everything rather than subclassing.

I'm not sure what to do about formatting - currently, TAcceptQueueServer.h/cpp 
are written in Thrift's style (eg. wrapped lines indented 2 spaces rather than 
4), though I may have gotten some of it wrong. Of course, I can switch it to 
Impala style if desired.

For the cluster repro - I've crafted a query that consistently gets the "timed 
out" error on the 16 node cluster without the fix. However, with the fix, after 
running somewhat longer, it fails with a 'memory limit exceeded' error. 
Obviously it would be preferable to have a query that both repros and runs to 
completion on a fixed cluster, so I'm working on that, but it does seem like 
the patch works.

http://gerrit.cloudera.org:8080/#/c/4519/1//COMMIT_MSG
Commit Message:

PS1, Line 18: This patch has been tested locally with the repro shown in 
IMPALA-4135
> Can you add the test, modified if necessary?
Done


http://gerrit.cloudera.org:8080/#/c/4519/1/be/src/rpc/thrift-server.cc
File be/src/rpc/thrift-server.cc:

Line 419:   server_.reset(new TAcceptQueueThreadedServer(processor_, 
server_socket,
> let's make the choice of server configurable by a flag that we can turn off
Done


http://gerrit.cloudera.org:8080/#/c/4519/1/be/src/server/CMakeLists.txt
File be/src/server/CMakeLists.txt:

PS1, Line 24: add_library(ThriftAcceptQueueThreadedServer
: TAcceptQueueThreadedServer.cpp
:   )
: add_dependencies(ThriftAcceptQueueThreadedServer thrift-deps)
> Was there a reason you wanted this as a separate library? I think it's fine
Done


http://gerrit.cloudera.org:8080/#/c/4519/1/be/src/server/TAcceptQueueThreadedServer.cpp
File be/src/server/TAcceptQueueThreadedServer.cpp:

Line 144: inputTransportFactory_->getTransport(client);
> do you see much slowdown in synchronizing around these few lines that creat
Most of this is fast and synchronizing it doesn't affect performance much.


Line 164: shared_ptr thread = 
shared_ptr(threadFactory_->newThread(runnable));
> Is this definitely thread-safe? I think this might be the bottleneck (or ma
The bottleneck is 'thread->start()' below. I'm fairly confident that's thread 
safe, as essentially all it does is create a new boost::thread.


Line 200:   ThreadPool> acceptPool("server-accept", 
"accept-worker", 1, 1,
> How did you arrive at 1 as the default number of worker threads? What happe
I saw the same behavior as you, where using more than 1 thread here was 
actually slower than just 1. I don't know why that's happening (Michael's fix 
for IMPALA-4026 currently under review doesn't seem to fix it), but given that 
just using 1 thread was enough to solve the problem for the repro in the JIRA, 
and given that only using one prevents us from having to worry about thread 
safety, it seems like the best choice, assuming that it also works in a real 
cluster repro. I could also make it configurable.


PS1, Line 211: Offer
> This can return false, right?
Done


PS1, Line 233: if (stop_) {
> shouldn't this be out of the while loop? to guarantee a proper cleanup.
Its not in the while loop.


Line 236:   acceptPool.DrainAndShutdown();
> I'm not sure we ever actually stop the thrift servers, so it might be moot 
Done


PS1, Line 252: stop_ = false
> why set stop_ back to false?
I guess so that it can be restarted? (this was copied from the thrift code, and 
I think its better to leave it as close to the original behavior as possible)


http://gerrit.cloudera.org:8080/#/c/4519/1/be/src/server/TAcceptQueueThreadedServer.h
File be/src/server/TAcceptQueueThreadedServer.h:

PS1, Line 1: // This file was copied from 
apache::thrift::server::TThreadedServer.cpp v0.9.0, with the
   : // significant changes noted inline below.
   : /*
   :  * Licensed to the Apache Software Foundation (ASF) under one
   :  * or more contributor license agreements. See the NOTICE file
   :  * distributed with this work for additional information
   :  * regarding copyright ownership. The ASF licenses this file
   :  * to you under the Apache License, Version 2.0 (the
   :  * "License"); you may not use this file except in compliance
   :  * with the License. You may obta

[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-09-28 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new patch set (#2).

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..

IMPALA-4135: Thrift threaded server times-out connections during high load

During times of high load, Thrift's TThreadedServer can't keep up with the
rate of new socket connections, causing some to time out.

This patch creates TAcceptQueueServer, which is a modified version of
TThreadedServer that calls accept() and then hands the returned TTransport
off to a thread pool to handle setting up the connection. This ensures that
accept() is called as quickly as possible, preventing connections from timing
out while waiting.

This patch has been tested locally with the repro shown in IMPALA-4135, but
it still needs to be tested in a real cluster.

Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
---
M be/src/common/global-flags.cc
M be/src/rpc/CMakeLists.txt
A be/src/rpc/TAcceptQueueServer.cpp
A be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
6 files changed, 451 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/4519/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-09-27 Thread Juan Yu (Code Review)
Juan Yu has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4519/1/be/src/server/TAcceptQueueThreadedServer.cpp
File be/src/server/TAcceptQueueThreadedServer.cpp:

PS1, Line 211: Offer
This can return false, right?
shouldn't we handle the case it returns false?


-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-09-27 Thread Juan Yu (Code Review)
Juan Yu has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 1:

(2 comments)

code style, we use 4 space indent for continued line

http://gerrit.cloudera.org:8080/#/c/4519/1/be/src/server/TAcceptQueueThreadedServer.cpp
File be/src/server/TAcceptQueueThreadedServer.cpp:

PS1, Line 233: if (stop_) {
shouldn't this be out of the while loop? to guarantee a proper cleanup.


PS1, Line 252: stop_ = false
why set stop_ back to false?


-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Juan Yu 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-09-25 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 1:

(7 comments)

I think Sailesh's suggestion is a good one. Is there any state in 
TThreadedServer that can't be accessed by a subclass? 

 I agree that a separate library seems unnecessary. There doesn't seem any need 
for a separate top-level dir; I expected a new directory under rpc/. But the 
simplest is probably just to put the file in rpc/ and be done with it.

http://gerrit.cloudera.org:8080/#/c/4519/1/be/src/rpc/thrift-server.cc
File be/src/rpc/thrift-server.cc:

Line 419:   server_.reset(new TAcceptQueueThreadedServer(processor_, 
server_socket,
let's make the choice of server configurable by a flag that we can turn off if 
needed.


http://gerrit.cloudera.org:8080/#/c/4519/1/be/src/server/TAcceptQueueThreadedServer.cpp
File be/src/server/TAcceptQueueThreadedServer.cpp:

Line 144: inputTransportFactory_->getTransport(client);
do you see much slowdown in synchronizing around these few lines that create 
protocols, transports and processors? If not, let's do that to make sure we 
rule out any possibility of race conditions.


Line 164: shared_ptr thread = 
shared_ptr(threadFactory_->newThread(runnable));
Is this definitely thread-safe? I think this might be the bottleneck (or maybe 
it's start()?) so it's going to be useful to keep unsynchronized.


Line 200:   ThreadPool> acceptPool("server-accept", 
"accept-worker", 1, 1,
How did you arrive at 1 as the default number of worker threads? What happens 
as that number increases? If it *has* to be one, to keep doAccept() 
thread-safe, add a comment here to that effect.


Line 236:   acceptPool.DrainAndShutdown();
I'm not sure we ever actually stop the thrift servers, so it might be moot - 
but shouldn't you signal to the worker threads that they should shut down and 
therefore not create any new threads?


http://gerrit.cloudera.org:8080/#/c/4519/1/be/src/server/TAcceptQueueThreadedServer.h
File be/src/server/TAcceptQueueThreadedServer.h:

PS1, Line 1: // This file was copied from 
apache::thrift::server::TThreadedServer.cpp v0.9.0, with the
   : // significant changes noted inline below.
   : /*
   :  * Licensed to the Apache Software Foundation (ASF) under one
   :  * or more contributor license agreements. See the NOTICE file
   :  * distributed with this work for additional information
   :  * regarding copyright ownership. The ASF licenses this file
   :  * to you under the Apache License, Version 2.0 (the
   :  * "License"); you may not use this file except in compliance
   :  * with the License. You may obtain a copy of the License at
   :  *
   :  *   http://www.apache.org/licenses/LICENSE-2.0
   :  *
   :  * Unless required by applicable law or agreed to in writing,
   :  * software distributed under the License is distributed on an
   :  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   :  * KIND, either express or implied. See the License for the
   :  * specific language governing permissions and limitations
   :  * under the License.
   :  */
> I think this is fine, but if you haven't already, can you ask Jim or Henry 
AFAIK it's completely fine - copying from another ASF project is legit.

That said, I'd keep the license header as the first thing in the file (and if 
you subclass, so much the better).


PS1, Line 44: and effectively creating a new,
:  * internally managed, accept queue.
not exactly - the connections have been accepted, the next state is transport 
set-up.


-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-09-23 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 1:

(3 comments)

Thanks! Not a full review yet. I still need to spend time reading 
TAcceptQueueThreadedServer.cpp carefully.

http://gerrit.cloudera.org:8080/#/c/4519/1//COMMIT_MSG
Commit Message:

PS1, Line 18: This patch has been tested locally with the repro shown in 
IMPALA-4135
Can you add the test, modified if necessary?


http://gerrit.cloudera.org:8080/#/c/4519/1/be/src/server/CMakeLists.txt
File be/src/server/CMakeLists.txt:

PS1, Line 24: add_library(ThriftAcceptQueueThreadedServer
: TAcceptQueueThreadedServer.cpp
:   )
: add_dependencies(ThriftAcceptQueueThreadedServer thrift-deps)
Was there a reason you wanted this as a separate library? I think it's fine to 
put this code in rpc/ since that's where we have our thrift-server.


http://gerrit.cloudera.org:8080/#/c/4519/1/be/src/server/TAcceptQueueThreadedServer.h
File be/src/server/TAcceptQueueThreadedServer.h:

PS1, Line 1: // This file was copied from 
apache::thrift::server::TThreadedServer.cpp v0.9.0, with the
   : // significant changes noted inline below.
   : /*
   :  * Licensed to the Apache Software Foundation (ASF) under one
   :  * or more contributor license agreements. See the NOTICE file
   :  * distributed with this work for additional information
   :  * regarding copyright ownership. The ASF licenses this file
   :  * to you under the Apache License, Version 2.0 (the
   :  * "License"); you may not use this file except in compliance
   :  * with the License. You may obtain a copy of the License at
   :  *
   :  *   http://www.apache.org/licenses/LICENSE-2.0
   :  *
   :  * Unless required by applicable law or agreed to in writing,
   :  * software distributed under the License is distributed on an
   :  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   :  * KIND, either express or implied. See the License for the
   :  * specific language governing permissions and limitations
   :  * under the License.
   :  */
I think this is fine, but if you haven't already, can you ask Jim or Henry if 
there's anything else we have to do for legal?


-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-09-22 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..


Patch Set 1:

Have you considered making TAcceptQueueThreadedServer inherit from 
TThreadedServer and just overloading the serve() functions, and adding the 
doAccept()?

The rest of the code looks mostly the same.

-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4135: Thrift threaded server times-out connections during high load

2016-09-22 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4519

Change subject: IMPALA-4135: Thrift threaded server times-out connections 
during high load
..

IMPALA-4135: Thrift threaded server times-out connections during high load

During times of high load, Thrift's TThreadedServer can't keep up with the
rate of new socket connections, causing some to time out.

This patch creates TAcceptQueueThreadedServer, which is a modified version of
TThreadedServer that calls accept() and then hands the returned TTransport
off to a thread pool to handle setting up the connection. This ensures that
accept() is called as quickly as possible, preventing connections from timing
out while waiting.

This patch has been tested locally with the repro shown in IMPALA-4135, but
it still needs to be tested in a real cluster.

Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
---
M be/CMakeLists.txt
M be/src/rpc/thrift-server.cc
A be/src/server/CMakeLists.txt
A be/src/server/TAcceptQueueThreadedServer.cpp
A be/src/server/TAcceptQueueThreadedServer.h
5 files changed, 449 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/4519/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4519
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie50e728974ef31a9d49132a0b3f7cde2a4f3356d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall