Maksim,
Great work!
Discussion inline below.
Recommended next steps:
1) Add the setup steps required to get all of this working to the user's
guide. File a JIRA for this.
2) Figure out a way to automate these tests. This might be hard on Apache
infra.
Kevin.
On 10/25/13 8:55 AM, Maksim Kononenko wrote:
Hi guys,
I was researching/testing Knox HA with Apache HTTP Server + mod_proxy +
mod_proxy_balancer.
Here is what I found.
I. Three load balancer scheduler algorithms are available: Request
Counting, Weighted Traffic Counting and Pending Request Counting. (
http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#scheduler)
II. Load balancer stickiness. (
http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html#stickyness)
I configured and tested stickiness. It worked as expected.
III. Failover. (
http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxypass)
1. I ran the following use cases:
a) Knox instance is down before the client request comes in.
Steps:
- Configure Apache HTTP Server to proxy two Knox instances;
- Shut down Knox instance A;
- Execute a client request;
- Verify that Knox instance A is marked as unavailable and the
client's request is redirected to Knox instance B;
- Verify that all subsequent requests within the same client
session are passed only to Knox instance B;
- Verify that requests in a new client session are first
attempted against Knox instance A.
This is required because Knox instance A could have been
restarted before the new client session began.
This seems a little sub-optimal to me but there may be nothing we can do
about it.
The issue that I have is that I don't think Apache should be trying
instance-A first every time in this case.
So the question is how is Apache distributing load over instance-A and
instance-B?
Does it always try instance-A first or does it sometimes try instance-B
first?
In addition if it gets a failure for instance-A ideally it would take it
out of the "pool" for some (ideally configurable) period of time.
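To make the "take it out of the pool for a configurable period" idea concrete, here is a toy Python model of that behavior. It is only a sketch of the desired semantics, not how Apache implements it; the class `Balancer` and its methods are hypothetical names. If I read the ProxyPass documentation linked above correctly, mod_proxy exposes a `retry` worker parameter (in seconds) intended for this purpose, which is worth confirming against the actual configuration used in these tests.

```python
class Balancer:
    """Toy round-robin balancer that skips a failed worker for a while."""

    def __init__(self, workers, retry_seconds=60.0):
        self.workers = list(workers)
        self.retry_seconds = retry_seconds  # how long a failed worker sits out
        self.failed_at = {}                 # worker -> time of last failure
        self._next = 0                      # round-robin cursor

    def mark_failed(self, worker, now):
        self.failed_at[worker] = now

    def mark_ok(self, worker):
        self.failed_at.pop(worker, None)

    def pick(self, now):
        """Return the next eligible worker, or None if all are sitting out."""
        for offset in range(len(self.workers)):
            index = (self._next + offset) % len(self.workers)
            worker = self.workers[index]
            failed = self.failed_at.get(worker)
            if failed is None or now - failed >= self.retry_seconds:
                self._next = (index + 1) % len(self.workers)
                return worker
        return None
```

With two workers "A" and "B", marking "A" as failed at time 0 means `pick` keeps returning "B" until 60 seconds have passed, after which "A" is tried again.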
This use case works fine.
b) Knox instance goes down while processing a client's PUT request.
Steps:
- Start a PUT of a medium-sized file (200 MB) to HDFS;
- After some time, shut down the Knox instance that is processing
this request;
- Verify that the client gets a 500 status code and no failover
takes place.
This use case behaves as described. Apache HTTP Server is
not able to fail over in this case.
c) Knox instance goes down while processing a client's GET request.
Steps:
- Start a GET of a medium-sized file (200 MB) from HDFS;
- After some time, shut down the Knox instance that is processing
this request;
- Verify that the client gets a 200 status code, a 'Content-Length'
header whose value equals the file size, and only some of the bytes
in the body.
I ran this test with the following clients:
1) HttpClient - it doesn't report any error when the
stream is closed.
2) cURL - it doesn't report any error when the stream is
closed.
3) Firefox browser - it doesn't report any error when the
stream is closed.
All clients just download the bytes available before the stream
is closed, so the client has to manually compare the 'Content-Length'
header value with the number of bytes received.
- No failover takes place.
This use case behaves as described. Apache HTTP Server is
not able to fail over in this case.
This is unexpected and unfortunate.
I would have hoped that HttpClient and cURL at least would provide some
indication that the stream was incomplete according to the
Content-Length header.
The only thing I would recommend trying is to take Knox out of the
picture: use cURL to GET the same file directly from HDFS, kill the
DataNode halfway through the stream, and check that you see the same
behavior on the client side.
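Until we know whether the clients can flag this themselves, the manual comparison described above could be scripted. Below is a minimal Python sketch of that check, assuming the test harness has the response headers as a dict and the body as a file-like stream; `count_bytes` and `is_truncated` are hypothetical helper names, not part of any of the clients mentioned.

```python
import io


def count_bytes(stream, chunk_size=64 * 1024):
    """Read a file-like object to the end and return how many bytes arrived."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    return total


def is_truncated(headers, received):
    """Return True when fewer bytes arrived than Content-Length declared."""
    declared = None
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "content-length":
            declared = int(value)
    if declared is None:
        raise ValueError("no Content-Length header to compare against")
    return received < declared


if __name__ == "__main__":
    # Simulated response: the header declares 200 bytes, only 120 arrived.
    body = io.BytesIO(b"x" * 120)
    print(is_truncated({"Content-Length": "200"}, count_bytes(body)))
```

A test for use case c) would then assert that `is_truncated` is True after the Knox instance is taken down mid-stream.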
2. Additional use cases.
What other cases would you suggest testing?
I just want to confirm that you have tested a scenario for HDFS where
the call to the NameNode goes to instance-A and the subsequent call to
the DataNode goes to instance-B and this works.
IV. What functionality did I miss?
Other than the note above I don't see anything missing.
Maksim.