marc fleury wrote:
>
> |IMHO, this kind of long transactions void scalability due to the
> |locking issues. It should be avoided by jBoss users, but we
> |should also be tolerant and allow it.
>
> maybe it is a matter of time... one minute is short imho one hour long
If you are sitting in front of your terminal waiting
for a response, one minute feels _very_ long...
> |During server overload timeouts are normal. If we had no timeouts,
> |or if the timeout is set to a very long time, users would call IT
> |staff and complain: "The server does not reply, nothing happens
> |when I try to use it."
>
> good point...
>
> |But if timeouts are set to a sensible value, the user of an
> |overloaded server would get a "timed out" error before calling
> |IT staff.
>
> sensible is the key... 5 min? 10 min?
This is the big question, and not easily answered.
I guess it depends on how long users are willing to
wait before calling for help.
A problem with overload is that most of the users of
the overloaded server usually call at the same time.
The rule of thumb is: As low as possible, but high
enough that we never see timeouts unless something
is wrong.
For the project I intend to use jBoss for, a
transaction may involve two or three servers across
slow (but not overloaded) WAN links. Here I expect
to use a timeout of about 40 seconds.
> |We could, but please consider the consequences: Every bug report
> |we get that would have been "timeout problem" will instead be
> |"server hang problem", as no users are going to wait an hour
> |to see if a timeout would happen. And this would not only be the
> |user having a long transaction: Other users may be locked out
> |because of a long transaction, and experience a "server hang
> |problem" too.
>
> Before we get to what the users perceive we must get past what the
> developers and the J2EE administrators perceive of us :)))
Personally I don't care much about the users (please don't
tell anybody), but I know that administrators and
decision makers do.
This is why I speak about how users react to problems.
Most users will probably never know that they are using
jBoss.
> |TimeoutFactory only works with absolute time in milliseconds
> |since epoch.
>
> what's epoch
It would have been more correct if I had said "start of
the epoch". In the JVM and on UNIX this is Jan 01, 1970,
00:00:00 UTC. So the time I speak about is just the Java
long representation of date/time.
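To make this concrete, here is a minimal sketch (not actual
jBoss code) of turning a relative timeout into the absolute
"milliseconds since the epoch" long value that, as I read it,
TimeoutFactory works with:

    import java.util.Date;

    // Illustration only: a relative timeout expressed as an absolute
    // deadline in milliseconds since Jan 01, 1970 (the Java long
    // representation mentioned above).
    public class EpochExample {
        public static void main(String[] args) {
            long relativeTimeoutMillis = 40L * 1000L;           // e.g. a 40 second timeout
            long now = System.currentTimeMillis();               // millis since the epoch
            long absoluteDeadline = now + relativeTimeoutMillis; // when to time out

            System.out.println("Now:      " + new Date(now));
            System.out.println("Deadline: " + new Date(absoluteDeadline));
        }
    }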
> |A lot of server implementations get around this by
> |taking extra steps to ensure that the server does not
> |start working on more jobs than it can handle.
> |This is often done by some kind of incoming request
> |queue. When a request comes in and the server is
> |experiencing overload the request is enqueued. This way
>
> yes I was thinking about that yesterday. In fact our Metrics Interceptor
> can be the first one to admit queued request (JMS) and turn the valve
> "on-off" as the load goes....
>
> we are in sync on that it seems. Wanna look at it?
In the past I have been playing quite a bit with the
implications of server overload and how to avoid it.
I've also done some work on advanced queueing and
traffic control in networks and on data links, as this
is related to overload avoidance. So I might have some
experience to contribute here.
I have to say that completely avoiding overload problems
is impossible to do in Java alone. We can only try to
do our best. Note how a server reacts to rising overload
if we ignore such pesky problems as lack of threads and
operating system crashes:
- First the system seems slow, but otherwise it works
normally.
- Then we start getting timeouts because it has become
very slow.
- Then we get nothing but timeouts. All invocations
time out because of extreme overload. All time in the
server process is spent on receiving requests and maybe
processing the start of some requests before timeout.
Other processes on the same server may still work.
- Now it gets ugly. Not all invocations reach the
Java VM any more. The OS kernel is so busy processing
incoming requests that it has to drop some of them due
to lack of kernel-level buffers.
- Finally we get to a point sometimes referred to as
receive livelock, as the entire server with all
processes looks like it has locked up for good. This
happens when all processor time is spent servicing
receive interrupts from the network cards.
The receive livelock seen under extreme loads is a problem
inherent in interrupt-driven kernels, and is due to the
fact that interrupt processing has the highest possible
priority. To avoid it you would have to rewrite the OS
to work with intelligent network cards, or put a traffic
control box in front of your server.
While we cannot completely prevent overload, we can do
a lot to mitigate it as long as it doesn't get too extreme.
If we should do some kind of request policing, three
things are important:
1) It should be done as soon as possible. Any and all
processor resources spent on a request that is later
dropped are wasted, which is bad in an overload
situation. It would be best if this were done before
deserialization of the request, but that may be hard
to do, and then we would not be able to use the request
contents for policing decisions.
2) The queueing should use as few processor
resources as possible. When the overload gets bad,
a lot of the total processor time will be spent on
enqueueing and dequeueing requests that must be
dropped. I'm not sure JMS will do here; it is
heavy-weight and distributed, while we need something
light-weight and local.
3) The policing system should be flexible. A simple
FIFO queue can do for a crude system to avoid
overload, but there are lots of options for more
advanced queueing here. For example, a single FIFO
queue will be unfair if one user is trying to
overload the server. To avoid this, a FIFO per
user could be used, and the FIFOs could then be
serviced in a round-robin scheme (see the sketch
below). Some implementors might want to give
better service to clients that pay more; for this
a priority queue could be used. And those who want
to do some really advanced service differentiation
might want to use class based queueing, with
feedback from a metrics interceptor for the CBQ
metrics. This might sound hairy, but I don't think
that making an interface that could handle all of
this if/when implemented would be hard.
Personally I could live without a policing system,
but if present I would probably use some kind of
priority queueing. This would make it possible to
give background processing lower priority.
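As a rough illustration of the per-user FIFO idea from
point 3 above, here is a minimal sketch. It is not jBoss
code, and all class and method names are made up; it only
shows a bounded set of per-user FIFOs serviced round-robin,
so that a single flooding user cannot starve the others:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch only: per-user FIFO queues serviced round-robin, with a
    // bound on the total number of queued requests so that excess
    // requests are dropped as early and as cheaply as possible.
    public class RoundRobinPolicer<U, R> {
        private final int maxQueuedTotal;
        private final Map<U, Deque<R>> userQueues = new LinkedHashMap<>();
        private int queuedTotal = 0;

        public RoundRobinPolicer(int maxQueuedTotal) {
            this.maxQueuedTotal = maxQueuedTotal;
        }

        // Returns false if the request had to be dropped (policing).
        public synchronized boolean enqueue(U user, R request) {
            if (queuedTotal >= maxQueuedTotal)
                return false;              // drop early, waste no more work on it
            userQueues.computeIfAbsent(user, u -> new ArrayDeque<R>()).addLast(request);
            queuedTotal++;
            return true;
        }

        // Takes the next request, visiting users round-robin.
        // Returns null if nothing is queued.
        public synchronized R dequeue() {
            Iterator<Map.Entry<U, Deque<R>>> it = userQueues.entrySet().iterator();
            if (!it.hasNext())
                return null;
            Map.Entry<U, Deque<R>> entry = it.next();
            Deque<R> queue = entry.getValue();
            R request = queue.pollFirst();
            queuedTotal--;
            it.remove();                   // move this user to the back of the map...
            if (!queue.isEmpty())
                userQueues.put(entry.getKey(), queue); // ...so others get served first
            return request;
        }
    }

A priority queue or CBQ variant would only change the
dequeue() policy; the enqueue-early, drop-early structure
would stay the same.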
> |(almost) no processor time is spent on the request.
> |When the server finds time to service the request, it
> |is taken out of the queue to be serviced. But if the
> |server cannot find time to service the request, the
> |request is either simply dropped, or an error is
> |returned to the client.
> |This approach is sometimes called "request policing",
>
> glad to hear it is standard procedure, it is also the way SAP does it btw
>
> |as the server acts as a traffic controller for the
> |incoming requests.
> |Depending on how the input queue is implemented this
> |can also help with some types of DOS attacks.
> |
> |Sorry for the length of this answer, but I am afraid
>
> no, no don't be sorry, the whole thing made a lot of sense.
> I still believe we should "prioritize" who we need to woo with the server
> and I think the short timeouts do us a disservice with the admins and
> developers... setting timeouts too short can be done by the admin at a later
> stage but for now let's not misunderstand who we need to WIN RIGHT NOW!
> (Admins, developers already love us :)))
I think we basically agree on this.
What we really need might be a per-bean transaction
timeout setting in jboss.xml. If we have this and
the associated metadata, it would be trivial to add
a new method TxManager.begin(int timeout).
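Just to make the proposal concrete, here is a purely
hypothetical sketch of what such a method could look like;
the field and method bodies are made up, and the per-bean
value would come from the jboss.xml metadata:

    import java.util.Date;

    // Hypothetical sketch of the proposed TxManager.begin(int timeout);
    // not existing jBoss code. A timeout of 0 falls back to a default.
    public class TxManagerSketch {
        private int defaultTimeoutSeconds = 300;  // assumed server-wide default

        public void begin() {
            begin(0);                             // no per-bean setting: use default
        }

        public void begin(int timeoutSeconds) {
            if (timeoutSeconds <= 0)
                timeoutSeconds = defaultTimeoutSeconds;
            long deadline = System.currentTimeMillis() + timeoutSeconds * 1000L;
            // ...create the transaction and register the absolute deadline
            // with the timeout service (something like TimeoutFactory).
            System.out.println("Would time out at " + new Date(deadline));
        }
    }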
Best Regards,
Ole Husgaard.