[HttpComponent][HttpCore] Refactoring of HTTP message processing code

Oleg Kalnichevski Mon, 05 Dec 2005 13:24:54 -0800

The Rationale 
=============
(1) Harmonize HTTP message processing API 

The HTTP message processing API in HttpClient 3.0 is inconsistent.
Methods that serve similar functions have inconsistent names and
different argument signatures. Several static utility classes intended
to process HTTP primitives sit in the util package, apart from the
primitives they operate on.


(2) Avoid the use of objects with synchronized access

StringBuffer and ByteArrayOutputStream, used internally by HttpClient
3.0, synchronize on every mutation, which causes significant
performance degradation.

(3) Eliminate excessive garbage creation when processing HTTP messages

Refactoring
===========
(1) HttpCore _should_ now have a consistent API for HTTP message
processing. Parseable HTTP primitives now come with #parse methods to
convert a char array to an HTTP primitive and #parseAll methods to
convert a char array to a sequence of HTTP primitives. In those cases
where Object#toString() may not always do the job, HTTP primitives come
with #format and #formatAll methods. All HTTP primitives can work with
either String or CharArrayBuffer. The code from the static utility
classes has been merged into the logically related HTTP primitive
classes.

Consistency of an API is a different thing to different people, though,
so please review and complain loudly if you disagree.

(2) StringBuffer and ByteArrayOutputStream have been replaced with the
unsynchronized ByteArrayBuffer and CharArrayBuffer classes.
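
To illustrate the idea behind point (2), here is a minimal sketch of an
unsynchronized, growable char buffer. It is NOT the actual HttpCore
CharArrayBuffer; the class name SimpleCharBuffer and all of its methods
are hypothetical and only show the general technique: no method is
synchronized, and clear() keeps the backing array so it can be reused.

```java
// Hypothetical sketch; not the real HttpCore CharArrayBuffer.
final class SimpleCharBuffer {

    private char[] buf;
    private int len;

    SimpleCharBuffer(int capacity) {
        if (capacity < 0) {
            throw new IllegalArgumentException("Negative capacity");
        }
        this.buf = new char[capacity];
    }

    // No 'synchronized' here: an instance must be confined to one thread.
    void append(char c) {
        ensureCapacity(len + 1);
        buf[len++] = c;
    }

    void append(String s) {
        if (s == null) {
            s = "null";
        }
        int n = s.length();
        ensureCapacity(len + n);
        s.getChars(0, n, buf, len); // copies straight into the backing array
        len += n;
    }

    int length() {
        return len;
    }

    // Resets the length but keeps the backing array for reuse.
    void clear() {
        len = 0;
    }

    private void ensureCapacity(int required) {
        if (required > buf.length) {
            char[] tmp = new char[Math.max(buf.length * 2, required)];
            System.arraycopy(buf, 0, tmp, 0, len);
            buf = tmp;
        }
    }

    @Override
    public String toString() {
        return new String(buf, 0, len);
    }

    public static void main(String[] args) {
        SimpleCharBuffer b = new SimpleCharBuffer(16);
        b.append("Host");
        b.append(':');
        System.out.println(b); // prints "Host:"
    }
}
```

Since a buffer like this is normally used by a single thread while one
message is being parsed, dropping the locking costs nothing in safety
as long as instances are not shared across threads.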

(3) HttpClient 3.0 produces up to about 50 intermediate objects per
average HTTP request and 1-2 per content chunk (if chunk-encoded),
primarily due to overuse of the String#trim() and String#substring()
methods. This is a lot of garbage.

The refactored code, by contrast, generates _virtually_ zero garbage
(1 intermediate object that I can think of) when parsing HTTP headers
and zero garbage when parsing content chunks. Moreover, HTTP headers are
tokenized only when needed, so unused headers never get parsed and
converted to high-level objects, further reducing the amount of garbage
produced while processing a request.
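
As an illustration of the "tokenize only when needed" idea, here is a
minimal sketch. The class LazyHeaderSketch is hypothetical and is not
the HttpCore Header class; it assumes a header line of the simple form
"Name: value". The raw characters are kept as-is, and the name and value
Strings are created only on first access. Whitespace is skipped by
moving indices rather than by calling String#trim().

```java
// Hypothetical sketch of lazy header tokenization; not the HttpCore API.
final class LazyHeaderSketch {

    private final char[] line; // raw header characters, kept by reference
    private final int len;
    private boolean tokenized;
    private String name;
    private String value;

    LazyHeaderSketch(char[] line, int len) {
        if (line == null || len < 0 || len > line.length) {
            throw new IllegalArgumentException("Invalid header buffer");
        }
        this.line = line;
        this.len = len;
    }

    boolean isTokenized() {
        return tokenized;
    }

    String getName() {
        tokenize();
        return name;
    }

    String getValue() {
        tokenize();
        return value;
    }

    private static boolean isWhitespace(char c) {
        return c == ' ' || c == '\t';
    }

    // Runs at most once, and only if a caller asks for the name or value.
    private void tokenize() {
        if (tokenized) {
            return;
        }
        int colon = -1;
        for (int i = 0; i < len; i++) {
            if (line[i] == ':') {
                colon = i;
                break;
            }
        }
        if (colon <= 0) {
            throw new IllegalStateException("Malformed header line");
        }
        // Trim by adjusting indices instead of creating substrings.
        int nameEnd = colon;
        while (nameEnd > 0 && isWhitespace(line[nameEnd - 1])) {
            nameEnd--;
        }
        int start = colon + 1;
        int end = len;
        while (start < end && isWhitespace(line[start])) {
            start++;
        }
        while (end > start && isWhitespace(line[end - 1])) {
            end--;
        }
        name = new String(line, 0, nameEnd);
        value = new String(line, start, end - start);
        tokenized = true;
    }

    public static void main(String[] args) {
        char[] raw = "Content-Type:  text/html ".toCharArray();
        LazyHeaderSketch h = new LazyHeaderSketch(raw, raw.length);
        System.out.println(h.isTokenized()); // prints "false"
        System.out.println(h.getValue());    // prints "text/html"
    }
}
```

A header that is never queried stays a plain char array and costs no
String allocations at all.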

Reduced garbage comes at the price of somewhat uglier code, though.

* The Header class can be initialized with an instance of
CharArrayBuffer, which is copied by reference, not by value. In theory,
one can still mutate the original CharArrayBuffer instance and thereby
corrupt the Header instance.

This problem can be solved by making Header an interface with two
implementations: an immutable public class and a package-private class
instantiated by passing a reference to a CharArrayBuffer. I just thought
that would be overkill, though. Please let me know if you disagree.

* NumUtils#parseUnsignedInt is used instead of the standard
Integer#parseInt to parse integer values in HTTP messages, such as the
protocol version and chunk size. The parseUnsignedInt method produces no
intermediate garbage, whereas the use of Integer#parseInt usually
entails the creation of an intermediate String object.

I understand this is a questionable design decision and will not object
strongly should the majority decide that this change be reverted.
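
To make the trade-off concrete, here is a minimal sketch of parsing an
unsigned integer directly from a range of a char array. The class name
NumParseSketch and this exact signature are assumptions made for
illustration; the real NumUtils#parseUnsignedInt may differ. The point
is that no String is created on the success path, whereas
Integer#parseInt needs one to be built from the buffer first.

```java
// Hypothetical sketch; the real NumUtils#parseUnsignedInt may differ.
final class NumParseSketch {

    // Parses the digits in buf[from, to) using the given radix.
    static int parseUnsignedInt(char[] buf, int from, int to, int radix) {
        if (buf == null) {
            throw new IllegalArgumentException("Buffer may not be null");
        }
        if (radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) {
            throw new IllegalArgumentException("Invalid radix");
        }
        if (from < 0 || to > buf.length || from >= to) {
            throw new NumberFormatException("Empty or invalid range");
        }
        int limit = Integer.MAX_VALUE / radix;
        int result = 0;
        for (int i = from; i < to; i++) {
            int digit = Character.digit(buf[i], radix);
            if (digit < 0) {
                throw new NumberFormatException("Invalid digit at index " + i);
            }
            // Reject values that would overflow a signed 32-bit int.
            if (result > limit) {
                throw new NumberFormatException("Value out of range");
            }
            result *= radix;
            if (result > Integer.MAX_VALUE - digit) {
                throw new NumberFormatException("Value out of range");
            }
            result += digit;
        }
        return result;
    }

    public static void main(String[] args) {
        char[] status = "HTTP/1.1 200 OK".toCharArray();
        // Status code occupies indices 9..11.
        System.out.println(parseUnsignedInt(status, 9, 12, 10)); // prints 200

        char[] chunk = "1a2f".toCharArray();
        // Chunk sizes are hexadecimal.
        System.out.println(parseUnsignedInt(chunk, 0, chunk.length, 16)); // prints 6703
    }
}
```

With Integer#parseInt, the same call would first need something like
new String(buf, from, to - from), which is the intermediate object the
custom method avoids.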

I am planning to do some benchmarking to see whether the
'near-zero-garbage' HTTP message processing results in any tangible
performance gains. So far it appears to have produced a negligible
performance increase of roughly 1-2% when running on JRE 1.5.

Oleg

