Hi,
I've been using CHICKEN to speak to HBase via the Stargate REST API.
I've managed to build a binding with the rest-bind egg and it works.
However, it's very slow, even though it doesn't saturate CPU or IO bandwidth.
I have a benchmark where it requests 124 cells from HBase using the
scanner API. This takes about 5 seconds, of which less than 0.2 seconds
are actually spent doing anything at all:
-----
#: 124
0.18s CPU time, 0.02s GC time (major), 157977/130286 mutations
(total/tracked), 4/273 GCs (major/minor)
0.32user 0.02system 0:05.20elapsed 6%CPU (0avgtext+0avgdata
24096maxresident)k
0inputs+32outputs (0major+3514minor)pagefaults 0swaps
-----
There are some more benchmarks here:
http://paste.call-cc.org/paste?id=b46e6a3905ae611f2dcce3c3214e7d20384fa8ed
The benchmark is almost the same from compiled code as it is via csi.
I've tried attaching the debugger to the process and I always catch it
in __poll_nocancel so I suspect that it's getting stuck in the
scheduler.
If I tell the HTTP request to use HTTP/1.0 rather than HTTP/1.1 then it
uses a new HTTP connection for each request and goes significantly
faster (but still only gets up to about 17% CPU rather than 0.4%):
-----
$ time csi -s extractor.scm
#: 124
0.112s CPU time, 0.02s GC time (major), 33088/5388 mutations
(total/tracked), 5/218 GCs (major/minor)
real 0m0.790s
user 0m0.260s
sys 0m0.012s
-----
As you can see from the numbers above, it's still wasting a
considerable amount of time.
I did a benchmark using curl as well in order to rule out the other end
of the REST API:
Reusing a single connection:
-----
$ time seq 1 124 \
  | sed 's#.*#http://localhost:8080/GridSearch/scanner/14351513400672ee69118#' \
  | xargs -n 124 curl > /dev/null 2>&1
real 0m0.079s
user 0m0.024s
sys 0m0.004s
-----
Using a connection per request:
-----
$ time seq 1 124 \
  | sed 's#.*#http://localhost:8080/GridSearch/scanner/14351514443804a6fb774#' \
  | xargs -n 1 curl > /dev/null 2>&1
real 0m1.472s
user 0m0.472s
sys 0m0.036s
-----
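The curl runs above only give a total. To compare against the per-call stall seen in CHICKEN, the per-request latency on the curl side can be printed as well. This is a rough sketch, not something I ran for the numbers above: the function names time_requests and summarise are made up for illustration, and it covers only the connection-per-request case. It relies on curl's -w '%{time_total}' write-out variable.

```shell
# Print one time_total value (in seconds) per request, using a new
# connection for each request. The URL is whatever scanner URL is under
# test; the count defaults to 124 to match the benchmark.
time_requests() {
  url=$1
  count=${2:-124}
  i=0
  while [ "$i" -lt "$count" ]; do
    curl -s -o /dev/null -w '%{time_total}\n' "$url"
    i=$((i + 1))
  done
}

# Summarise a column of timings read from stdin: how many there were,
# their sum, and the slowest single request.
summarise() {
  awk '{ sum += $1; if ($1 + 0 > max + 0) max = $1 + 0 }
       END { printf "n=%d total=%.3f max=%.3f\n", NR, sum, max }'
}
```

It would be used as time_requests "$URL" | summarise, with URL set to a scanner URL like the ones above. If the slowest request is well under 38ms, that would be one more hint that the stall is on the client side.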
I did a bit more profiling and http-client spends almost all of its
time in intarweb's read-response procedure which, in turn, spends all
its time in its own safe-read-line procedure. Swapping safe-read-line
for read-line doesn't change anything.
My timings show that things get stuck in read-line for 38 or 40ms per
call. For 124 calls at 38ms that's about 4.712 seconds, which is most of
the 5 second run time.
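For reference, the arithmetic behind that estimate can be checked with a few lines of shell. The helper name stall_total is just for illustration. It multiplies the number of calls by a per-call delay in milliseconds and prints the total in seconds:

```shell
# Total stall time in seconds for a number of calls at a fixed
# per-call delay given in milliseconds.
stall_total() {
  calls=$1
  per_call_ms=$2
  total_ms=$((calls * per_call_ms))
  printf '%d.%03d\n' $((total_ms / 1000)) $((total_ms % 1000))
}

stall_total 124 38   # lower figure: prints 4.712
stall_total 124 40   # upper figure: prints 4.960
```

Either way the stall accounts for nearly all of the 5.20 seconds of elapsed time in the first benchmark.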
I've looked briefly into ports.scm and library.scm in the CHICKEN
source but didn't have much luck understanding what was going on.
make-input-port's read-char procedure appears to call (read) whilst
inside the scope of a lambda called read which has lots of arguments so
I'm clearly missing something.
Any help on how to improve this situation would be greatly
appreciated.
Thanks! :-)
--
andy...@ashurst.eu.org
http://www.ashurst.eu.org/
http://www.gonumber.com/andyjpb
0x7EBA75FF