Yes, but as I understand it, this is logging during processing of a single 
stream of data, in a single call to http_parser_execute(), specifically 
inside a for loop over exactly the data I logged.  That should be 
internally consistent, at least, right?  If the stream says it has an 
8192 byte chunk coming, and there are more than that many bytes of data, 
then 8192 bytes later there should be another chunk size in the stream 
(which there is not, which of course is why it's throwing the 
HPE_INVALID_CHUNK_SIZE error as it reads some non-hex characters at the 
offset where it's expecting the next chunk size).
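
To illustrate the framing I'm describing (this is a toy sketch of HTTP/1.1 chunked encoding, not http_parser's actual code): each chunk is a hex size line followed by exactly that many bytes and a CRLF, and hitting a non-hex byte where a size line should be is the equivalent of HPE_INVALID_CHUNK_SIZE:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Toy walker over a chunked-encoded body: each chunk is
 * "<hex-size>\r\n<size bytes>\r\n", terminated by a zero-size chunk.
 * Returns 0 on a well-framed stream, -1 on the equivalent of
 * HPE_INVALID_CHUNK_SIZE (non-hex byte where a chunk size is expected,
 * or a stream that ends mid-chunk). */
static int walk_chunks(const char *buf, size_t len) {
  size_t p = 0;
  for (;;) {
    size_t size = 0;
    size_t start = p;
    while (p < len && isxdigit((unsigned char)buf[p])) {
      char c = buf[p++];
      size = size * 16 + (size_t)(isdigit((unsigned char)c)
                                      ? c - '0'
                                      : tolower((unsigned char)c) - 'a' + 10);
    }
    if (p == start) return -1;             /* non-hex where a size belongs */
    if (p + 2 > len || memcmp(buf + p, "\r\n", 2) != 0) return -1;
    p += 2;
    if (size == 0) return 0;               /* terminating zero-size chunk */
    if (p + size + 2 > len) return -1;     /* stream ended mid-chunk */
    p += size;
    if (memcmp(buf + p, "\r\n", 2) != 0) return -1;
    p += 2;
  }
}
```

A stream that promises 0x2000 bytes but delivers fewer, or that has garbage where the next size line belongs, fails this check the same way the real parser does.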

Anyway, I managed to capture it as it went bad with tcpdump (I added -U -B 
1000000; not sure if that helped or the timing was just different).  The 
short version: it looks like the data was bad as it came in over TCP!  
I'll go into detail and post the dumps below, but this quickly leads me to 
ask (after apologizing for thinking this was a node bug): why does this 
generally fail on 0.8 and never on 0.6?

My guess is that on 0.6 an HTTP parse error simply disconnected the socket, 
and all of our AWS APIs retry any request that gets a socket disconnect 
(those happen rather frequently with AWS).  We also automatically retry on 
a number of 5xx HTTP response codes, so if any of those were being 
generated, we'd retry on those as well.  I'll add some additional logging 
and run this on node 0.6 again to see if I can verify this.
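
For concreteness, the retry policy I'm describing is roughly this (a hedged sketch, not our real client code; send_with_retry and do_request are hypothetical names, with do_request standing in for the actual AWS call):

```c
#include <assert.h>

/* Sketch of the retry policy: retry when the "socket" drops (status -1)
 * or the server returns a 5xx, up to max_attempts.  do_request is a
 * hypothetical stand-in returning an HTTP status code, or -1 for a
 * socket disconnect. */
static int send_with_retry(int (*do_request)(void *ctx), void *ctx,
                           int max_attempts) {
  int status = -1;
  for (int attempt = 0; attempt < max_attempts; attempt++) {
    status = do_request(ctx);
    if (status >= 0 && status < 500)
      return status;             /* 2xx-4xx: done, don't retry */
    /* -1 (disconnect) or 5xx: loop and try again */
  }
  return status;                 /* still failing after max_attempts */
}

/* Fake request for demonstration: disconnects twice, then returns 200. */
static int flaky_request(void *ctx) {
  int *calls = (int *)ctx;
  return (++*calls <= 2) ? -1 : 200;
}
```

If 0.6 surfaced bad data only as disconnects or 5xx responses, this kind of loop would silently paper over them, which would explain why we never noticed there.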

Back to the details of the symptom at hand (in case I misinterpreted some 
of this data, it might be good to get another set of eyes on it).  In this 
case, it didn't actually get an HTTP parser error; it simply handed back 
what looks like bad data (with the HTTP chunk size lines inserted in the 
middle of it), perhaps because the socket got closed or something.  
(Although if it got closed prematurely, shouldn't that have generated a 
parsing error, since the parser was expecting more bytes?  I'm not sure on 
this, exactly.)  I've uploaded the parsed/printed TCP dump of the end of 
the relevant stream (starting from before the last good chunk size line) 
[1], as well as the tail of the data that was returned to my app, starting 
from the same point [2].

As far as I can tell, the stream indicated there was an 8kb chunk coming; 
8kb later it indicates another 8kb chunk is coming; but then 1296 bytes 
later there's another chunk size line ("\r\n2000\r\n" -- 0x2000 being 
8192).  At the packet level, the final packet (starting at seq 230681) 
appears to be simply repeated data from 1304 bytes back in the stream, but 
then it terminates unexpectedly (it never gets to the 8kb of data the 
earlier chunk size line indicated was coming).  No other packets show up 
in this stream in my dump.

I cannot think of anything on our side (either my code or node's) that 
could cause this, so my conclusion is that something is going wrong within 
Amazon's DynamoDB (or in the routing between there and our servers, though 
I've tried from both inside and outside their datacenters with the same 
results).  In retrospect, every bad packet I've gotten has been on 
DynamoDB queries, despite us slamming a couple of other Amazon APIs with 
similar levels of traffic during this test, so that also points the finger 
in that direction.  I think I will bring this up on the DynamoDB forums, 
and make our DDB client more robust when it gets low-level parsing errors 
(although I'm a little afraid of what could happen if a bad response 
accidentally parses correctly because some of our data happens to match 
up!).

Thank you very much for your help with this.

[1]  http://thesilentb.com/files/tcpdump.txt 
[2]  http://thesilentb.com/files/packet.txt 

  Jimb Esser
  Cloud Party, Inc

On Monday, September 24, 2012 6:17:18 PM UTC-7, Ben Noordhuis wrote:
>
> On Tue, Sep 25, 2012 at 12:48 AM, Jimb Esser <[email protected]> 
> wrote: 
> > Patch for logging was to replace the two occurrences of 
> > SET_ERRNO(HPE_INVALID_CHUNK_SIZE) with: 
> >           { 
> >             const char *p2; 
> >             for (p2=data; p2 != data + len; p2++) { 
> >               fprintf(stderr, "0x%02x, ", *p2); 
> >               if (p == p2) { 
> >                 fprintf(stderr, "/******/ "); 
> >               } 
> >             } 
> >           } 
> >           fprintf(stderr, "HPE_INVALID_CHUNK_SIZE case 1\n"); 
> >           SET_ERRNO(HPE_INVALID_CHUNK_SIZE); 
> > 
> > I then loaded the stderr output into a trivial JS program to write it 
> > out as a binary file. 
> > 
> > When you say you don't see anything obviously wrong in the dump itself, 
> > do you mean it looks like a valid HTTP stream?  It looks to me like, 
> > near the end, it says an 8k data chunk is coming (2000\r\n), and then 
> > provides less than 8k data and starts another chunk (with the original 
> > 8k terminating in the middle of the second chunk).  I am basing this 
> > entirely on assumptions about the HTTP protocol gleaned from reading 
> > http_parser.c, though, so I could be quite mistaken =). 
>
> The HTTP parser works across TCP packet boundaries. A request or 
> response doesn't necessarily fit in a single packet. 
>
> Unless you're benchmarking with a concurrency of 1 (a single client 
> that issues sequential requests), you'll see HTTP requests and 
> responses getting interleaved. 
>

-- 
Job Board: http://jobs.nodejs.org/
Posting guidelines: 
https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en