Re: Client did not send nnn bytes as expected

Amit Kasher Wed, 17 Dec 2008 23:53:06 -0800

I really appreciate your attempts to help.
In reply to your last post, jec:
* We can't reproduce this in the lab...
* We see a combination of FIN, FIN-ACK and RST.
* We haven't seen any suspicious traceroutes... nothing
differentiating suffering clients from non-suffering clients.
* We don't do anything "special" - these are normal GWT service call
requests from browser to server.
* We tried that, as well as different MTU sizes... no clue
* This occurs without any SSL involved, and regardless of which browser
is used (IE, FF).


Unfortunately we gave up our persistent attempts to get to the root
of the issue. We now simply assume that it's some low-level network
issue (layers 1-2) that causes some of the packets not to arrive, in an
unexplained combination with a higher-level network issue (layers 3-6)
that causes the packet's data to be split at exactly 80%.
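
To make the symptom concrete: if the request headers announce a 454-byte body but only the 363-byte "80%" segment arrives before the connection is closed, the server can only report a short read. The following is a minimal, hypothetical Java sketch of such a check, reusing the 454/363/91 byte figures from the captures quoted further down this thread. The class and method names are made up for illustration, and GWT's own RPC servlet code may structure or word this differently; the exception text merely mirrors the subject of this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ShortReadCheck {

    // Hypothetical helper: read exactly contentLength bytes of a request body,
    // failing if the stream ends first (e.g. the connection was closed with
    // FIN/RST after only part of the body arrived).
    static byte[] readExactly(InputStream in, int contentLength) throws IOException {
        byte[] buf = new byte[contentLength];
        int total = 0;
        while (total < contentLength) {
            int n = in.read(buf, total, contentLength - total);
            if (n < 0) {
                // End of stream before the declared length: a short read.
                throw new IOException("Client did not send " + contentLength
                        + " bytes as expected (received " + total + ")");
            }
            total += n;
        }
        return buf;
    }

    public static void main(String[] args) {
        int declared = 454;                   // request body size seen in the capture
        int firstPart = declared * 80 / 100;  // 363 bytes: the segment that arrives
        int lostPart = declared - firstPart;  // 91 bytes: the segment that never arrives
        System.out.println(firstPart + " + " + lostPart + " = " + declared);

        // Simulate a body stream that ends after only the first part.
        InputStream truncated = new ByteArrayInputStream(new byte[firstPart]);
        try {
            readExactly(truncated, declared);
            System.out.println("complete body received");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Run as-is, this prints "363 + 91 = 454" followed by "Client did not send 454 bytes as expected (received 363)".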

In order to deal with this situation, we implemented a high-level
(GWT-level) configurable retry mechanism with timeout support. This
resolves the symptoms, and in effect solves the problem.
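
A minimal sketch of the retry-with-timeout idea, in plain Java. This is not the GWT client/server mechanism described above (on the GWT client side it would more likely be built on an AsyncCallback plus a Timer rather than on threads); the class name, method name and the maxAttempts/timeoutMillis parameters are illustrative assumptions.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryWithTimeout {

    // Hypothetical helper: run `call` up to maxAttempts times, giving each
    // attempt timeoutMillis to complete. Returns the first successful result,
    // or rethrows the failure of the last attempt.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long timeoutMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        // A cached pool, so a hung attempt cannot block the next one.
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            Exception last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                Future<T> future = executor.submit(call);
                try {
                    return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    future.cancel(true); // give up on this attempt (interrupts it)
                    last = e;
                } catch (ExecutionException e) {
                    Throwable cause = e.getCause();
                    last = (cause instanceof Exception) ? (Exception) cause : e;
                }
            }
            throw last;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger attempts = new AtomicInteger();
        // Simulated service call: the first attempt hangs (as if its last
        // segment were lost); later attempts answer immediately.
        Callable<String> flaky = () -> {
            if (attempts.incrementAndGet() == 1) {
                Thread.sleep(5_000);
            }
            return "response";
        };
        String result = callWithRetry(flaky, 3, 200);
        System.out.println(result + " after " + attempts.get() + " attempts");
    }
}
```

The demo prints "response after 2 attempts". The per-attempt timeout is the important part: a request that lost its last segment may simply hang rather than fail, so a retry without a timeout might never fire. Retrying is also only safe for calls that can be repeated without side effects.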

We don't mind contributing this mechanism (both client and server
code), if someone is interested or believes GWT needs this kind of
mechanism.

Thanks again,
Amit

On Dec 5, 8:09 pm, jchimene <jchim...@gmail.com> wrote:
> Hi Amit,
>
> You don't make this easy, do you...
>
> o     Just to be clear: goodness happens when the client sends 2 TCP
>       packets, which become three IP packets on the wire, which are
>       reassembled by the server into 2 TCP packets.
>       Badness happens when the client sends 2 TCP packets, which
>       become three IP packets on the wire, which are reassembled into
>       one complete TCP packet and 1 incomplete TCP packet.
>       Can you reproduce this in your lab? I'm guessing "no", otherwise
>       you would not have deployed the app...
>
> o     Do you see a NAK at the client after the dropped fragment?
>
> o     Pls. try traceroute from your lab and from the client box. What
>       are the differences?
>
> o     It now appears to be an IP issue. The fact that the
>       fragmentation doesn't occur on the larger packet is interesting.
>
> o     The two separate TCP packets lead to an assumption that you can
>       identify requests from the same client box at the server. IOW, you
>       have an application-level protocol that lets you reassemble the two
>       packets into a single request. I'm sure this is the case, but such
>       a design isn't explicitly stated in your message. Your server
>       application never sees the 2 -> 3 split, since the normal case is
>       that your server app only sees 2 packets from the client. I'm
>       reluctant to say this, but part of this process may require proof
>       that the protocol design is resilient to network transmission
>       errors.
>
> o     I'd start playing around w/ different packet sizes and
>       transmission rates (via ping) to see if you can trip any triggers.
>       It may be a combination of buffering/congestion between the client
>       and the server.
>       Did you try ping w/ different packet sizes? I realize that you have
>       different servers. Does the connection between the client and
>       server occur over the public switched network or does it use a
>       private circuit?
>
> o     There have been posts in this thread w/r/t SSL and IE. Are they
>       relevant?
>
> Cheers,
> jec
>
> On Dec 5, 1:21 am, Amit Kasher <amitkas...@gmail.com> wrote:
>
> > Hi,
> > We have spent the past 2 days working on this, and have some new
> > findings.
>
> > We made contact with one of our customers who encounters this
> > issue more frequently than others, and he granted us access to his
> > computer (using LogMeIn). We installed Wireshark on his computer, as
> > well as on the server. We managed to reproduce the problem with both
> > sniffers in action, and analyzed the exact correlating TCP segments
> > according to their sequence and ack numbers. Here are the results.
>
> > This is what happens in the valid state:
> > The client sends 2 TCP segments for a GWT service call, which are
> > supposed to be reassembled into a single PDU: the entire HTTP
> > request. The first segment always contains the HTTP request
> > header, and the second TCP segment always contains the HTTP request
> > body. For instance, we see that the client sends a first segment of
> > 969 bytes and a second segment of 454 bytes. On the server
> > we see that these 2 segments become 3 segments. The first is still 969
> > bytes and contains the HTTP request header; the second is 363 bytes
> > (80% of the original second segment), and the third is the remaining
> > 91 bytes (20% of the original 454 bytes).
>
> > In the invalid state, when the problem occurs, the third segment
> > simply does not arrive in the server. It seems that something in the
> > way has split the second, 454-byte segment into 2 segments, and only
> > sent the first one to the server.
>
> > 1. If this is something on the client's machine, how come we don't see
> > it in the sniffer? (We even tried removing all firewall/antivirus
> > software and reinstalling the network card driver.)
> > 2. If this is not something on the client's machine, how come some
> > clients encounter this much more often than others, who never
> > encounter it?
>
> > Can it be some kind of network equipment that some of our clients
> > (reminder - different ISPs) go through, and others don't?
>
> > Unfortunately, this new info still leaves us clueless...
>
> > On Dec 3, 5:16 pm, jchimene <jchim...@gmail.com> wrote:
>
> > > On Dec 2, 11:20 pm, Amit Kasher <amitkas...@gmail.com> wrote:
>
> > > > Hi and thanks again for your responses.
>
> > > No Prob.
>
> > > If this "opportunity for excellence" is as pervasive as you suspect,
> > > installing software on a client's computer should be unnecessary,
> > > since installing it on *any* computer *anywhere on the planet* should
> > > reliably reproduce the issue. You say that tcpdump shows the packet
> > > truncation, so I'm not sure I understand the requirement to install
> > > something on a client machine. My goal in these past responses has
> > > been to absolutely prove that it's the serialization code (by
> > > factoring out the serialization code using ping), not something
> > > peculiar to the transport or session layers.
>
> > > Are you using the public switched network to provide client/server
> > > connectivity? If not, nothing you've said so far would eliminate your
> > > network transport service.
>
> > > I find it hard to believe it's GWT, as the cargo size is so small as
> > > to be insignificant, and others would have reported this issue by now.
> > > I have to admit that I'm not a user of Java serialization, so there
> > > may have been reports of serialization issues of which I'm
> > > blissfully unaware. From everything you're saying, it really looks
> > > like the problem is in user-space. It may be a certain code path that
> > > leads to the same serialization invocation logic. I'd start pulling
> > > this code apart, instrumenting the hell out of it and running it
> > > through JUnit or some such automated testing environment. Again, I
> > > understand you've probably done this...
>
> > > I'm wondering if there's a specific byte-pattern that's causing this.
> > > Have you tried reordering the structure members? Also, have you
> > > eliminated buffer corruption issues? Since it's cross-browser, what
> > > does the -pretty flag + Firebug reveal? Esp. when profiling the code?
> > > (Although I must admit that you've probably tried all that type of
> > > debugging by now).
>
> > > Good luck,
> > > jec
>
> > > > A few more subtle observations and insights:
> > > > 1. It's probably not the server. There are several reasons that lead
> > > > us to believe that the server is not the cause of this issue: (a) We
> > > > switched hosting providers. (b) These providers reside in completely
> > > > different geographical locations - countries. (c) We have always been
> > > > using JBoss on CentOS, but this issue occurs both when we work with
> > > > Apache as a front end using mod_jk to tomcat, as well as when
> > > > eliminating this tier and having clients go directly to tomcat - using
> > > > it as an HTTP server. (d) tcpdump sniffer explicitly shows that the
> > > > server receives ALWAYS EXACTLY 80% of the request payload. Unless this
> > > > is something even lower level in that machine (the VPS software used -
> > > > Virtuozzo, the network card/driver, etc.), these observations pretty
> > > > much provide an alibi for the server... I think we'd better focus on
> > > > other places.
> > > > 2. There are indications that this is not inside the browser either:
> > > > (a) It happens in several GWT versions. (b) It happens "to" all
> > > > browsers, which provides a strong clue, since this code is completely
> > > > different from browser to browser: GWT uses the MsXMLHTTP ActiveX
> > > > object in IE, while using completely different objects in other
> > > > browsers. Since this is the underlying mechanism used to perform RPC,
> > > > and the problem happens with more than one of them, it is unlikely to
> > > > be the cause.
> > > > Still, it seems that this MUST be the GWT/client code, since these
> > > > clients, to whom this issue occurs much more often, don't have
> > > > problems with any other websites (we managed to talk to several of
> > > > them).
> > > > One thing that comes to mind is perhaps the GWT serialization code? I
> > > > don't know...
>
> > > > Therefore, currently, aside from the possibility that there's a bug in
> > > > the GWT serialization code, there's also the possibility that it's
> > > > something in the network, even though these clients are from various
> > > > ISPs, and geographical locations. Yes, I notice the dead end as
> > > > well...
>
> > > > These observations somewhat reduce the anticipated benefit (let alone
> > > > the feasibility...) of several of your (MUCH APPRECIATED, THOUGH)
> > > > suggestions:
> > > > 1. ping from the lab
> > > > 2. perl HTTP server
>
> > > > Despite that, we ARE happy about any suggestion and willing to put the
> > > > required effort, so we'll try to make progress in these directions.
>
> > > > Our situation now is that we assume that the data arrives corrupted to
> > > > the server, and we should see how this data comes out of the client.
> > > > Therefore we will also try to install a sniffer in a client computer
> > > > in which this occurs (though we have been trying to do that for quite
> > > > a long time now).
>
> > > > On Dec 2, 10:29 pm, jchimene <jchim...@gmail.com> wrote:
>
> > > > > Hi Amit,
>
> > > > > One other thing:
>
> > > > > I'm getting the impression that you also have a custom server. If it's
> > > > > an identical configuration across all server instances, then you also
> > > > > have to prove that it's not the server. Again, I'd code a simple HTTP
> > > > > server in Perl (because there's no problem so intractable that it
> > > > > can't be made worse with a Perl application) and use it to test
> > > > > against your application.
>
> > > > > Cheers,
> > > > > jec
>
> > > > > On Dec 2, 9:11 am, Amit Kasher <amitkas...@gmail.com> wrote:
>
> > > > > > Hi,
> > > > > > Thanks for your reply. Answers are inline.
>
> > > > > > On Dec 2, 5:50 pm, jchimene <jchim...@gmail.com> wrote:
> > > > > > > Hi,
>
> > > > > > > A few questions:
>
> > > > > > > o
>
> ...
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google Web Toolkit" group.
-~----------~----~----~----~------~----~------~--~---
