Hi,
We have spent the past 2 days working on this, and have some new
findings.

We have made contact with one of our customers who encounters this
issue more frequently than others, and he granted us access to his
computer (using LogMeIn). We installed Wireshark on his computer, as
well as on the server. We managed to reproduce the problem with both
sniffers running, and to match the corresponding TCP segments by
their sequence and ACK numbers. Here are the results.
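As a side note, the sequence-number bookkeeping we did by hand can be sketched like this (a minimal illustration with the segment lists typed in by hand, not an actual capture export; in practice these pairs would come from Wireshark/tshark fields such as tcp.seq and tcp.len):

```python
# Sketch: correlate segments from two captures by relative TCP sequence
# number, to find which client-sent bytes never reached the server.

def covered_offsets(segments):
    """Return the set of relative byte offsets carried by a capture.

    Each segment is a (relative_seq, payload_len) pair.
    """
    offsets = set()
    for seq, length in segments:
        offsets.update(range(seq, seq + length))
    return offsets

def missing_bytes(client_segments, server_segments):
    """Byte offsets the client sent that never show up at the server."""
    return sorted(covered_offsets(client_segments) - covered_offsets(server_segments))

# Client sent two segments: 969-byte header, 454-byte body.
client = [(0, 969), (969, 454)]
# Server saw the body split 363 + 91, with the last piece lost.
server = [(0, 969), (969, 363)]

gap = missing_bytes(client, server)
print(len(gap), min(gap), max(gap))  # → 91 1332 1422
```

This is exactly the accounting described below: the last 91 bytes of the request body are absent from the server-side capture.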

This is what happens in the valid state:
The client sends 2 TCP segments for a GWT service call, which are
supposed to be reassembled into a single PDU: the entire
HTTP request. The first segment always contains the HTTP request
header, and the second TCP segment always contains the HTTP request
body. For instance, we see that the client sends a first segment of
969 bytes and a second segment of 454 bytes. On the server
we see that these 2 segments become 3 segments. The first is still 969
bytes and contains the HTTP request header; the second is 363 bytes
(80% of the original second segment), and the third is the remaining
91 bytes (20% of the original 454 bytes).
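For reference, the split arithmetic checks out (a quick sanity check with the sizes from this capture):

```python
body_len = 454          # second client-side segment: the HTTP request body
part1, part2 = 363, 91  # what the server-side capture shows for the body

assert part1 + part2 == body_len  # nothing was altered, only re-split
print(round(part1 / body_len * 100))  # → 80
```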

In the invalid state, when the problem occurs, the third segment
simply never arrives at the server. It seems that something along the
way has split the second 454-byte segment into 2 segments, and only
delivered the first one to the server.
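Until we find the cause, one thing a server-side filter could at least do is flag these truncated requests, by comparing the bytes actually received against the Content-Length header (a rough sketch only; the function and header parsing here are hypothetical, not our actual JBoss/Tomcat code):

```python
# Sketch: detect a request body that is shorter than Content-Length claims,
# which is what the truncation described above would look like server-side.

def body_is_truncated(headers, body):
    """Return True if the body is shorter than the declared Content-Length."""
    declared = int(headers.get("Content-Length", len(body)))
    return len(body) < declared

headers = {"Content-Length": "454"}
received_body = b"x" * 363  # only 80% of the body made it to the server

print(body_is_truncated(headers, received_body))  # → True
```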

1. If this is something on the client's machine, how come we don't see
it in the sniffer? (We even tried removing all firewall/antivirus
software and reinstalling the network card driver.)
2. If this is not something on the client's machine, how come some
clients encounter this much more often than others, who never
encounter it at all?

Could it be some kind of network equipment that some of our clients
(reminder: different ISPs) pass through, and others don't?

Unfortunately, this new info still leaves us clueless...

On Dec 3, 5:16 pm, jchimene <[EMAIL PROTECTED]> wrote:
> On Dec 2, 11:20 pm, Amit Kasher <[EMAIL PROTECTED]> wrote:
>
> > Hi and thanks again for your responses.
>
> No Prob.
>
> If this "opportunity for excellence" is as pervasive as you suspect,
> installing software on a client's computer should be a non-starter
> from the perspective that installing it on *any* computer *anywhere on
> the planet* should reliably reproduce the issue. You say that tcpdump
> shows the packet truncation, so I'm not sure I understand the
> requirement to install something on a client machine. My goal in these
> past responses has been to absolutely prove that it's the
> serialization code (by factoring out the serialization code using
> ping), not something peculiar to the transport or session layers.
>
> Are you using the public switched network to provide client/server
> connectivity? If not, nothing you've said so far would eliminate your
> network transport service.
>
> I find it hard to believe it's GWT, as the cargo size is so small as
> to be insignificant, and others would have reported this issue by now.
> I have to admit that I'm not a user of Java serialization, so there
> may have been reports of serialization issues of which I'm
> blissfully unaware. From everything you're saying, it really looks
> like the problem is in user-space. It may be a certain code path that
> leads to the same serialization invocation logic. I'd start pulling
> this code apart, instrumenting the hell out of it and running it
> through JUnit or some such automated testing environment. Again, I
> understand you've probably done this...
>
> I'm wondering if there's a specific byte-pattern that's causing this.
> Have you tried reordering the structure members? Also, have you
> eliminated buffer corruption issues? Since it's cross-browser, what
> does the -pretty flag + Firebug reveal? Esp. when profiling the code?
> (Although I must admit that you've probably tried all that type of
> debugging by now).
>
> Bueno Suerte,
> jec
>
>
>
> > A few more subtle observations and insights:
> > 1. It's probably not the server. There are several reasons that lead
> > us to believe that the server is not the cause of this issue: (a) We
> > switched hosting providers. (b) These providers reside in completely
> > different geographical locations - countries. (c) We have always been
> > using JBoss on CentOS, but this issue occurs both when we work with
> > Apache as a front end using mod_jk to tomcat, as well as when
> > eliminating this tier and having clients go directly to tomcat - using
> > it as an HTTP server. (d) tcpdump sniffer explicitly shows that the
> > server receives ALWAYS EXACTLY 80% of the request payload. Unless this
> > is something even lower level in that machine (the VPS software used -
> > virtuozzo, the network card/driver, etc.), these observations pretty
> > much provide an alibi for the server... I think we'd better focus on
> > other places.
> > 2. There are indications that this is not inside the browser as well:
> > (a) It happens in several GWT versions. (b) It happens "to" all
> > browsers, which provides a strong clue, since this code is completely
> > different from browser to browser - GWT uses the MSXML ActiveX object
> > in IE, while using completely different objects in other browsers. Since
> > this is the underlying mechanism used to perform RPC, if the problem
> > happens in more than one of them, this is unlikely to be the cause.
> > Still it seems that this MUST be the GWT/client code, since these
> > clients, to whom this issue occurs much more often, don't have
> > problems in any other websites (we managed to talk to several of
> > them).
> > One thing that comes to mind is perhaps the GWT serialization code? I
> > don't know...
>
> > Therefore, currently, aside from the possibility that there's a bug in
> > the GWT serialization code, there's also the possibility that it's
> > something in the network, even though these clients are from various
> > ISPs, and geographical locations. Yes, I notice the dead end as
> > well...
>
> > These observations somewhat reduce the anticipated benefit (let alone
> > the feasibility...) of several of your (MUCH APPRECIATED, THOUGH)
> > suggestions:
> > 1. ping from the lab
> > 2. perl HTTP server
>
> > Despite that, we ARE happy about any suggestion and willing to put the
> > required effort, so we'll try to make progress in these directions.
>
> > Our situation now is that we assume the data arrives at the server
> > corrupted, and we should see how this data leaves the client.
> > Therefore we will also try to install a sniffer in a client computer
> > in which this occurs (though we have been trying to do that for quite
> > a long time now).
>
> > On Dec 2, 10:29 pm, jchimene <[EMAIL PROTECTED]> wrote:
>
> > > Hi Amit,
>
> > > One other thing:
>
> > > I'm getting the impression that you also have a custom server. If it's
> > > an identical configuration across all server instances, then you also
> > > have to prove that it's not the server. Again, I'd code a simple HTTP
> > > server in Perl (because there's no problem so intractable that it
> > > can't be made worse with a Perl application) and use it to test
> > > against your application.
>
> > > Cheers,
> > > jec
>
> > > On Dec 2, 9:11 am, Amit Kasher <[EMAIL PROTECTED]> wrote:
>
> > > > Hi,
> > > > Thanks for your reply. Answers are inline.
>
> > > > On Dec 2, 5:50 pm, jchimene <[EMAIL PROTECTED]> wrote:
> > > > > Hi,
>
> > > > > A few questions:
>
> > > > > o Are all packets sent to the server the same size?
>
> > > > No, they are not.
>
> > > > > o What is that size?
>
> > > > This depends on the service call - somewhere between 150 and 2000
> > > > bytes.
> > > > I will mention again that by using a sniffer (tcpdump), it seems that
> > > > EVERY time this issue occurs, the actual packets the server receives
> > > > are ALWAYS EXACTLY 80% of what it should have received. This, again,
> > > > was very encouraging to find as a clue, but unfortunately led me
> > > > nowhere.
>
> > > > > o Have you checked for other types of congestion?
>
> > > > Congestion? Unfortunately, I don't have any control over the client's
> > > > environment since this is an internet application and I can't
> > > > reproduce it.
>
> > > > > o Is this entirely TCP/IP? Have you checked maxrss?
>
> > > > maxrss? I'm not sure I understood the relevance... TCP/IP is obviously
> > > > used, it is the underlying protocol of HTTP...
>
> > > > > o Have you enabled logging on intermediate nodes to see if there are
> > > > > congestion issues?
>
> > > > I wish I could... I don't have any control over any node before the
> > > > server. It is a CentOS VPS hosted internet application. I will state
> > > > that this occurred in several hosting providers, in several countries
> > > > and geographical locations.
>
> > > > > o Is this related to a specific time of day (although it probably
> > > > > happens between 10:00 and 14:00...)
>
> > > > I didn't find any correlation between the time of day and the
> > > > occurrence of this. Obviously, this is normalized to the usage load,
> > > > as you implied.
>
> > > > > o Do you have a world-wide net? If so, does the problem travel across
> > > > > time zones?
>
> > > > My users are not from around the world, but as I stated - this issue
> > > > occurred when using hosting providers around the world.
>
> > > > > Cheers,
> > > > > jec
>
> > > > > On Dec 2, 2:13 am, Amit Kasher <[EMAIL PROTECTED]> wrote:
>
> > > > > > Hi,
> > > > > > Does anyone have any new insights about this issue? We've been
> > > > > > investigating for over a year(!), and we don't seem to be the only
> > > > > > ones...
>
> > > > > > http://tinyurl.com/5rqfp5
>
> > > > > > Thanks.
>
>
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google Web Toolkit" group.
To post to this group, send email to Google-Web-Toolkit@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/Google-Web-Toolkit?hl=en
-~----------~----~----~----~------~----~------~--~---