Hi Amit,

A few final observations:

o You should be seeing TCP sequence numbers that track the IP packets,
  particularly when the client sends 2 TCP packets, which become three
  IP packets on the wire. Look for retransmission requests: TCP has no
  explicit NAK, so these show up as duplicate ACKs, where the
  acknowledgment number doesn't advance.
o Compare router configurations in your lab and in the field,
  particularly if this service is provided over a virtual private
  network.
o Look for indications of mismanaged TCP flow control (window size)
  and congestion control.

Cheers,
jec

On Dec 18, 12:52 am, Amit Kasher <amitkas...@gmail.com> wrote:
> I really appreciate your attempts to help.
> In reply to your last post, jec:
> * We can't reproduce this in the lab...
> * We see a combination of FIN, FIN-ACK and RST.
> * We haven't seen any suspicious traceroutes... nothing
>   differentiating suffering clients from non-suffering clients.
> * We don't do anything "special" - these are normal GWT service call
>   requests from browser to server.
> * We tried that, as well as different MTU sizes... no clue.
> * This occurs without any SSL involved, and regardless of which
>   browser is used (IE, FF).
>
> Unfortunately we gave up on the persistent attempt to get to the root
> of that issue. We now just assume that it's some low-level network
> issue (layers 1-2) that causes some of the packets not to arrive, in
> an unexplained combination with a higher-level network issue (layers
> 3-6) that causes the packet's data to split at exactly 80%.
>
> In order to deal with this situation, we implemented a high-level
> (GWT) configurable retry mechanism, with timeout support. This
> resolves the symptoms, and in effect solves the problem.
>
> We don't mind contributing this mechanism (both client and server
> code), if someone is interested or believes GWT needs this kind of
> mechanism.
>
> Thanks again,
> Amit
>
> On Dec 5, 8:09 pm, jchimene <jchim...@gmail.com> wrote:
>
> > Hi Amit,
> >
> > You don't make this easy, do you...
> >
> > o Just to be clear: goodness happens when the client sends 2 TCP
> >   packets, which become three IP packets on the wire, which are
> >   reassembled by the server into 2 TCP packets.
> >   Badness happens when the client sends 2 TCP packets, which become
> >   three IP packets on the wire, which are reassembled into one
> >   complete TCP packet and 1 incomplete TCP packet.
> >   Can you reproduce this in your lab? I'm guessing "no", otherwise
> >   you would not have deployed the app...
> >
> > o Do you see a NAK (in TCP terms, duplicate ACKs) at the client
> >   after the dropped fragment?
> >
> > o Pls. try traceroute from your lab and from the client box. What
> >   are the differences?
> >
> > o It's now appearing to be an IP issue. The fact that the
> >   fragmentation doesn't occur on the larger packet is interesting.
> >
> > o The two separate TCP packets lead to an assumption that you can
> >   identify requests from the same client box at the server. IOW,
> >   you have an application-level protocol that lets you reassemble
> >   the two packets into a single request. I'm sure this is the case,
> >   but such a design isn't explicitly stated in your message. Your
> >   server application never sees the 2 -> 3 split, since the normal
> >   case is that your server app only sees 2 packets from the client.
> >   I'm reluctant to say this, but part of this process may require
> >   proof that the protocol design is resilient to network
> >   transmission errors.
> >
> > o I'd start playing around w/ different packet sizes and
> >   transmission rates (via ping) to see if you can trip any
> >   triggers. It may be a combination of buffering/congestion between
> >   the client and the server.
> >   Did you try ping w/ different packet sizes? I realize that you
> >   have different servers. Does the connection between the client
> >   and server occur over the public switched network or does it use
> >   a private circuit?
> >
> > o There have been posts in this thread w/r/t/ SSL and IE. Are they
> >   relevant?
> >
> > Cheers,
> > jec
> >
> > On Dec 5, 1:21 am, Amit Kasher <amitkas...@gmail.com> wrote:
> >
> > > Hi,
> > > We have spent the past 2 days working on this, and have some new
> > > findings.
> > >
> > > We have made contact with one of our customers who is encountering
> > > this issue more frequently than others, and he granted us access
> > > to his computer (using logmein). We installed WireShark on his
> > > computer, as well as on the server. We managed to reproduce the
> > > problem with both sniffers in action, and analyze the exact
> > > correlating TCP segments according to their sequence and ack
> > > numbers. Here are the results.
> > >
> > > This is what happens in the valid state:
> > > The client sends 2 TCP segments for a GWT service call, which are
> > > supposed to be reassembled into a single PDU which is the entire
> > > single HTTP request. The first segment always contains the HTTP
> > > request header, and the second TCP segment always contains the
> > > HTTP request body. For instance, we see that the client sends a
> > > first segment of size 969 bytes, and a second segment of size 454
> > > bytes. On the server we see that these 2 segments become 3
> > > segments. The first is still 969 bytes and contains the HTTP
> > > request header; the second is 363 bytes (80% of the original
> > > second segment), and the third is the remaining 91 bytes (20% of
> > > the original 454 bytes).
> > >
> > > In the invalid state, when the problem occurs, the third segment
> > > simply does not arrive at the server. It seems that something on
> > > the way has split the second 454-byte segment into 2 segments, and
> > > only sent the first one to the server.
> > >
> > > 1. If this is something in the client's machine, how come we don't
> > >    see it in the sniffer? (We even tried removing all
> > >    firewall/antivirus software and reinstalling the network card
> > >    driver.)
> > > 2. If this is not something in the client's machine, how come some
> > >    clients encounter this much more than others, who never
> > >    encounter it?
> > >
> > > Can it be some kind of network equipment that some of our clients
> > > (reminder - different ISPs) go through, and others don't?
> > >
> > > Unfortunately, this new info still leaves us clueless...
> > >
> > > On Dec 3, 5:16 pm, jchimene <jchim...@gmail.com> wrote:
> > >
> > > > On Dec 2, 11:20 pm, Amit Kasher <amitkas...@gmail.com> wrote:
> > > >
> > > > > Hi and thanks again for your responses.
> > > >
> > > > No Prob.
> > > >
> > > > If this "opportunity for excellence" is as pervasive as you
> > > > suspect, installing software on a client's computer should be a
> > > > non-starter from the perspective that installing it on *any*
> > > > computer *anywhere on the planet* should reliably reproduce the
> > > > issue. You say that tcpdump shows the packet truncation, so I'm
> > > > not sure I understand the requirement to install something on a
> > > > client machine. My goal in these past responses has been to
> > > > absolutely prove that it's the serialization code (by factoring
> > > > out the serialization code using ping), not something peculiar
> > > > to the transport or session layers.
> > > >
> > > > Are you using the public switched network to provide
> > > > client/server connectivity? If not, nothing you've said so far
> > > > would eliminate your network transport service.
> > > >
> > > > I find it hard to believe it's GWT, as the cargo size is so
> > > > small as to be insignificant, and others would have reported
> > > > this issue by now. I have to admit that I'm not a user of Java
> > > > serialization, so there may have been reports of such
> > > > serialization issues of which I'm blissfully unaware. From
> > > > everything you're saying, it really looks like the problem is in
> > > > user-space. It may be a certain code path that leads to the same
> > > > serialization invocation logic.
> > > > I'd start pulling this code apart, instrumenting the hell out of
> > > > it and running it through JUnit or some such automated testing
> > > > environment. Again, I understand you've probably done this...
> > > >
> > > > I'm wondering if there's a specific byte-pattern that's causing
> > > > this. Have you tried reordering the structure members? Also,
> > > > have you eliminated buffer corruption issues? Since it's
> > > > cross-browser, what does the -pretty flag + Firebug reveal? Esp.
> > > > when profiling the code? (Although I must admit that you've
> > > > probably tried all that type of debugging by now).
> > > >
> > > > Buena suerte,
> > > > jec
> > > >
> > > > > A few more subtle observations and insights:
> > > > > 1. It's probably not the server. There are several reasons
> > > > > that lead us to believe that the server is not the cause of
> > > > > this issue:
> > > > > (a) We switched hosting providers.
> > > > > (b) These providers reside in completely different
> > > > > geographical locations - countries.
> > > > > (c) We have always been using JBoss on CentOS, but this issue
> > > > > occurs both when we work with Apache as a front end using
> > > > > mod_jk to tomcat, as well as when eliminating this tier and
> > > > > having clients go directly to tomcat - using it as an HTTP
> > > > > server.
> > > > > (d) tcpdump sniffer explicitly shows that the server receives
> > > > > ALWAYS EXACTLY 80% of the request payload.
> > > > > Unless this is something even lower level in that machine (the
> > > > > VPS software used - virtuozzo, the network card/driver, etc.),
> > > > > these observations pretty much provide an alibi for the
> > > > > server... I think we'd better focus on other places.
> > > > > 2. There are indications that this is not inside the browser
> > > > > as well:
> > > > > (a) It happens in several GWT versions.
> > > > > (b) It happens "to" all browsers, which provides a strong
> > > > > clue, since this code is completely different from browser to
> > > > > browser - GWT uses MsXMLHTTP ActiveX in IE, while using
> > > > > completely other objects in other browsers. Since this is the
> > > > > underlying mechanism used to perform RPC, it seems that if it
> > > > > happens for more than one of them, chances are low that this
> > > > > is the cause.
> > > > > Still it seems that this MUST be the GWT/client code, since
> > > > > these clients, to whom this issue occurs much more often,
> > > > > don't have problems in any other websites (we managed to talk
> > > > > to several of them).
> > > > > One thing that comes to mind is perhaps the GWT serialization
> > > > > code? I don't know...
> > > > >
> > > > > Therefore, currently, aside from the possibility that there's
> > > > > a bug in the GWT serialization code, there's also the
> > > > > possibility that it's something in the network, even though
> > > > > these clients are from various ISPs,
> > > > > ...

You received this message because you are subscribed to the Google Groups "Google Web Toolkit" group.
To post to this group, send email to Google-Web-Toolkit@googlegroups.com
To unsubscribe from this group, send email to google-web-toolkit+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/Google-Web-Toolkit?hl=en
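
The retry-with-timeout workaround Amit describes in his Dec 18 message can be sketched roughly as follows. This is only an illustration in plain Java. It is not the code his team offered to contribute and it is not a GWT API: the class name RetrySketch, the method callWithRetry, and its parameters are all made up here. It also relies on ordinary JVM threads. GWT client code compiles to JavaScript and has no threads, so a client-side version would have to be built on asynchronous callbacks and a timer instead.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: not the mechanism from the thread, and not a GWT API.
final class RetrySketch {

    /**
     * Runs the request up to maxAttempts times. An attempt counts as failed
     * if it throws or does not finish within timeoutMillis. The failure of
     * the last attempt is rethrown when every attempt fails.
     */
    static <T> T callWithRetry(Callable<T> request, int maxAttempts, long timeoutMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            Exception lastFailure = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                Future<T> pending = executor.submit(request);
                try {
                    // Wait for this attempt, but no longer than the timeout.
                    return pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    pending.cancel(true); // interrupt the stalled attempt
                    lastFailure = e;
                } catch (ExecutionException e) {
                    // Unwrap the exception thrown by the request itself.
                    Throwable cause = e.getCause();
                    lastFailure = (cause instanceof Exception) ? (Exception) cause : e;
                }
            }
            throw lastFailure;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated request: the first two attempts fail, the third succeeds.
        String reply = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new java.io.IOException("request body truncated");
            }
            return "ok";
        }, 5, 1000);
        System.out.println(reply + " after " + calls[0] + " attempts");
    }
}
```

With the demo in main, the first two attempts fail and the third succeeds, so it prints "ok after 3 attempts". Keep in mind that a request which timed out may still have reached the server, so a retry can repeat its side effects. The thread doesn't say how the server-side half of Amit's mechanism deals with that.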