RE: tomcat 5.0.16 Replication
Hrmmm, perhaps I should reboot using the non-SMP kernel and try it. I'll have to do that when I get back to the servers.

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 2:04 PM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication

uname -a
machine #1) Linux draco 2.4.20-8smp #1 SMP Thu Mar 13 17:45:54 EST 2003 i686 i686 i386 GNU/Linux
machine #2) Linux scorpio 2.4.20-8smp #1 SMP Thu Mar 13 17:45:54 EST 2003 i686 i686 i386 GNU/Linux

java -version:
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)

same on both

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 1:56 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication

[EMAIL PROTECTED] bin]# uname -a
Linux rh9 2.4.20-8 #1 Thu Mar 13 17:54:28 EST 2003 i686 i686 i386 GNU/Linux
[EMAIL PROTECTED] bin]# java -version
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 11:05 AM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication

sun JDK 1.4.2 for Linux
Kernel 2.4.20-8smp
Tomcat 5.0.16 with catalina-cluster.jar from CVS head

Hrmmm... are yours SMP servers? Could be something odd with synch if that is the case.

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 1:01 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication

interesting, mine doesn't work at all unless I set the LD_ASSUME_KERNEL
what VM (version and name) are you using?
Filip

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 10:59 AM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication

Now that's really very strange. I am running RH9 and everything seems to go through just fine.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:56 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication

The replication message ACK never gets back to the sender. So my web pages never load without that flag. I think it is only needed under REDHAT 9.

Jean-Philippe Bélanger

Steve Nelson wrote:

>I don't seem to need the ld_assume_kernel thing. What are the symptoms when
>it is required?
>
>-Original Message-
>From: [EMAIL PROTECTED]
>[mailto:[EMAIL PROTECTED]
>Sent: Friday, January 09, 2004 12:33 PM
>To: Tomcat Users List
>Subject: Re: tomcat 5.0.16 Replication
>
>Just tried the CVS head and everything works without any CPU going crazy,
>but only if ld_assume_kernel is set to 2.4
>
>One more question for you Filip, is the useDirtyFlag working at all? It
>seems like even if it's set to true, the whole session gets replicated
>after each request. :(
>
>Jean-Philippe
>
>[EMAIL PROTECTED] wrote:
>
>>Hurray for Filip! :)
>>
>>I'll get the CVS head for the module today and test this out.
>>Happy to see that it got fixed that quickly!
>>
>>Thanks again and I'll let you know how it goes
>>
>>Jean-Philippe
>>
>>Filip Hanik wrote:
>>
>>>Jean-Philippe and Steve,
>>>I fixed the bug, and tried replication on RH9. Immediately it didn't work.
>>>The problem is that when RH9 tries to write the ACK back to the NIO socket,
>>>it never reaches the other node and times out after a long time.
>>>
>>>I set LD_ASSUME_KERNEL=2.4 and it started to work
>>>
>>>Filip
>>>
>>>-Original Message-
>>>From: Filip Hanik [mailto:[EMAIL PROTECTED]
>>>Sent: Thursday, January 08, 2004 6:43 PM
>>>To: Tomcat Users List
>>>Subject: RE: tomcat 5.0.16 Replication
>>>
>>>ok guys,
>>>good news. The 100% cpu is totally my fault. I messed up on that one.
>>>I was registering OP_WRITE as an interest
>>>this is not good :)
>>>checking in the working code in 15 min, some more regression tests
>>>Filip
>>>
>>>-Original Message-
>>>From: Filip Hanik [mailto:[EMAIL PROTECTED]
>>>Sent: Thursday, January 08, 2004 2:54 PM
>>>To: Tomcat Users List
>>>Subject: RE: tomcat 5.0.16 Replication
>>>
>>>another code change was, that I am now accepting keys for OP_READ and
>>>OP_WRITE. before it was only OP_READ,
>>>but for synchronous
RE: tomcat 5.0.16 Replication
uname -a
machine #1) Linux draco 2.4.20-8smp #1 SMP Thu Mar 13 17:45:54 EST 2003 i686 i686 i386 GNU/Linux
machine #2) Linux scorpio 2.4.20-8smp #1 SMP Thu Mar 13 17:45:54 EST 2003 i686 i686 i386 GNU/Linux

java -version:
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)

same on both
RE: tomcat 5.0.16 Replication
[EMAIL PROTECTED] bin]# uname -a
Linux rh9 2.4.20-8 #1 Thu Mar 13 17:54:28 EST 2003 i686 i686 i386 GNU/Linux
[EMAIL PROTECTED] bin]# java -version
java version "1.4.2_03"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)
Re: tomcat 5.0.16 Replication
uname -a reports:
2.4.20-8 #1 Thu Mar 13 17:54:28 EST 2003 i686 i686 i386 GNU/Linux

Filip Hanik wrote:

interesting, mine doesn't work at all unless I set the LD_ASSUME_KERNEL
what VM (version and name) are you using?
Filip
RE: tomcat 5.0.16 Replication
sun JDK 1.4.2 for Linux
Kernel 2.4.20-8smp
Tomcat 5.0.16 with catalina-cluster.jar from CVS head

Hrmmm... are yours SMP servers? Could be something odd with synch if that is the case.
RE: tomcat 5.0.16 Replication
interesting, mine doesn't work at all unless I set the LD_ASSUME_KERNEL
what VM (version and name) are you using?
Filip
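The LD_ASSUME_KERNEL workaround discussed in this thread only takes effect if the variable is already in the environment of the JVM process when it starts, because it is read by the dynamic loader at process start-up rather than by Java code. In practice it is normally exported in the shell or start-up script before Tomcat is launched. The following is only a minimal sketch of the same idea in Java, assuming a modern JDK (ProcessBuilder is not available in the 1.4.2 VM used here); the class name AssumeKernelLauncher and the path /opt/tomcat/bin/catalina.sh are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical launcher: starts a child process with LD_ASSUME_KERNEL set. */
final class AssumeKernelLauncher {

    /**
     * Builds a ProcessBuilder for the given command whose environment contains
     * LD_ASSUME_KERNEL=kernelVersion. The variable must be set before the child
     * process starts; setting it inside an already running JVM has no effect.
     */
    static ProcessBuilder withAssumeKernel(List<String> command, String kernelVersion) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.environment().put("LD_ASSUME_KERNEL", kernelVersion);
        builder.inheritIO(); // show the child's console output in this console
        return builder;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: Tomcat lives under /opt/tomcat; adjust to the real install path.
        ProcessBuilder builder =
                withAssumeKernel(Arrays.asList("/opt/tomcat/bin/catalina.sh", "run"), "2.4");
        System.exit(builder.start().waitFor());
    }
}
```

Running `echo $LD_ASSUME_KERNEL` in the shell that starts Tomcat is a quick way to confirm whether the variable is actually set on a node where replication works without it.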
RE: tomcat 5.0.16 Replication
Now that's really very strange. I am running RH9 and everything seems to go through just fine.
Re: tomcat 5.0.16 Replication
The replication message ACK never gets back to the sender. So my web pages never load without that flag. I think it is only needed under REDHAT 9.

Jean-Philippe Bélanger

Steve Nelson wrote:

I don't seem to need the ld_assume_kernel thing. What are the symptoms when it is required?

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:33 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication

Just tried the CVS head and everything works without any CPU going crazy, but only if ld_assume_kernel is set to 2.4

One more question for you Filip, is the useDirtyFlag working at all? It seems like even if it's set to true, the whole session gets replicated after each request. :(

Jean-Philippe

[EMAIL PROTECTED] wrote:

Hurray for Filip! :)

I'll get the CVS head for the module today and test this out. Happy to see that it got fixed that quickly!

Thanks again and I'll let you know how it goes

Jean-Philippe

Filip Hanik wrote:

Jean-Philippe and Steve,
I fixed the bug, and tried replication on RH9. Immediately it didn't work. The problem is that when RH9 tries to write the ACK back to the NIO socket, it never reaches the other node and times out after a long time.

I set LD_ASSUME_KERNEL=2.4 and it started to work

Filip

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 6:43 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication

ok guys,
good news. The 100% cpu is totally my fault. I messed up on that one. I was registering OP_WRITE as an interest
this is not good :)
checking in the working code in 15 min, some more regression tests
Filip

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 2:54 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication

another code change was, that I am now accepting keys for OP_READ and OP_WRITE. before it was only OP_READ, but for synchronous replication I need both.

this is good info, I just got RH9 installed. will be trying it out this and next week.

Filip

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 11:46 AM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication

The only change in the ReplicationListener class is the try/catch that was added. The code logic is the same. Weird enough. So it's probably elsewhere that something changed in the state of the SelectionKey.

Jean-Philippe Bélanger

Steve Nelson wrote:

I was just about to try this actually. I found through googling a lot of people having problems with select with 1.4 and NIO with Redhat 9. They were actually experiencing crashes though.

To verify your results I just put a Thread.sleep(1); where you suggested and I also see the jump in performance.

Something must have changed in ReplicationListener that causes this because the 5.0.16 version doesn't seem to have the problem. I'll see if I can figure it out when I get back to where I can diff the files.

-Steve

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 12:25 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication

More content for you Filip.

I've checked and followed the code of the listen event in ReplicationListener.java

Here's what's happening:

selector.select(timeout) -> returns immediately with one SelectionKey available

That key is not Acceptable and not Readable, so it immediately skips those IFs and loops back to the beginning. I've put traces in and this is executed once every millisecond, hence the 100% load on the server.

Just to make sure, I've put a Thread.sleep(10) at the end of the loop and the CPU dropped back to 0% and the replication still worked nicely, but probably a little slower because of the 10ms wait.

I don't know much about those NIO packages, but it seems like the select(timeout) method shouldn't return a SelectionKey in that state without any waiting.

Let me know what you can dig from those.

Jean-Philippe Bélanger

[EMAIL PROTECTED] wrote:

Hi Filip.

I did some profiling of 40 mins of tomcat with and without a 2nd node up. Here are the results with -Xrunhprof:cpu=samples,thread=y,file=/u01/portal/java.hprof.txt,depth=10:

Those numbers are from cpu=times and not samples, since the latter freezes on my systems. So that list shows the time spent in each method.

The major difference is the calls to the sun.nio.ch.PollArrayWrapper class. I don't know much about those NIO packages, but 819000 calls in 40 mins is a lot. The Socket Interface was called more than twice as often with 2 hosts as with a single one, which seems normal.

Maybe this can help. If you need the complete hprof files I can send them to you.

1 host in cluster:
CPU TIME (ms) BEGIN (total = 19701) Thu Jan 8 10:00
Re: tomcat 5.0.16 Replication
Ah, I see. So there is no way (yet) to have only the value that setAttribute was called on be replicated?

Thanks

Jean-Philippe Bélanger

Filip Hanik wrote:

> useDirtyFlag=true means that the session (yes, the whole session) only gets
> replicated when setAttribute or removeAttribute is called
RE: tomcat 5.0.16 Replication
I don't seem to need the LD_ASSUME_KERNEL thing. What are the symptoms when it is required?

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 12:33 PM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication

Just tried the CVS head and everything works without any CPU going crazy, but only if LD_ASSUME_KERNEL is set to 2.4.

One more question for you Filip: is the useDirtyFlag working at all? It seems like even if it's set to true, the whole session gets replicated after each request. :(

Jean-Philippe
RE: tomcat 5.0.16 Replication
useDirtyFlag=true means that the session (yes, the whole session) only gets replicated when setAttribute or removeAttribute is called.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 10:33 AM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication

Just tried the CVS head and everything works without any CPU going crazy, but only if LD_ASSUME_KERNEL is set to 2.4.

One more question for you Filip: is the useDirtyFlag working at all? It seems like even if it's set to true, the whole session gets replicated after each request. :(

Jean-Philippe
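The useDirtyFlag behaviour described above can be illustrated with a minimal sketch. It is not Tomcat's implementation: the `Session` class and the `endOfRequest` method below are hypothetical names invented for illustration, and the sketch only assumes what the message states, namely that the whole session is sent, and only when setAttribute or removeAttribute was called.

```java
import java.util.HashMap;
import java.util.Map;

class DirtyFlagSketch {

    /** Hypothetical stand-in for a replicated session; not Tomcat's class. */
    static class Session {
        private final Map<String, Object> attributes = new HashMap<>();
        private boolean dirty;

        Object getAttribute(String name) {
            return attributes.get(name); // reads never mark the session dirty
        }

        void setAttribute(String name, Object value) {
            attributes.put(name, value);
            dirty = true;
        }

        void removeAttribute(String name) {
            attributes.remove(name);
            dirty = true;
        }
    }

    /**
     * Hypothetical end-of-request hook. Returns the payload to replicate,
     * or null when nothing needs to be sent. The payload is always a copy
     * of the WHOLE attribute map, never just the changed attribute.
     */
    static Map<String, Object> endOfRequest(Session session, boolean useDirtyFlag) {
        if (useDirtyFlag && !session.dirty) {
            return null; // no setAttribute/removeAttribute since the last send
        }
        session.dirty = false;
        return new HashMap<>(session.attributes);
    }

    public static void main(String[] args) {
        Session session = new Session();
        session.setAttribute("cart", "book");
        System.out.println("after setAttribute: " + endOfRequest(session, true));
        session.getAttribute("cart");
        System.out.println("after read only:    " + endOfRequest(session, true));
    }
}
```

In this sketch, an object fetched with getAttribute and then modified in place does not mark the session dirty; only the two mutating calls do.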
RE: tomcat 5.0.16 Replication
I will be implementing some performance improvements today. I'll let you know how it goes.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Friday, January 09, 2004 4:33 AM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication

Hurray for Filip! :)

I'll get the CVS head for the module today and test this out. Happy to see that it got fixed that quickly!

Thanks again and I'll let you know how it goes

Jean-Philippe
Re: tomcat 5.0.16 Replication
Just tried the CVS head and everything works without any CPU going crazy, but only if LD_ASSUME_KERNEL is set to 2.4.

One more question for you Filip: is the useDirtyFlag working at all? It seems like even if it's set to true, the whole session gets replicated after each request. :(

Jean-Philippe

[EMAIL PROTECTED] wrote:

> Hurray for Filip! :)
>
> I'll get the CVS head for the module today and test this out.
> Happy to see that it got fixed that quickly!
>
> Thanks again and I'll let you know how it goes
>
> Jean-Philippe
Re: tomcat 5.0.16 Replication
Hurray for Filip! :)

I'll get the CVS head for the module today and test this out. Happy to see that it got fixed that quickly!

Thanks again and I'll let you know how it goes

Jean-Philippe

Filip Hanik wrote:

> Jean-Philippe and Steve,
> I fixed the bug and tried replication on RH9. Immediately it didn't work.
> The problem is that when RH9 tries to write the ACK back to the NIO socket,
> it never reaches the other node, and it times out after a long time.
>
> I set LD_ASSUME_KERNEL=2.4 and it started to work
>
> Filip
RE: tomcat 5.0.16 Replication
Jean-Philippe and Steve,
I fixed the bug and tried replication on RH9. Immediately it didn't work.
The problem is that when RH9 tries to write the ACK back to the NIO socket, it never reaches the other node, and it times out after a long time.

I set LD_ASSUME_KERNEL=2.4 and it started to work

Filip

-----Original Message-----
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 6:43 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication

ok guys,
good news. The 100% CPU is totally my fault. I messed up on that one.
I was registering OP_WRITE as an interest; this is not good :)
Checking in the working code in 15 min, after some more regression tests.

Filip
RE: tomcat 5.0.16 Replication
ok guys,
good news. The 100% CPU is totally my fault. I messed up on that one.
I was registering OP_WRITE as an interest, which is not good :)
Checking in the working code in 15 min, after some more regression tests.

Filip

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 2:54 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication

Another code change was that I am now accepting keys for OP_READ and OP_WRITE.
Before it was only OP_READ, but for synchronous replication I need both.

This is good info. I just got RH9 installed and will be trying it out this and next week.

Filip

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 08, 2004 11:46 AM
To: Tomcat Users List
Subject: Re: tomcat 5.0.16 Replication

The only change in the ReplicationListener class is the try/catch that was added. The code logic is the same. Weird enough.
So it's probably elsewhere that something changed in the state of the SelectionKey.

Jean-Philippe Bélanger

Steve Nelson wrote:

>I was just about to try this actually. I found through googling a lot of people
>having problems with select with 1.4 and NIO with Redhat 9. They were actually
>experiencing crashes though.
>
>To verify your results I just put a Thread.sleep(1); where you suggested and
>I also see the jump in performance.
>
>Something must have changed in ReplicationListener that causes this, because
>the 5.0.16 version doesn't seem to have the problem. I'll see if I can figure it out
>when I get back to where I can diff the files.
>
>-Steve
>
>-Original Message-
>From: [EMAIL PROTECTED]
>[mailto:[EMAIL PROTECTED]
>Sent: Thursday, January 08, 2004 12:25 PM
>To: Tomcat Users List
>Subject: Re: tomcat 5.0.16 Replication
>
>
>More content for you Filip.
>
>I've checked and followed the code of the listen event in
>ReplicationListener.java
>
>Here's what's happening:
>
>selector.select(timeout) -> returns immediately with one SelectionKey available.
>That key is not Acceptable and not Readable, so it immediately skips those
>IFs and loops back to the beginning.
>
>I've put traces and this is executed once every millisecond, hence the
>100% load on the server.
>Just to make sure, I've put a Thread.sleep(10) at the end of the loop
>and the CPU dropped back to 0% and the replication still worked nicely,
>but probably a little slower because of the 10 ms wait.
>
>I don't know much about those NIO packages, but it seems like the
>select(timeout) method shouldn't return a SelectionKey in that state
>without any waiting.
>
>Let me know what you can dig from those.
>
>Jean-Philippe Bélanger
>
>[EMAIL PROTECTED] wrote:
>
>>Hi Filip.
>>
>>I did some profiling of 40 minutes of Tomcat with and without a 2nd node
>>up. Here are the results with
>>-Xrunhprof:cpu=samples,thread=y,file=/u01/portal/java.hprof.txt,depth=10:
>>
>>Those numbers are from cpu=times and not samples, since the latter freezes
>>on my systems.
>>So the list shows the time spent in each method.
>>
>>The major difference is the number of calls to the sun.nio.ch.PollArrayWrapper
>>class. I don't know much about those NIO packages, but 819000 calls in
>>40 minutes is a lot.
>>The socket interface was called more than twice as often with 2 hosts as with
>>a single one, which seems normal.
>>
>>Maybe this can help.
>>If you need the complete hprof file I can send it to you.
>>
>>1 host in cluster:
>>CPU TIME (ms) BEGIN (total = 19701) Thu Jan 8 10:00:59 2004
>>rank self accum count trace method
>> 1 11.48% 11.48% 5485 java.lang.Object.wait
>> 2 11.46% 22.94% 11786 java.lang.Object.wait
>> 3 10.95% 33.89% 4115 215 java.net.PlainDatagramSocketImpl.receive
>> 4 10.93% 44.81% 4114 224 java.lang.Thread.sleep
>> 5 10.91% 55.73% 19005 214 sun.nio.ch.PollArrayWrapper.poll0
>> 6 7.37% 63.09% 28 495 java.lang.Object.wait
>> 7 7.24% 70.34% 10 576 java.lang.Object.wait
>> 8 4.57% 74.90% 90 716 java.lang.Thread.sleep
>> 9 4.48% 79.38% 1 909 java.lang.Object.wait
>> 10 4.48% 83.86% 1 908 java.lang.Object.wait
>> 11 4.48% 88.34% 15 810 java.lang.Object.wait
>> 12 4.47% 92.81% 1 910 java.net.PlainSocketImpl.socketAccept
>> 13 0.71% 93.52% 2 623 java.lang.Object.wait
>> 14 0.56% 94.08% 2 706 java.lang.Object.wait
>> 15 0.38% 94.46% 2 914 java.lang.Object.wait
>> 16 0.24% 94.70% 775 913 java.lang.String.toCharArray
>> 17 0.23% 94.93% 3 475 java.lang.Thread.sleep
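For anyone following along, the busy loop described above can be reproduced outside Tomcat. The sketch below is not the ReplicationListener code. It is a minimal stand-alone illustration, assuming Java 7 or later (the thread itself used 1.4.2) and a loopback connection. The names SelectorSpinDemo and countReadySelects are made up for this example. It registers an idle, connected channel either for OP_READ alone or for OP_READ | OP_WRITE and counts how often select(timeout) comes back with a ready key.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo class, not part of Tomcat or catalina-cluster.jar.
final class SelectorSpinDemo {

    /**
     * Runs a fixed number of select(timeoutMs) calls against an idle
     * loopback connection and counts the calls that report a ready key.
     */
    static int countReadySelects(boolean alsoRegisterWrite, int iterations, long timeoutMs)
            throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open()) {
            // Port 0 lets the OS pick a free port on the loopback interface.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                accepted.configureBlocking(false);
                int ops = SelectionKey.OP_READ;
                if (alsoRegisterWrite) {
                    ops |= SelectionKey.OP_WRITE;
                }
                accepted.register(selector, ops);

                int ready = 0;
                for (int i = 0; i < iterations; i++) {
                    if (selector.select(timeoutMs) > 0) {
                        ready++;
                        // Clear the selected set so the next select() reports afresh.
                        selector.selectedKeys().clear();
                    }
                }
                return ready;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("OP_READ only:       " + countReadySelects(false, 5, 20));
        System.out.println("OP_READ | OP_WRITE: " + countReadySelects(true, 5, 20));
    }
}
```

Nobody sends any data in this demo, so the OP_READ-only run should simply wait out each timeout. With OP_WRITE in the interest set the key is reported ready on every pass, because an idle socket with free send-buffer space is always writable. A loop that only checks isAcceptable() and isReadable() then spins without doing any work.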
RE: tomcat 5.0.16 Replication
another code change was, that I am now accepting keys for OP_READ and OP_WRITE. before it was only OP_READ, but for synchronous replication I need both. this is good info, I just got RH9 installed. will be trying it out this and next week. Filip
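One common NIO idiom for needing both operations, which is not necessarily what went into CVS, is to keep OP_READ registered all the time and add OP_WRITE only while a write is actually pending. The sketch below assumes Java 7 or later. AckWriter, writeOrDefer and interestAfterSmallAck are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical helper, not taken from the Tomcat cluster code.
final class AckWriter {

    /** Tries to send right away and asks for OP_WRITE only if bytes are left over. */
    static void writeOrDefer(SelectionKey key, ByteBuffer pending) throws IOException {
        SocketChannel channel = (SocketChannel) key.channel();
        channel.write(pending);
        if (pending.hasRemaining()) {
            // Socket buffer was full: remember the rest and wait for writability.
            key.attach(pending);
            key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
        } else {
            // Everything went out: make sure OP_WRITE is not left registered.
            key.attach(null);
            key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
        }
    }

    /** Sends a 3-byte ACK over loopback and returns the key's interest set afterwards. */
    static int interestAfterSmallAck() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                accepted.configureBlocking(false);
                SelectionKey key = accepted.register(selector, SelectionKey.OP_READ);
                writeOrDefer(key, ByteBuffer.wrap(new byte[] {'A', 'C', 'K'}));
                return key.interestOps();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        boolean readOnly = interestAfterSmallAck() == SelectionKey.OP_READ;
        System.out.println("interest set is OP_READ only: " + readOnly);
    }
}
```

With this approach the selector never sees a permanently writable key, so select(timeout) only wakes up when there is something to read or when a deferred write can make progress.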
Re: tomcat 5.0.16 Replication
The only changes in the ReplicationListener class is the try catch that was added. the code logic is the same. Weird enough. So it's probably elsewhere that something changed in the state of the SelectionKey. Jean-Philippe Bélanger
RE: tomcat 5.0.16 Replication
I was just about to try this actually. I found through googling alot of people having problems with select with 1.4 and NIO with Redhat 9. They were actually experiencing crashes though. To verify your results I just put a Thread.Sleep(1); where you suggested and I also see the jump in performance. Something must have changed in ReplicationListener that causes this because the 5.0.16 version doesn't seem to have the problem. I'll see if I can figure it out when I get back to where I can diff the files. -Steve
Re: tomcat 5.0.16 Replication
January 07, 2004 12:08 PM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication

Okay, did that, got this:

BEGIN TO RECEIVE
SENT:Default 1
RECEIVED:Default 1 FROM /10.0.0.110:
SENT:Default 2
BEGIN TO RECEIVE
RECEIVED:Default 2 FROM /10.0.0.110:
SENT:Default 3
BEGIN TO RECEIVE
RECEIVED:Default 3 FROM /10.0.0.110:
SENT:Default 4
BEGIN TO RECEIVE
RECEIVED:Default 4 FROM /10.0.0.110:

*shrug*

BTW, it didn't go to 100% CPU utilization before I started using the code from CVS.
Of course the Manager would almost always time out before it would receive the message.
Now it gets the message right away, but maxes my machine out.

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 1:58 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication

100% cpu can mean that you have a multicast problem,
try to run

java -cp tomcat-replication.jar MCaster

download the jar from http://cvs.apache.org/~fhanik/

Filip

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 6:51 AM
To: '[EMAIL PROTECTED]'
Subject: tomcat 5.0.16 Replication

I was having random problems with clustering when starting up. Mostly it had to do with
timing out when the manager was starting up. I built the CVS version and it solved that
problem. But it has caused some serious performance problems.

First a little background.

I have 2 servers, dual 300 MHz cpq proliants, both running Red Hat 9 and Tomcat 5.0.16
(with catalina-cluster.jar built from CVS).
The multicast packets are restricted to a crossover link between the servers.

There are 3 hosts in the server.xml, all with clustering set up. They all function just fine.
But... the CPUs spike up to 100% if I start up both servers. I know this didn't happen
without the new catalina-cluster.jar. If I shut down 1 server (doesn't matter which)
everything returns to normal. But when both are running, both servers are at 100% CPU.

I am trying to profile it now, but I figured if someone has already experienced this they could
save me some time.

Oh, and there isn't anything relevant in my logs. It's not throwing millions of errors or
something.

-Steve Nelson

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

--
Jean-Philippe Bélanger
(514)228-8800 ext 3060
111 Duke
CGI
Re: tomcat 5.0.16 Replication
Hi Filip.

I did some profiling of 40 minutes of Tomcat with and without a 2nd node up. Here are the results with
-Xrunhprof:cpu=samples,thread=y,file=/u01/portal/java.hprof.txt,depth=10:

Those numbers are from cpu=times and not samples, since the latter freezes on my systems.
So the list shows the time spent in each method.

The major difference is the number of calls to the sun.nio.ch.PollArrayWrapper class. I don't know much about those NIO packages, but 819000 calls in 40 minutes is a lot.
The socket interface was called more than twice as often with 2 hosts as with a single one, which seems normal.

Maybe this can help.
If you need the complete hprof file I can send it to you.

1 host in cluster:
CPU TIME (ms) BEGIN (total = 19701) Thu Jan 8 10:00:59 2004
rank self accum count trace method
 1 11.48% 11.48% 5485 java.lang.Object.wait
 2 11.46% 22.94% 11786 java.lang.Object.wait
 3 10.95% 33.89% 4115 215 java.net.PlainDatagramSocketImpl.receive
 4 10.93% 44.81% 4114 224 java.lang.Thread.sleep
 5 10.91% 55.73% 19005 214 sun.nio.ch.PollArrayWrapper.poll0
 6 7.37% 63.09% 28 495 java.lang.Object.wait
 7 7.24% 70.34% 10 576 java.lang.Object.wait
 8 4.57% 74.90% 90 716 java.lang.Thread.sleep
 9 4.48% 79.38% 1 909 java.lang.Object.wait
10 4.48% 83.86% 1 908 java.lang.Object.wait
11 4.48% 88.34% 15 810 java.lang.Object.wait
12 4.47% 92.81% 1 910 java.net.PlainSocketImpl.socketAccept
13 0.71% 93.52% 2 623 java.lang.Object.wait
14 0.56% 94.08% 2 706 java.lang.Object.wait
15 0.38% 94.46% 2 914 java.lang.Object.wait
16 0.24% 94.70% 775 913 java.lang.String.toCharArray
17 0.23% 94.93% 3 475 java.lang.Thread.sleep
18 0.16% 95.09% 2 472 java.lang.Object.wait
19 0.15% 95.24% 2 595 java.lang.Thread.sleep
20 0.15% 95.40% 2 586 java.lang.Thread.sleep
21 0.15% 95.55% 2 703 java.lang.Thread.sleep
22 0.15% 95.70% 2 476 java.lang.Thread.sleep
23 0.15% 95.85% 2 692 java.lang.Thread.sleep
24 0.12% 95.97% 218595 385 java.lang.CharacterDataLatin1.toLowerCase
25 0.12% 96.09% 218595 408 java.lang.Character.toLowerCase
26 0.11% 96.20% 218595 433 java.lang.CharacterDataLatin1.getProperties
27 0.10% 96.30% 210925 389 java.lang.String.equalsIgnoreCase
28 0.08% 96.38% 157259 387 java.lang.String.charAt
29 0.08% 96.46% 1 646 java.lang.Thread.sleep
30 0.08% 96.53% 1 634 java.lang.Thread.sleep
31 0.08% 96.61% 1 903 java.lang.Thread.sleep
32 0.08% 96.69% 1 714 java.lang.Thread.sleep
33 0.08% 96.76% 1 811 java.lang.Thread.sleep
34 0.08% 96.84% 1 715 java.lang.Thread.sleep

2 hosts:
CPU TIME (ms) BEGIN (total = 37247) Thu Jan 8 11:01:28 2004
rank self accum count trace method
 1 9.56% 9.56% 5285 java.lang.Object.wait
 2 9.56% 19.12% 2986 java.lang.Object.wait
 3 9.30% 28.43% 3 267 java.lang.Object.wait
 4 9.25% 37.68% 6644 224 java.lang.Thread.sleep
 5 9.23% 46.91% 13116 215 java.net.PlainDatagramSocketImpl.receive
 6 7.67% 54.58% 3 266 java.lang.Object.wait
 7 5.90% 60.47% 39 847 java.lang.Object.wait
 8 5.76% 66.24% 12 503 java.lang.Object.wait
 9 3.90% 70.14% 145 975 java.lang.Thread.sleep
10 3.90% 74.04% 1 1174 java.lang.Object.wait
11 3.90% 77.94% 1 1173 java.lang.Object.wait
12 3.90% 81.84% 25 973 java.lang.Object.wait
13 3.90% 85.74% 1 1175 java.net.PlainSocketImpl.socketAccept
14 3.88% 89.62% 819692 214 sun.nio.ch.PollArrayWrapper.poll0
15 0.75% 90.37% 2 958 java.lang.Object.wait
16 0.28% 90.65% 2 457 java.lang.Object.wait
17 0.26% 90.91% 2 1181 java.lang.Object.wait

Filip Hanik wrote:

I'll try to get an instance going today. Will let you know how it goes.
Also, try asynchronous replication, does it still go to 100%?

Filip

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 12:08 PM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication

Okay, did that, got this:

BEGIN TO RECEIVE
SENT:Default 1
RECEIVED:Default 1 FROM /10.0.0.110:
SENT:Default 2
BEGIN TO RECEIVE
RECEIVED:Default 2 FROM /10.0.0.110:
SENT:Default 3
BEGIN TO RECEIVE
RECEIVED:Default 3 FROM /10.0.0.110:
SENT:Default 4
BEGIN TO RECEIVE
RECEIVED:Default 4 FROM /10.0.0.110:

*shrug*

BTW, it didn't go to 100% CPU utilization before I started using the code from CVS.
Of course the Manager would almost always time out before it would receive the message.
Now it gets the message right away, but maxes my machine out.

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 1:58 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication

100% cpu can mean that you have a multicast problem, tr
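To put the poll0 counts from the two profiles above on a common scale, here is a small back-of-the-envelope calculation. It assumes, as the message suggests but does not state outright, that each profile covers about 40 minutes. PollRate and perSecond are made-up names for this sketch, and the counts are copied from the trace 214 rows.

```java
import java.util.Locale;

// Hypothetical helper for comparing the two hprof runs quoted in this thread.
final class PollRate {

    /** Average calls per second over a run of the given length in minutes. */
    static double perSecond(long calls, long minutes) {
        return calls / (minutes * 60.0);
    }

    public static void main(String[] args) {
        long oneHostCalls = 19005;   // poll0 count, "1 host in cluster" run
        long twoHostCalls = 819692;  // poll0 count, "2 hosts" run
        long runMinutes = 40;        // assumed length of each run

        System.out.println(String.format(Locale.ROOT,
                "1 host:  %.1f poll0 calls/s", perSecond(oneHostCalls, runMinutes)));
        System.out.println(String.format(Locale.ROOT,
                "2 hosts: %.1f poll0 calls/s", perSecond(twoHostCalls, runMinutes)));
        System.out.println(String.format(Locale.ROOT,
                "ratio:   %.1fx", (double) twoHostCalls / oneHostCalls));
    }
}
```

Under that assumption the single-node run polls roughly 8 times per second and the two-node run roughly 340 times per second, about 43 times as often. That fits a selector loop that keeps returning without blocking.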
RE: tomcat 5.0.16 Replication
Ends up doing the same thing. The variable was set. I checked it with an echo.

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 4:05 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication

you should do

export LD_ASSUME_KERNEL=2.4.1

not

export set LD_ASSUME_KERNEL=2.4.1

in a regular bash shell

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 1:38 PM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication

Heh, now I am replying to myself :P

I tried

export set LD_ASSUME_KERNEL=2.4.1

No change in behaviour.

Then I tried

export set LD_ASSUME_KERNEL=2.2.5

Again, no change. I restarted both servers between runs.

I still get the "CPU going crazy" scenario.

-Steve

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 3:03 PM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication

Okay, I reverted back to the 5.0.16 version and now I don't have the high CPU utilization.
But it takes almost 60 seconds for the Manager to request the session state,
which causes it to fail to synch about half the time.

Must be something in the synch code. Which comes back to your original comments about
the NIO stuff and RH9 not liking Java in general. Is there a known fix for making things
right with RH9? I could try that.

-Steve

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 2:53 PM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication

Yep, also happens when I use asynch.

I couldn't get the profiling files to load on the machine I am using right now. When
I get back to the servers I'll try to figure out what is eating up all the CPU, although
top tells me around 30% of the utilization is system level as opposed to the java executable.
Sounds like a lot of the load may be in system calls.

-Steve

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 2:47 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication

I'll try to get an instance going today. Will let you know how it goes.
Also, try asynchronous replication, does it still go to 100%?

Filip

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 12:08 PM
To: 'Tomcat Users List'
Subject: RE: tomcat 5.0.16 Replication

Okay, did that, got this:

BEGIN TO RECEIVE
SENT:Default 1
RECEIVED:Default 1 FROM /10.0.0.110:
SENT:Default 2
BEGIN TO RECEIVE
RECEIVED:Default 2 FROM /10.0.0.110:
SENT:Default 3
BEGIN TO RECEIVE
RECEIVED:Default 3 FROM /10.0.0.110:
SENT:Default 4
BEGIN TO RECEIVE
RECEIVED:Default 4 FROM /10.0.0.110:

*shrug*

BTW, it didn't go to 100% CPU utilization before I started using the code from CVS.
Of course the Manager would almost always time out before it would receive the message.
Now it gets the message right away, but maxes my machine out.

-Original Message-
From: Filip Hanik [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 1:58 PM
To: Tomcat Users List
Subject: RE: tomcat 5.0.16 Replication

100% cpu can mean that you have a multicast problem,
try to run

java -cp tomcat-replication.jar MCaster

download the jar from http://cvs.apache.org/~fhanik/

Filip

-Original Message-
From: Steve Nelson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 07, 2004 6:51 AM
To: '[EMAIL PROTECTED]'
Subject: tomcat 5.0.16 Replication

I was having random problems with clustering when starting up. Mostly it had to do with
timing out when the manager was starting up. I built the CVS version and it solved that
problem. But it has caused some serious performance problems.

First a little background.

I have 2 servers, dual 300 MHz cpq proliants, both running Red Hat 9 and Tomcat 5.0.16
(with catalina-cluster.jar built from CVS).
The multicast packets are restricted to a crossover link between the servers.

There are 3 hosts in the server.xml, all with clustering set up. They all function just fine.
But... the CPUs spike up to 100% if I start up both servers. I know this didn't happen
without the new catalina-cluster.jar. If I shut down 1 server (doesn't matter which)
everything returns to normal. But when both are running, both servers are at 100% CPU.

I am trying to profile it now, but I figured if someone has already experienced this they could
save me some time.

Oh, and there isn't anything relevant in my logs. It's not throwing millions of errors or
something.

-Steve Nelson

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
RE: tomcat 5.0.16 Replication
you should do export LD_ASSUME_KERNEL=2.4.1 not export set LD_ASSUME_KERNEL=2.4.1 in regular bash shell
RE: tomcat 5.0.16 Replication
Heh, now I am replying to myself :P I tried export set LD_ASSUME_KERNEL=2.4.1 No change in behaviour. Then I tried export set LD_ASSUME_KERNEL=2.2.5 Again, no change. I restarted both servers between runs. I still get the "CPU going crazy" scenario. -Steve -Original Message- From: Steve Nelson [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 3:03 PM To: 'Tomcat Users List' Subject: RE: tomcat 5.0.16 Replication Okay, I reverted back to the 5.0.16 version and now I don't have the high CPU utilization. But it takes almost 60 seconds for the Manager to request the session state, which causes it to fail to synch about half the time. Must be something in the synch code. Which comes back to your original comments about the NIO stuff and RH9 not liking Java in general. Is there a known fix for making things right with RH9? I could try that. -Steve -Original Message- From: Steve Nelson [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 2:53 PM To: 'Tomcat Users List' Subject: RE: tomcat 5.0.16 Replication Yep, also happens when I use asynch. I couldn't get the profiling files to load on the machine I am using right now. When I get back to the servers I'll try to figure out what is eating up all the CPU... although top tells me around 30% of the utilization is system level as opposed to the java executable. Sounds like a lot of the load may be in system calls. -Steve -Original Message- From: Filip Hanik [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 2:47 PM To: Tomcat Users List Subject: RE: tomcat 5.0.16 Replication I'll try to get an instance going today. Will let you know how it goes. Also, try asynchronous replication; does it still go to 100%?
Filip -Original Message- From: Steve Nelson [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 12:08 PM To: 'Tomcat Users List' Subject: RE: tomcat 5.0.16 Replication Okay, did that got this BEGIN TO RECEIVE SENT:Default 1 RECEIVED:Default 1 FROM /10.0.0.110: SENT:Default 2 BEGIN TO RECEIVE RECEIVED:Default 2 FROM /10.0.0.110: SENT:Default 3 BEGIN TO RECEIVE RECEIVED:Default 3 FROM /10.0.0.110: SENT:Default 4 BEGIN TO RECEIVE RECEIVED:Default 4 FROM /10.0.0.110: *shrug* BTW It didn't go to 100% CPU ute before I started using the code from CVS. Of course the Manager would almost always timeout before it would recieve the message. Now it gets the message right away, but maxes my machine out. -Original Message- From: Filip Hanik [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 1:58 PM To: Tomcat Users List Subject: RE: tomcat 5.0.16 Replication 100% cpu can mean that you have a multicast problem, try to run java -cp tomcat-replication.jar MCaster download the jar from http://cvs.apache.org/~fhanik/ Filip -Original Message- From: Steve Nelson [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 6:51 AM To: '[EMAIL PROTECTED]' Subject: tomcat 5.0.16 Replication I was having random problems with clustering when starting up. Mostly it had to do with Timing out when the manager was starting up. I built the CVS version and it solved that problem. But it has caused some serious performance problems. First a little background. I have 2 servers, dual 300mhz cpq proliants, both running Redhat - 9, Tomcat 5.0.16 (with catalina-cluster.jar build from cvs) The multicast packets are restricted to a crossover link between the servers. There are 3 hosts in the server.xml, all with clustering set up. They all function just fine. But.the cpu's spikes up to 100% if I start up both servers. I know this didn't happen without the new catalina-cluster.jar. If I shut down 1 server (doesn't matter which) everything returns to normal. 
But when both are running both servers are at 100% CPU. I am trying to profile it now, but I figured if someone has already experienced this they could save me some time. Oh, and there isn't anything relevant in my logs. It's not throwing millions of errors or something. -Steve Nelson - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
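Editorial sketch for the LD_ASSUME_KERNEL experiments in the message above: the variable is read by the dynamic loader when the process starts, so it only has an effect if it is exported in the environment that actually launches Tomcat. The small check below is a hypothetical diagnostic (the class name `AssumeKernelCheck` is made up, and it is not part of Tomcat) that prints what the JVM process inherited. It assumes Java 5 or later, because the no-argument `System.getenv()` is not available on the 1.4.2 VM discussed in this thread.

```java
import java.util.Map;

// Hypothetical diagnostic, not part of Tomcat: reports whether the JVM
// process inherited LD_ASSUME_KERNEL from the shell that launched it.
class AssumeKernelCheck {
    static final String VAR = "LD_ASSUME_KERNEL";

    // Pure helper so the message can be produced from any environment map.
    static String describe(Map<String, String> env) {
        String value = env.get(VAR);
        if (value == null || value.isEmpty()) {
            return VAR + " is not set for this process";
        }
        return VAR + "=" + value;
    }

    public static void main(String[] args) {
        // System.getenv() without arguments requires Java 5 or later.
        System.out.println(describe(System.getenv()));
        // Kernel and VM details, similar to what uname and java -version report.
        System.out.println("os.version=" + System.getProperty("os.version"));
        System.out.println("java.vm.name=" + System.getProperty("java.vm.name"));
    }
}
```

Running it from the same shell, right after the export and before starting Tomcat, would show whether the setting is visible to Java processes started there.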
RE: tomcat 5.0.16 Replication
Okay, I reverted back to the 5.0.16 version and now I don't have the high CPU utilization. But it takes almost 60 seconds for the Manager to request the session state, which causes it to fail to synch about half the time. Must be something in the synch code. Which comes back to your original comments about the NIO stuff and RH9 not liking Java in general. Is there a known fix for making things right with RH9? I could try that. -Steve -Original Message- From: Steve Nelson [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 2:53 PM To: 'Tomcat Users List' Subject: RE: tomcat 5.0.16 Replication Yep, also happens when I use asynch. I couldn't get the profiling files to load on the machine I am using right now. When I get back to the servers I'll try to figure out what is eating up all the CPU... although top tells me around 30% of the utilization is system level as opposed to the java executable. Sounds like a lot of the load may be in system calls. -Steve -Original Message- From: Filip Hanik [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 2:47 PM To: Tomcat Users List Subject: RE: tomcat 5.0.16 Replication I'll try to get an instance going today. Will let you know how it goes. Also, try asynchronous replication; does it still go to 100%? Filip -Original Message- From: Steve Nelson [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 12:08 PM To: 'Tomcat Users List' Subject: RE: tomcat 5.0.16 Replication Okay, did that, got this: BEGIN TO RECEIVE SENT:Default 1 RECEIVED:Default 1 FROM /10.0.0.110: SENT:Default 2 BEGIN TO RECEIVE RECEIVED:Default 2 FROM /10.0.0.110: SENT:Default 3 BEGIN TO RECEIVE RECEIVED:Default 3 FROM /10.0.0.110: SENT:Default 4 BEGIN TO RECEIVE RECEIVED:Default 4 FROM /10.0.0.110: *shrug* BTW, it didn't go to 100% CPU utilization before I started using the code from CVS. Of course, the Manager would almost always time out before it would receive the message. Now it gets the message right away, but maxes my machine out.
-Original Message- From: Filip Hanik [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 1:58 PM To: Tomcat Users List Subject: RE: tomcat 5.0.16 Replication 100% cpu can mean that you have a multicast problem, try to run java -cp tomcat-replication.jar MCaster download the jar from http://cvs.apache.org/~fhanik/ Filip -Original Message- From: Steve Nelson [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 6:51 AM To: '[EMAIL PROTECTED]' Subject: tomcat 5.0.16 Replication I was having random problems with clustering when starting up. Mostly it had to do with Timing out when the manager was starting up. I built the CVS version and it solved that problem. But it has caused some serious performance problems. First a little background. I have 2 servers, dual 300mhz cpq proliants, both running Redhat - 9, Tomcat 5.0.16 (with catalina-cluster.jar build from cvs) The multicast packets are restricted to a crossover link between the servers. There are 3 hosts in the server.xml, all with clustering set up. They all function just fine. But.the cpu's spikes up to 100% if I start up both servers. I know this didn't happen without the new catalina-cluster.jar. If I shut down 1 server (doesn't matter which) everything returns to normal. But when both are running both servers are at 100% CPU. I am trying to profile it now, but I figured if someone has already experienced this they could save me some time. Oh, and there isn't anything relevant in my logs. It's not throwing millions of errors or something. -Steve Nelson - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: tomcat 5.0.16 Replication
Yep, also happens when I use asynch. I couldn't get the profiling files to load on the machine I am using right now. When I get back to the servers I'll try to figure out what is eating up all the CPU... although top tells me around 30% of the utilization is system level as opposed to the java executable. Sounds like a lot of the load may be in system calls. -Steve -Original Message- From: Filip Hanik [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 2:47 PM To: Tomcat Users List Subject: RE: tomcat 5.0.16 Replication I'll try to get an instance going today. Will let you know how it goes. Also, try asynchronous replication; does it still go to 100%? Filip -Original Message- From: Steve Nelson [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 12:08 PM To: 'Tomcat Users List' Subject: RE: tomcat 5.0.16 Replication Okay, did that, got this: BEGIN TO RECEIVE SENT:Default 1 RECEIVED:Default 1 FROM /10.0.0.110: SENT:Default 2 BEGIN TO RECEIVE RECEIVED:Default 2 FROM /10.0.0.110: SENT:Default 3 BEGIN TO RECEIVE RECEIVED:Default 3 FROM /10.0.0.110: SENT:Default 4 BEGIN TO RECEIVE RECEIVED:Default 4 FROM /10.0.0.110: *shrug* BTW, it didn't go to 100% CPU utilization before I started using the code from CVS. Of course, the Manager would almost always time out before it would receive the message. Now it gets the message right away, but maxes my machine out. -Original Message- From: Filip Hanik [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 1:58 PM To: Tomcat Users List Subject: RE: tomcat 5.0.16 Replication 100% CPU can mean that you have a multicast problem. Try to run java -cp tomcat-replication.jar MCaster (download the jar from http://cvs.apache.org/~fhanik/). Filip -Original Message- From: Steve Nelson [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 6:51 AM To: '[EMAIL PROTECTED]' Subject: tomcat 5.0.16 Replication I was having random problems with clustering when starting up. Mostly it had to do with timing out when the manager was starting up.
I built the CVS version and it solved that problem. But it has caused some serious performance problems. First a little background. I have 2 servers, dual 300mhz cpq proliants, both running Redhat - 9, Tomcat 5.0.16 (with catalina-cluster.jar build from cvs) The multicast packets are restricted to a crossover link between the servers. There are 3 hosts in the server.xml, all with clustering set up. They all function just fine. But.the cpu's spikes up to 100% if I start up both servers. I know this didn't happen without the new catalina-cluster.jar. If I shut down 1 server (doesn't matter which) everything returns to normal. But when both are running both servers are at 100% CPU. I am trying to profile it now, but I figured if someone has already experienced this they could save me some time. Oh, and there isn't anything relevant in my logs. It's not throwing millions of errors or something. -Steve Nelson - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: tomcat 5.0.16 Replication
I'll try to get an instance going today. Will let you know how it goes also, try asynchronous replication, does it still go to 100%? Filip -Original Message- From: Steve Nelson [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 12:08 PM To: 'Tomcat Users List' Subject: RE: tomcat 5.0.16 Replication Okay, did that got this BEGIN TO RECEIVE SENT:Default 1 RECEIVED:Default 1 FROM /10.0.0.110: SENT:Default 2 BEGIN TO RECEIVE RECEIVED:Default 2 FROM /10.0.0.110: SENT:Default 3 BEGIN TO RECEIVE RECEIVED:Default 3 FROM /10.0.0.110: SENT:Default 4 BEGIN TO RECEIVE RECEIVED:Default 4 FROM /10.0.0.110: *shrug* BTW It didn't go to 100% CPU ute before I started using the code from CVS. Of course the Manager would almost always timeout before it would recieve the message. Now it gets the message right away, but maxes my machine out. -Original Message- From: Filip Hanik [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 1:58 PM To: Tomcat Users List Subject: RE: tomcat 5.0.16 Replication 100% cpu can mean that you have a multicast problem, try to run java -cp tomcat-replication.jar MCaster download the jar from http://cvs.apache.org/~fhanik/ Filip -Original Message- From: Steve Nelson [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 6:51 AM To: '[EMAIL PROTECTED]' Subject: tomcat 5.0.16 Replication I was having random problems with clustering when starting up. Mostly it had to do with Timing out when the manager was starting up. I built the CVS version and it solved that problem. But it has caused some serious performance problems. First a little background. I have 2 servers, dual 300mhz cpq proliants, both running Redhat - 9, Tomcat 5.0.16 (with catalina-cluster.jar build from cvs) The multicast packets are restricted to a crossover link between the servers. There are 3 hosts in the server.xml, all with clustering set up. They all function just fine. But.the cpu's spikes up to 100% if I start up both servers. 
I know this didn't happen without the new catalina-cluster.jar. If I shut down 1 server (doesn't matter which) everything returns to normal. But when both are running both servers are at 100% CPU. I am trying to profile it now, but I figured if someone has already experienced this they could save me some time. Oh, and there isn't anything relevant in my logs. It's not throwing millions of errors or something. -Steve Nelson - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: tomcat 5.0.16 Replication
Okay, did that, got this: BEGIN TO RECEIVE SENT:Default 1 RECEIVED:Default 1 FROM /10.0.0.110: SENT:Default 2 BEGIN TO RECEIVE RECEIVED:Default 2 FROM /10.0.0.110: SENT:Default 3 BEGIN TO RECEIVE RECEIVED:Default 3 FROM /10.0.0.110: SENT:Default 4 BEGIN TO RECEIVE RECEIVED:Default 4 FROM /10.0.0.110: *shrug* BTW, it didn't go to 100% CPU utilization before I started using the code from CVS. Of course, the Manager would almost always time out before it would receive the message. Now it gets the message right away, but maxes my machine out. -Original Message- From: Filip Hanik [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 1:58 PM To: Tomcat Users List Subject: RE: tomcat 5.0.16 Replication 100% CPU can mean that you have a multicast problem. Try to run java -cp tomcat-replication.jar MCaster (download the jar from http://cvs.apache.org/~fhanik/). Filip -Original Message- From: Steve Nelson [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 6:51 AM To: '[EMAIL PROTECTED]' Subject: tomcat 5.0.16 Replication I was having random problems with clustering when starting up. Mostly it had to do with timing out when the manager was starting up. I built the CVS version and it solved that problem. But it has caused some serious performance problems. First a little background. I have 2 servers, dual 300 MHz Compaq ProLiants, both running Red Hat 9, Tomcat 5.0.16 (with catalina-cluster.jar built from CVS). The multicast packets are restricted to a crossover link between the servers. There are 3 hosts in the server.xml, all with clustering set up. They all function just fine. But... the CPU spikes up to 100% if I start up both servers. I know this didn't happen without the new catalina-cluster.jar. If I shut down 1 server (doesn't matter which) everything returns to normal. But when both are running, both servers are at 100% CPU. I am trying to profile it now, but I figured if someone has already experienced this they could save me some time.
Oh, and there isn't anything relevant in my logs. It's not throwing millions of errors or something. -Steve Nelson - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: tomcat 5.0.16 Replication
100% CPU can mean that you have a multicast problem. Try to run java -cp tomcat-replication.jar MCaster (download the jar from http://cvs.apache.org/~fhanik/). Filip -Original Message- From: Steve Nelson [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 6:51 AM To: '[EMAIL PROTECTED]' Subject: tomcat 5.0.16 Replication I was having random problems with clustering when starting up. Mostly it had to do with timing out when the manager was starting up. I built the CVS version and it solved that problem. But it has caused some serious performance problems. First a little background. I have 2 servers, dual 300 MHz Compaq ProLiants, both running Red Hat 9, Tomcat 5.0.16 (with catalina-cluster.jar built from CVS). The multicast packets are restricted to a crossover link between the servers. There are 3 hosts in the server.xml, all with clustering set up. They all function just fine. But... the CPU spikes up to 100% if I start up both servers. I know this didn't happen without the new catalina-cluster.jar. If I shut down 1 server (doesn't matter which) everything returns to normal. But when both are running, both servers are at 100% CPU. I am trying to profile it now, but I figured if someone has already experienced this they could save me some time. Oh, and there isn't anything relevant in my logs. It's not throwing millions of errors or something. -Steve Nelson - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
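Editorial sketch of the kind of multicast check suggested above. This is not the MCaster source, only a minimal stand-in written with the standard `java.net.MulticastSocket` API; the class name `MulticastProbe`, the group address, the port, and the message format are all assumptions chosen for illustration, so substitute the values from your own cluster configuration. Run on both nodes at the same time, each side would be expected to print packets arriving from the other host's address if multicast works across the link.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

// Illustrative multicast self-test; NOT the MCaster tool mentioned in the thread.
class MulticastProbe {
    // Assumed values for this sketch; use the ones from your own Cluster config.
    static final String GROUP = "228.0.0.4";
    static final int PORT = 45564;

    static String sentLine(String name, int seq) {
        return "SENT:" + name + " " + seq;
    }

    static String receivedLine(String payload, InetAddress from) {
        return "RECEIVED:" + payload + " FROM " + from;
    }

    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName(GROUP);
        try (MulticastSocket socket = new MulticastSocket(PORT)) {
            socket.joinGroup(group);     // deprecated in newer JDKs, still available
            socket.setSoTimeout(2000);   // do not block forever if nothing arrives
            for (int seq = 1; seq <= 4; seq++) {
                byte[] out = ("Default " + seq).getBytes(StandardCharsets.UTF_8);
                socket.send(new DatagramPacket(out, out.length, group, PORT));
                System.out.println(sentLine("Default", seq));
                DatagramPacket in = new DatagramPacket(new byte[512], 512);
                try {
                    socket.receive(in);
                    String payload = new String(in.getData(), 0, in.getLength(),
                            StandardCharsets.UTF_8);
                    System.out.println(receivedLine(payload, in.getAddress()));
                } catch (SocketTimeoutException e) {
                    System.out.println("TIMEOUT waiting for packet " + seq);
                }
                Thread.sleep(500);
            }
            socket.leaveGroup(group);
        }
    }
}
```

Repeated timeouts, or only ever seeing the local address, would point toward a routing or interface problem on the multicast link rather than a problem in the replication code.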
Re: tomcat 5.0.16 Replication
Well, just to make sure that I wasn't saying something untrue, I went to check my Red Hat machines. Everything DOES work fine, but it's true that there is some loop somewhere, because both my Tomcats have an abnormal load average, i.e. 1.15 even when the servers are idle. Jean-Philippe Bélanger Filip Hanik wrote: I had socket deadlocks in java.io.OutputStream.write, which never returned and caused the system to eventually hang. In the next few weeks, I'll try to get an RH9 instance going. So everything works for you? Filip -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 11:43 AM To: Tomcat Users List Subject: Re: tomcat 5.0.16 Replication Currently running Tomcat 5.0.16 with the CVS HEAD of the replication module. This is under Red Hat 9. So far so good. What kind of problem did you encounter under RH9? Jean-Philippe Bélanger Filip Hanik wrote: My only experience with Red Hat 9 is that it doesn't play well with NIO. I have not successfully run Tomcat clustering on RH9; I use RH8. I also don't have an RH9 machine at home yet, so I can't develop for it. Filip -Original Message- From: Steve Nelson [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 6:51 AM To: '[EMAIL PROTECTED]' Subject: tomcat 5.0.16 Replication I was having random problems with clustering when starting up. Mostly it had to do with timing out when the manager was starting up. I built the CVS version and it solved that problem. But it has caused some serious performance problems. First a little background. I have 2 servers, dual 300 MHz Compaq ProLiants, both running Red Hat 9, Tomcat 5.0.16 (with catalina-cluster.jar built from CVS). The multicast packets are restricted to a crossover link between the servers. There are 3 hosts in the server.xml, all with clustering set up. They all function just fine. But... the CPU spikes up to 100% if I start up both servers. I know this didn't happen without the new catalina-cluster.jar.
If I shut down 1 server (doesn't matter which) everything returns to normal. But when both are running both servers are at 100% CPU. I am trying to profile it now, but I figured if someone has already experienced this they could save me some time. Oh, and there isn't anything relevant in my logs. It's not throwing millions of errors or something. -Steve Nelson - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: tomcat 5.0.16 Replication
My CPU Util jumps to 100% on both processes. It functions properly other than maxing the machine. BTW this is with NO load. I am going to try to profile it but the EJP profile files total over 800 meg for just starting up Tomcat. And I am off-site so I had to transfer them. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 1:43 PM To: Tomcat Users List Subject: Re: tomcat 5.0.16 Replication Currently running tomcat 5.0.16 with the CVS HEAD of the replication module. This is under redhat 9. So far so good. What kind of problem did you encounter under rh9? Jean-Philippe Bélanger Filip Hanik wrote: >my only experience with Redhat 9 is that it doesn't play well with NIO. >I have not successfully ran tomcat clustering on RH9, I use RH8. >I also don't have a RH9 machine at home yet, so I can't develop for it > >Filip > >-Original Message- >From: Steve Nelson [mailto:[EMAIL PROTECTED] >Sent: Wednesday, January 07, 2004 6:51 AM >To: '[EMAIL PROTECTED]' >Subject: tomcat 5.0.16 Replication > > > >I was having random problems with clustering when starting up. Mostly it had >to do with Timing out >when the manager was starting up. I built the CVS version and it solved that >problem. But it has caused >some serious performance problems. > >First a little background. > >I have 2 servers, dual 300mhz cpq proliants, both running Redhat - 9, Tomcat >5.0.16 (with catalina-cluster.jar build from cvs) The multicast packets are >restricted to a crossover link between the servers. There are 3 hosts in the >server.xml, all with clustering set up. They all function just fine. > >But.the cpu's spikes up to 100% if I start up both servers. I know this >didn't happen without the new catalina-cluster.jar. If I shut down 1 server >(doesn't matter which) everything returns to normal. But when both are >running both servers are at 100% CPU. 
I am trying to profile it now, but I >figured if someone has already experienced this they could save me some >time. > >Oh, and there isn't anything relevant in my logs. It's not throwing millions >of errors or something. > >-Steve Nelson > > > >- >To unsubscribe, e-mail: [EMAIL PROTECTED] >For additional commands, e-mail: [EMAIL PROTECTED] > > > > - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: tomcat 5.0.16 Replication
I had socket deadlocks in java.io.OutputStream.write, which never returned and caused the system to eventually hang. In the next few weeks, I'll try to get an RH9 instance going. So everything works for you? Filip -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 11:43 AM To: Tomcat Users List Subject: Re: tomcat 5.0.16 Replication Currently running Tomcat 5.0.16 with the CVS HEAD of the replication module. This is under Red Hat 9. So far so good. What kind of problem did you encounter under RH9? Jean-Philippe Bélanger Filip Hanik wrote: >my only experience with Redhat 9 is that it doesn't play well with NIO. >I have not successfully ran tomcat clustering on RH9, I use RH8. >I also don't have a RH9 machine at home yet, so I can't develop for it > >Filip > >-Original Message- >From: Steve Nelson [mailto:[EMAIL PROTECTED] >Sent: Wednesday, January 07, 2004 6:51 AM >To: '[EMAIL PROTECTED]' >Subject: tomcat 5.0.16 Replication > > > >I was having random problems with clustering when starting up. Mostly it had >to do with Timing out >when the manager was starting up. I built the CVS version and it solved that >problem. But it has caused >some serious performance problems. > >First a little background. > >I have 2 servers, dual 300mhz cpq proliants, both running Redhat - 9, Tomcat >5.0.16 (with catalina-cluster.jar build from cvs) The multicast packets are >restricted to a crossover link between the servers. There are 3 hosts in the >server.xml, all with clustering set up. They all function just fine. > >But.the cpu's spikes up to 100% if I start up both servers. I know this >didn't happen without the new catalina-cluster.jar. If I shut down 1 server >(doesn't matter which) everything returns to normal. But when both are >running both servers are at 100% CPU. I am trying to profile it now, but I >figured if someone has already experienced this they could save me some >time.
> >Oh, and there isn't anything relevant in my logs. It's not throwing millions >of errors or something. > >-Steve Nelson > > > >- >To unsubscribe, e-mail: [EMAIL PROTECTED] >For additional commands, e-mail: [EMAIL PROTECTED] > > > > - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: tomcat 5.0.16 Replication
Currently running tomcat 5.0.16 with the CVS HEAD of the replication module. This is under redhat 9. So far so good. What kind of problem did you encounter under rh9? Jean-Philippe Bélanger Filip Hanik wrote: my only experience with Redhat 9 is that it doesn't play well with NIO. I have not successfully ran tomcat clustering on RH9, I use RH8. I also don't have a RH9 machine at home yet, so I can't develop for it Filip -Original Message- From: Steve Nelson [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 6:51 AM To: '[EMAIL PROTECTED]' Subject: tomcat 5.0.16 Replication I was having random problems with clustering when starting up. Mostly it had to do with Timing out when the manager was starting up. I built the CVS version and it solved that problem. But it has caused some serious performance problems. First a little background. I have 2 servers, dual 300mhz cpq proliants, both running Redhat - 9, Tomcat 5.0.16 (with catalina-cluster.jar build from cvs) The multicast packets are restricted to a crossover link between the servers. There are 3 hosts in the server.xml, all with clustering set up. They all function just fine. But.the cpu's spikes up to 100% if I start up both servers. I know this didn't happen without the new catalina-cluster.jar. If I shut down 1 server (doesn't matter which) everything returns to normal. But when both are running both servers are at 100% CPU. I am trying to profile it now, but I figured if someone has already experienced this they could save me some time. Oh, and there isn't anything relevant in my logs. It's not throwing millions of errors or something. -Steve Nelson - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: tomcat 5.0.16 Replication
My only experience with Red Hat 9 is that it doesn't play well with NIO. I have not successfully run Tomcat clustering on RH9; I use RH8. I also don't have an RH9 machine at home yet, so I can't develop for it. Filip -Original Message- From: Steve Nelson [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 07, 2004 6:51 AM To: '[EMAIL PROTECTED]' Subject: tomcat 5.0.16 Replication I was having random problems with clustering when starting up. Mostly it had to do with timing out when the manager was starting up. I built the CVS version and it solved that problem. But it has caused some serious performance problems. First a little background. I have 2 servers, dual 300 MHz Compaq ProLiants, both running Red Hat 9, Tomcat 5.0.16 (with catalina-cluster.jar built from CVS). The multicast packets are restricted to a crossover link between the servers. There are 3 hosts in the server.xml, all with clustering set up. They all function just fine. But... the CPU spikes up to 100% if I start up both servers. I know this didn't happen without the new catalina-cluster.jar. If I shut down 1 server (doesn't matter which) everything returns to normal. But when both are running, both servers are at 100% CPU. I am trying to profile it now, but I figured if someone has already experienced this they could save me some time. Oh, and there isn't anything relevant in my logs. It's not throwing millions of errors or something. -Steve Nelson - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: tomcat 5.0.16 Replication (This Thread is a Duplicate, Please Ignore)
RE: tomcat 5.0.16 Replication
Clustering doesn't support frames. Synchronizing everything down to that level would cause overhead, so I decided against supporting it. Filip -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Monday, January 05, 2004 6:46 AM To: Tomcat Users List Subject: RE: tomcat 5.0.16 Replication I built the latest CVS branch for the clustering module and replaced my catalina-cluster.jar. Seems like everything is synchronous as stated. I had another unrelated problem with an IFRAME that IE seems to load before the server (Tomcat) ends the request. So even if everything was synchronous, the IFRAME request could be made by IE before the actual parent page was done replicating. I'll let you know if any other problem comes up, since that release will be going through intensive testing in the coming weeks. Thanks -Original Message- From: Filip Hanik [mailto:[EMAIL PROTECTED] Sent: Saturday, January 03, 2004 8:07 PM To: Tomcat Users List; [EMAIL PROTECTED] Subject: RE: tomcat 5.0.16 Replication It will come out in the next release. Filip -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 31, 2003 9:41 AM To: Tomcat-user Subject: tomcat 5.0.16 Replication The new Tomcat 5.0.16 replication seems to work oddly. From what I've read in the documentation and the mailing list, the clustering is supposed to be done synchronously. Right? Well, that's not what's happening on my end: the client receives the response before the whole replication is done. Example: I have a web page that fetches data and, if data is found, puts it in the session and returns a page containing an IFRAME. In the IFRAME, the src hits a page on the same cluster, which loads the data found in the session and displays it. Well, sometimes, when the IFRAME is shown and that hit is forwarded to a different server than the first access, the data in the session is empty.
Then once the page is loaded (empty) and returned to the client, I get the replication message in my logs. (The message containing the data that was supposed to be already replicated). [Cluster config] [end Cluster Config] Any idea on what could be going wrong here? Jean-Philippe Bélanger CGI - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: tomcat 5.0.16 Replication
I built the latest CVS branch for the clustering module and replaced my catalina-cluster.jar. Seems like everything is synchronous as stated. I had another unrelated problem with an IFRAME that IE seems to load before the server (Tomcat) ends the request. So even if everything was synchronous, the IFRAME request could be made by IE before the actual parent page was done replicating. I'll let you know if any other problem comes up, since that release will be going through intensive testing in the coming weeks. Thanks -Original Message- From: Filip Hanik [mailto:[EMAIL PROTECTED] Sent: Saturday, January 03, 2004 8:07 PM To: Tomcat Users List; [EMAIL PROTECTED] Subject: RE: tomcat 5.0.16 Replication It will come out in the next release. Filip -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 31, 2003 9:41 AM To: Tomcat-user Subject: tomcat 5.0.16 Replication The new Tomcat 5.0.16 replication seems to work oddly. From what I've read in the documentation and the mailing list, the clustering is supposed to be done synchronously. Right? Well, that's not what's happening on my end: the client receives the response before the whole replication is done. Example: I have a web page that fetches data and, if data is found, puts it in the session and returns a page containing an IFRAME. In the IFRAME, the src hits a page on the same cluster, which loads the data found in the session and displays it. Well, sometimes, when the IFRAME is shown and that hit is forwarded to a different server than the first access, the data in the session is empty. Then once the page is loaded (empty) and returned to the client, I get the replication message in my logs. (The message containing the data that was supposed to be already replicated). [Cluster config] [end Cluster Config] Any idea on what could be going wrong here?
Jean-Philippe Bélanger CGI - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
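Editorial sketch of the IFRAME race described in the message above. It is a toy model with two in-memory maps standing in for the session stores on two nodes, and a queue standing in for replication messages that have not been delivered yet. It does not use any Tomcat API, and every name in it (`ReplicationRaceSketch`, `handleParentRequest`, `handleIframeRequest`, `replicate`) is hypothetical. The point it illustrates is only the ordering problem: if the browser's IFRAME request reaches the second node before the queued replication message does, that node sees an empty session.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model only: two in-memory "nodes" and a queue of pending replication messages.
class ReplicationRaceSketch {
    final Map<String, String> nodeA = new HashMap<>();
    final Map<String, String> nodeB = new HashMap<>();
    final Queue<String[]> pending = new ArrayDeque<>();

    // Parent page request handled on node A: store the data, queue it for node B.
    void handleParentRequest(String key, String value) {
        nodeA.put(key, value);
        pending.add(new String[] {key, value});
    }

    // Deliver everything queued so far to node B.
    void replicate() {
        String[] msg;
        while ((msg = pending.poll()) != null) {
            nodeB.put(msg[0], msg[1]);
        }
    }

    // IFRAME request that the load balancer happens to send to node B.
    String handleIframeRequest(String key) {
        return nodeB.get(key);
    }

    public static void main(String[] args) {
        ReplicationRaceSketch cluster = new ReplicationRaceSketch();
        cluster.handleParentRequest("data", "found");
        // The IFRAME request arrives before the replication message is delivered.
        System.out.println("IFRAME before replication: " + cluster.handleIframeRequest("data"));
        cluster.replicate();
        System.out.println("IFRAME after replication: " + cluster.handleIframeRequest("data"));
    }
}
```

In this model the first lookup returns null and the second returns the stored value, which mirrors the empty IFRAME followed by a late replication message in the logs. Sticky sessions on the load balancer would be one way to keep the IFRAME request on the node that already holds the data.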
RE: tomcat 5.0.16 Replication
It will come out in the next release. Filip -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 31, 2003 9:41 AM To: Tomcat-user Subject: tomcat 5.0.16 Replication The new Tomcat 5.0.16 replication seems to work oddly. From what I've read in the documentation and the mailing list, the clustering is supposed to be done synchronously. Right? Well, that's not what's happening on my end: the client receives the response before the whole replication is done. Example: I have a web page that fetches data and, if data is found, puts it in the session and returns a page containing an IFRAME. In the IFRAME, the src hits a page on the same cluster, which loads the data found in the session and displays it. Well, sometimes, when the IFRAME is shown and that hit is forwarded to a different server than the first access, the data in the session is empty. Then once the page is loaded (empty) and returned to the client, I get the replication message in my logs. (The message containing the data that was supposed to be already replicated). [Cluster config] [end Cluster Config] Any idea on what could be going wrong here? Jean-Philippe Bélanger CGI - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]