[rt-users] Testing 4.0.0rc5 - experiences upgrading from 3.6.4 on RHEL6
So I figured I'd give 4.0.0rc5 a try on a disposable RHEL6 virtual machine tonight with a dump of our 3.6.4 database. I have to say, the various upgrade scripts seemed to have worked wonders, and I actually have a functioning 4.0.0rc5 running.

It's reporting a lot of errors in syslog however, most of them to do with 'lifecycle', such as:

Feb 18 20:01:51 rt RT: DBD::mysql::st execute failed: Unknown column 'lifecycle' in 'field list' at /usr/local/share/perl5/DBIx/SearchBuilder/Handle.pm line 509. (/usr/local/share/perl5/DBIx/SearchBuilder/Handle.pm:509)

The Queues table looks like:

mysql> describe Queues;
+-------------------+--------------+------+-----+---------+----------------+
| Field             | Type         | Null | Key | Default | Extra          |
+-------------------+--------------+------+-----+---------+----------------+
| id                | int(11)      | NO   | PRI | NULL    | auto_increment |
| Name              | varchar(200) | NO   | UNI |         |                |
| Description       | varchar(255) | YES  |     | NULL    |                |
| CorrespondAddress | varchar(120) | YES  |     | NULL    |                |
| CommentAddress    | varchar(120) | YES  |     | NULL    |                |
| InitialPriority   | int(11)      | NO   |     | 0       |                |
| FinalPriority     | int(11)      | NO   |     | 0       |                |
| DefaultDueIn      | int(11)      | NO   |     | 0       |                |
| Creator           | int(11)      | NO   |     | 0       |                |
| Created           | datetime     | YES  |     | NULL    |                |
| LastUpdatedBy     | int(11)      | NO   |     | 0       |                |
| LastUpdated       | datetime     | YES  |     | NULL    |                |
| Disabled          | smallint(6)  | NO   | MUL | 0       |                |
+-------------------+--------------+------+-----+---------+----------------+
13 rows in set (0.01 sec)

What'd I miss?

A second, more fatal problem that has me completely baffled presently is that after the initial cycle of:

- import old database
- upgrade to 3.7.87 using the script that allows you to specify start/end revisions
- generate the queries.sql file and then run those

I seem to be able to login once, but as soon as I explicitly logout, I can no longer login no matter how many mason/browser cache clearing loops I do.

A third problem is that I can't seem to get RT to run when I first start apache.
I get:

mkdir /opt/rt4/var/mason_data/obj/3452818388: Permission denied at /usr/local/share/perl5/HTML/Mason/Compiler/ToObject.pm line 107

Looks like the various directories were created as root:

[root@rt rt-4.0.0rc5]# ls -al /opt/rt4/var/mason_data/
total 16
drwxrwx---. 4 apache apache 4096 Feb 18 20:21 .
drwxr-xr-x. 5 root   root   4096 Feb 18 17:50 ..
drwxr-xr-x. 2 root   root   4096 Feb 18 19:41 cache
drwxr-xr-x. 2 root   root   4096 Feb 18 20:21 obj
[root@rt rt-4.0.0rc5]#

If I delete the subdirs under mason_data while apache is still running, it recreates the obj directory with a user ID it can deal with:

[root@rt rt-4.0.0rc5]# ls -al /opt/rt4/var/mason_data/
total 12
drwxrwx---. 3 apache apache 4096 Feb 18 20:25 .
drwxr-xr-x. 5 root   root   4096 Feb 18 17:50 ..
drwxr-xr-x. 3 apache apache 4096 Feb 18 20:25 obj
[root@rt rt-4.0.0rc5]#

and the site loads. Still can't login tho.

Feb 18 20:12:24 rt RT: FAILED LOGIN for root from ##.##.##.## (/opt/rt4/sbin/../lib/RT/Interface/Web.pm:605)
Feb 18 20:12:37 rt RT: FAILED LOGIN for root from ##.##.##.## (/opt/rt4/sbin/../lib/RT/Interface/Web.pm:605)
Feb 18 20:13:08 rt RT: FAILED LOGIN for root from ##.##.##.## (/opt/rt4/sbin/../lib/RT/Interface/Web.pm:605)

Help?

-d
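For anyone hitting the same mkdir failure, here is a minimal sketch of how one might spot the root-owned leftovers before starting httpd. The helper name not_owned_by is made up for illustration; the path and the "apache" user are simply the ones from the listing above, and the chown in the comment is a suggestion, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print every entry under a directory that is NOT
# owned by the given user (name or numeric uid).
not_owned_by() {
    dir=$1
    owner=$2
    find "$dir" ! -user "$owner" -print
}

# Example, run as root on the RT host (values taken from the message above):
#   not_owned_by /opt/rt4/var/mason_data apache
# If that lists cache/ and obj/, something along the lines of
#   chown -R apache:apache /opt/rt4/var/mason_data
# before starting apache may avoid the "Permission denied" from Mason.
```

An empty result means everything under the directory already belongs to the web server user.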
Re: [rt-users] Testing 4.0.0rc5 - experiences upgrading from 3.6.4 on RHEL6
On Fri, Feb 18, 2011 at 08:33:27PM -0500, Darren Nickerson wrote:
> So I figured I'd give 4.0.0rc5 a try on a disposable RHEL6 virtual machine
> tonight with a dump of our 3.6.4 database. I have to say, the various upgrade
> scripts seemed to have worked wonders, and I actually have a functioning
> 4.0.0rc5 running. It's reporting a lot of errors in syslog however, most of
> them to do with 'lifecycle', such as:
>
> Feb 18 20:01:51 rt RT: DBD::mysql::st execute failed: Unknown column
> 'lifecycle' in 'field list' at
> /usr/local/share/perl5/DBIx/SearchBuilder/Handle.pm line 509.
> (/usr/local/share/perl5/DBIx/SearchBuilder/Handle.pm:509)
>

It sure looks like something went wrong during your upgrades, as a "Lifecycle" column should have been added by the 3.9.7 upgrade script. Do you still have a log?
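A quick way to confirm whether the column is there after an upgrade run is to check the DESCRIBE output from a script. This is only a sketch: has_column is a made-up helper, and the database name rt4 in the usage comment is an assumption (use whatever your RT database is called). The match is case-insensitive because the error above spells the column 'lifecycle'.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named column appears in field 1 of
# tab-separated DESCRIBE output read from stdin.
has_column() {
    awk -F '\t' -v col="$1" '
        tolower($1) == tolower(col) { found = 1 }
        END { exit !found }
    '
}

# Example (database name "rt4" is an assumption):
#   mysql -N -B -u root -p rt4 -e 'DESCRIBE Queues' | has_column Lifecycle \
#       && echo "Lifecycle present" || echo "Lifecycle missing"
```

The -N and -B options make the mysql client print bare tab-separated rows, which is what the awk filter expects.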
Re: [rt-users] Testing 4.0.0rc5 - experiences upgrading from 3.6.4 on RHEL6
On Feb 18, 2011, at 9:45 PM, Jesse Vincent wrote:
>
> It sure looks like something went wrong during your upgrades, as a
> "Lifecycle" column should have been added by the 3.9.7 upgrade script. Do you
> still have a log?

Jesse,

I think I figured it out. I had been using UPGRADING.mysql and README as my principal stepwise guide. Line 3 of UPGRADING.mysql got me to 3.7.87. I'm not sure what lines 4,5,6 accomplish exactly but I assumed they got me completely current (did all the rest). But clearly the schema was missing a lot of stuff.

On a hunch, I re-ran:

/opt/rt4/sbin/rt-setup-database --dba root --prompt-for-dba-password --action upgrade

with a start release of 3.7.87 and no end (apply all updates) and I have a much healthier RT now. In fact, I can't find a single darn thing wrong with it!! Very sexy new look. ;-)

(a little while later)

Well maybe I can find one thing ... I'm not able to get mailgate working any longer:

Feb 18 21:59:03 rt postfix/local[13847]: 052E6C064D: to=, relay=local, delay=0.39, delays=0.07/0/0/0.32, dsn=4.3.0, status=deferred (temporary failure. Command output: An Error Occurred = 302 Found )

I'm not sure if you can hand-crank mailgate or not, but trying, I see:

[root@rt postfix]# /opt/rt4/bin/rt-mailgate --queue sales --action comment --url http://rt4.ifax.com/ --debug
test
.
/opt/rt4/bin/rt-mailgate: temp file is '/tmp/D8IMPShIFr/6difOpAIKD'
/opt/rt4/bin/rt-mailgate: connecting to http://rt4.ifax.com//REST/1.0/NoAuth/mail-gateway
An Error Occurred
=

302 Found

/opt/rt4/bin/rt-mailgate: undefined server error

Which seems to be the same error as postfix was getting. Apache log says:

[Fri Feb 18 22:06:32 2011] [error] Your request path is '/1.0/NoAuth/mail-gateway' and it doesn't match your Location(Match) '/opt/rt4/share/html/'. This should be due to the configuration error. See perldoc Plack::Handler::Apache2 for details.

Thoughts?

-d
Re: [rt-users] Testing 4.0.0rc5 - experiences upgrading from 3.6.4 on RHEL6
On Feb 18, 2011, at 10:11 PM, Darren Nickerson wrote:
>
> [root@rt postfix]# /opt/rt4/bin/rt-mailgate --queue sales --action comment
> --url http://rt4.ifax.com/ --debug
> test
> .
> /opt/rt4/bin/rt-mailgate: temp file is '/tmp/D8IMPShIFr/6difOpAIKD'
> /opt/rt4/bin/rt-mailgate: connecting to
> http://rt4.ifax.com//REST/1.0/NoAuth/mail-gateway
> An Error Occurred
> =
>
> 302 Found
>
> /opt/rt4/bin/rt-mailgate: undefined server error

Changing the invocation to:

/opt/rt4/bin/rt-mailgate --queue sales --action comment --url http://rt4.ifax.com

(ie: removing the trailing slash) fixes this problem. Presumably I could have handled that in the apache config as well.

Looking good!

-d
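The debug output shows the gateway path being appended to the URL exactly as given, which is where the "//REST" came from. If the URL comes from a variable in an alias or wrapper script, a small normalising step avoids the doubled slash. This is just a sketch; join_url is a made-up helper and is not part of rt-mailgate.

```shell
#!/bin/sh
# Hypothetical helper: join a base URL and a path with exactly one slash
# between them, whether or not the base ends (or the path starts) with "/".
join_url() {
    base=${1%/}     # drop one trailing slash from the base, if present
    path=${2#/}     # drop one leading slash from the path, if present
    printf '%s/%s\n' "$base" "$path"
}

# Example:
#   join_url http://rt4.ifax.com/ /REST/1.0/NoAuth/mail-gateway
# In a wrapper you could instead just strip the slash before calling
# rt-mailgate:
#   url=${url%/}
```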
Re: [rt-users] Testing 4.0.0rc5 - experiences upgrading from 3.6.4 on RHEL6
Okay, only a few problems remaining, but they're doozies. Any help welcomed.

1. Apache occasionally stops responding to http requests. I haven't found a pattern yet, but it's frequent. Unable to login, the page just hangs waiting for a response ... the only log entry I see in apache's error log is:

[Sat Feb 19 10:09:25 2011] [error] Your request path is '/index.html' and it doesn't match your Location(Match) '/opt/rt4/share/html/'. This should be due to the configuration error. See perldoc Plack::Handler::Apache2 for details.

I see that every time I try to reload the page, for what it's worth. It turns out I see a line like that for EVERY page request, so it's probably unrelated (but also worrying). An apache restart clears the problem. I've applied RHEL6 updates & rebooted the box, but if the hang occurs again I'd welcome any suggestions on how to troubleshoot.

2. At boot time, apache seems to set up the mason_data/cache directory as root, and stuff fails to run, reporting:

mkdir /opt/rt4/var/mason_data/obj/3452818388/standard/Install: Permission denied at /usr/local/share/perl5/HTML/Mason/Compiler/ToObject.pm line 107

3. I haven't had a lot of luck with the SELinux suggestions at: http://requesttracker.wikia.com/wiki/SELinux . Has anyone got RT and SELinux coexisting on RHEL6?

-Darren
Re: [rt-users] Testing 4.0.0rc5 - experiences upgrading from 3.6.4 on RHEL6
On Sat, Feb 19, 2011 at 11:19:35AM -0500, Darren Nickerson wrote:
> Okay, only a few problems remaining, but they're doozies. Any help welcomed.
>
> 1. Apache occasionally stops responding to http requests. I haven't found a
> pattern yet, but it's frequent. Unable to login, the page just hangs waiting
> for a response ... the only log entry I see in apache's error log is:
>

Darren,

Can you try out this new version of Plack:

http://pause.perl.org/incoming/Plack-0.9970.tar.gz

We believe it may resolve your issues.

Thanks,
Jesse
Re: [rt-users] Testing 4.0.0rc5 - experiences upgrading from 3.6.4 on RHEL6
On Feb 22, 2011, at 1:31 PM, Jesse Vincent wrote:
>
> Darren,
>
> Can you try out this new version of Plack:
>
> http://pause.perl.org/incoming/Plack-0.9970.tar.gz
>
> We believe it may resolve your issues.

Jesse,

Thanks for your help! I installed that yesterday, but had to restart apache about 6 hours later because rt-mailgate was failing to insert tickets correctly. Well, actually the email was getting added to the ticket, but postfix thought it was failing:

Feb 22 21:47:34 rt4 postfix/local[20829]: E2AE4C0672: to=, relay=local, delay=943, delays=762/0.02/0/180, dsn=4.3.0, status=deferred (temporary failure. Command output: An Error Occurred = 500 read timeout )
Feb 22 22:07:35 rt4 postfix/local[20920]: E2AE4C0672: to=, relay=local, delay=2143, delays=1963/0.02/0/180, dsn=4.3.0, status=deferred (temporary failure. Command output: An Error Occurred = 500 read timeout )

That's just one 40-minute retry interval there, it kept injecting this transaction (and others) every 40 mins until I restarted apache. This seems like the same behavior I was seeing before installing this version of Plack.

Just to give you a bit more information about what I know here, I don't see this when RT is busy, ... it's usually in the evenings when it's been idle. And these read timeouts aren't limited to rt-mailgate, as I mentioned the user just sees a seemingly eternal hang when trying to access RT via the web as well.

One time when this happened I kept hitting reload about 20 times on the browser, and noticed that each time I did so an apache thread went into a "W" (sending reply) status as reported by mod_status. Not getting any joy there, I tried 'apachectl graceful' to restart httpd, and as you can see below each thread just went into a "G" state, but stayed there until I stopped apache and restarted it.

I'm very interested in getting to the bottom of this and will help any way I can ...
I'm just not entirely sure what to try next, or how to get any visibility into where the lockup is occurring.

Oh, and I'm still seeing these in the apache error log:

[Wed Feb 23 07:31:14 2011] [error] Your request path is '/NoAuth/Logout.html' and it doesn't match your Location(Match) '/opt/rt4/share/html/'. This should be due to the configuration error. See perldoc Plack::Handler::Apache2 for details.
[Wed Feb 23 07:45:16 2011] [error] Your request path is '/REST/1.0/NoAuth/mail-gateway' and it doesn't match your Location(Match) '/opt/rt4/share/html/'. This should be due to the configuration error. See perldoc Plack::Handler::Apache2 for details.
[Wed Feb 23 07:46:14 2011] [error] Your request path is '/robots.txt' and it doesn't match your Location(Match) '/opt/rt4/share/html/'. This should be due to the configuration error. See perldoc Plack::Handler::Apache2 for details.

-Darren

[root@rt4 ~]# apachectl status
Apache Server Status for localhost

Server Version: Apache/2.2.15 (Unix) DAV/2 PHP/5.3.2 mod_ssl/2.2.15 OpenSSL/1.0.0-fips mod_wsgi/3.2 Python/2.6.5 mod_perl/2.0.4 Perl/v5.10.1
Server Built: Aug 14 2010 08:53:48
--
Current Time: Monday, 21-Feb-2011 11:15:41 EST
Restart Time: Sunday, 20-Feb-2011 03:09:24 EST
Parent Server Generation: 4
Server uptime: 1 day 8 hours 6 minutes 17 seconds
Total accesses: 887 - Total Traffic: 16.3 MB
CPU Usage: u128.81 s53.18 cu0 cs0 - .157% CPU load
.00767 requests/sec - 147 B/second - 18.8 kB/request
34 requests currently being processed, 7 idle workers

_G__W...

Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process

[root@rt4 ~]# uptime
11:15:44 up 2 days, 22 min, 2 users, load average: 1.18, 0.37, 0.13
[root@rt4 ~]#
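To watch for the stuck "W" and "G" workers without eyeballing the status page, the scoreboard string can be tallied per state. A minimal sketch follows; count_state is a made-up helper, the scoreboard string in the comment is only the truncated fragment shown above, and the server-status URL assumes mod_status is reachable on localhost.

```shell
#!/bin/sh
# Hypothetical helper: count how many workers are in a given state.
#   $1 = scoreboard string (one character per worker slot)
#   $2 = a single state character from the Scoreboard Key, e.g. W or G
count_state() {
    printf '%s' "$1" | tr -cd "$2" | wc -c | tr -d ' '
}

# Example with the fragment above:
#   count_state '_G__W...' W
# Pulling a live scoreboard (assumes mod_status at /server-status):
#   sb=$(curl -s 'http://localhost/server-status?auto' | sed -n 's/^Scoreboard: //p')
#   echo "W=$(count_state "$sb" W) G=$(count_state "$sb" G)"
```

Logging those two counts from cron every minute would show whether the "W" workers pile up gradually or all at once.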
Re: [rt-users] Testing 4.0.0rc5 - experiences upgrading from 3.6.4 on RHEL6
Darren,

I am looking at the hung response issue. For the Location error you are getting, please update the apache config to use <Location /> ... instead of <Directory>.

I've only been able to reproduce the hung issue when setting the apache to have only one worker, and when the response finally came out, I see the login page's "time to display" being something normal, like 0.09 secs, so that means the congestion happened in apache or the modperl glue somehow.

In your case, does the page eventually come out rendered at all, like after 5min?

And can you see if it makes any difference if you change around line 222 of RT::Interface::Web::Handler from:

    my $h = RT::Interface::Web::Handler::NewHandler(
        'HTML::Mason::PSGIHandler::Streamy');

to:

    my $h = RT::Interface::Web::Handler::NewHandler(
        'HTML::Mason::PSGIHandler');
Re: [rt-users] Testing 4.0.0rc5 - experiences upgrading from 3.6.4 on RHEL6
On Feb 24, 2011, at 4:04 AM, Chia-liang Kao wrote:
> Darren,
>
> I am looking at the hung response issue. for the Location error
> you are getting, please update the apache config to use <Location />
> ... instead of <Directory>.

Thanks for looking into this! Just to be clear, the documentation in docs/web_deployment.pod suggests the following for mod_perl 2.xx:

    Order allow,deny
    Allow from all
    SetHandler perl-script
    PerlResponseHandler Plack::Handler::Apache2
    PerlSetVar psgi_app /opt/rt4/sbin/rt-server

In fact my apache config is pretty much exactly as described in that section. You really want me to change Directory to Location? If I do that, I see:

    You haven't yet configured your webserver to run RT.

    You appear to have installed RT's web interface correctly, but haven't yet configured your web server to "run" the RT server which powers the web interface. The next step is to edit your webserver's configuration file to instruct it to use RT's mod_perl or FastCGI handler.

    If you need commercial support, please contact us at sa...@bestpractical.com.

And my RT is down. Please clarify your instructions here.

> I've only been able to reproduce the hung issue when setting the
> apache to have only one worker, and when the response finally
> came out, i see the login page's "time to display" being
> something normal, like 0.09 secs, so that means the congestion
> happened in apache or the modperl glue somehow.

Interesting data point, thanks.

> In your case, does the page eventually came out rendered at all,
> like after 5min?

I will try to let it wait longer ... I think I have waited as long as 10 minutes in the past, but it's worth trying and reporting back to you.
> and can you see if it makes any difference if you change around line
> 222 of RT::Interface::Web::Handler from:
>
> my $h = RT::Interface::Web::Handler::NewHandler(
>     'HTML::Mason::PSGIHandler::Streamy');
> to:
>
> my $h = RT::Interface::Web::Handler::NewHandler(
>     'HTML::Mason::PSGIHandler');

I will try this next time we see a lockup. Oddly, we did not have one at all yesterday (during heavy use) or last night (when RT was mostly idle). Perhaps the new version of Plack that Jesse had me install improved the situation after all?

-d
Re: [rt-users] Testing 4.0.0rc5 - experiences upgrading from 3.6.4 on RHEL6
On Thu, Feb 24, 2011 at 12:08:47PM -0500, Darren Nickerson wrote:
> On Feb 24, 2011, at 4:04 AM, Chia-liang Kao wrote:
> > Darren,
> >
> > I am looking at the hung response issue. for the Location error
> > you are getting, please update the apache config to use <Location />
> > ... instead of <Directory>.
>
> Thanks for looking into this! Just to be clear, the documentation in
> docs/web_deployment.pod
> suggests the following for mod_perl 2.xx:

CL has been revamping those docs, his recommendation comes from those updates. You can see his current branch (with tweaked docs) at
https://github.com/bestpractical/rt/blob/4.0%2Fapache2-fixes/docs/web_deployment.pod

-kevin
Re: [rt-users] Testing 4.0.0rc5 - experiences upgrading from 3.6.4 on RHEL6
On Feb 24, 2011, at 12:08 PM, Darren Nickerson wrote:
>
>> In your case, does the page eventually came out rendered at all,
>> like after 5min?
>
> I will try to let it wait longer ... I think I have waited as long as 10
> minutes in the past, but it's worth trying and reporting back to you.

So, we had two instances of the hang today, and they each followed a similar pattern. In at least one case the problem cleared itself after some time. I do not know how long, it was a background apache thread started by rt-mailgate that eventually cleared, it was not a browser session.

Of the http worker threads, each one is blocked in a semop call:

[root@rt4 Plack-0.9970]# strace -p 30307
Process 30307 attached - interrupt to quit
semop(1802244, {{0, -1, SEM_UNDO}}, 1^C
Process 30307 detached
[root@rt4 Plack-0.9970]# strace -p 30308
Process 30308 attached - interrupt to quit
semop(1802244, {{0, -1, SEM_UNDO}}, 1^C
Process 30308 detached
[root@rt4 Plack-0.9970]# strace -p 30309
Process 30309 attached - interrupt to quit
semop(1802244, {{0, -1, SEM_UNDO}}, 1^C
Process 30309 detached

except for one which is reading from fd 1:

[root@rt4 Plack-0.9970]# strace -p 30310
Process 30310 attached - interrupt to quit
read(1, ^C

That FD is a network connection to our database server:

[root@rt4 Plack-0.9970]# ls -l /proc/30310/fd/1
lrwx------. 1 root root 64 Feb 24 17:44 /proc/30310/fd/1 -> socket:[281592]
[root@rt4 Plack-0.9970]# netstat -antep | grep 281592
tcp        0      5 10.0.12.149:49410    10.0.11.100:3306    ESTABLISHED 48    281592    30310/httpd

The database server has no record of that tcp connection any longer, and mysqladmin processlist shows all threads sleeping.

>> and can you see if it makes any difference if you change around line
>> 222 of RT::Interface::Web::Handler from:
>>
>> my $h = RT::Interface::Web::Handler::NewHandler(
>>     'HTML::Mason::PSGIHandler::Streamy');
>> to:
>>
>> my $h = RT::Interface::Web::Handler::NewHandler(
>>     'HTML::Mason::PSGIHandler');

I have not yet tried this.
Given the new detailed information above, does it still make sense to do so? -Darren
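The fd-to-connection lookup above (ls -l on /proc/PID/fd, then grepping netstat for the inode) can be scripted so it is quick to repeat for each stuck worker. A minimal sketch: socket_inode is a made-up helper, and the PID and fd number in the usage comment are just the ones from this message.

```shell
#!/bin/sh
# Hypothetical helper: extract the inode number from a /proc fd link target
# such as "socket:[281592]". Returns non-zero if the target is not a socket.
socket_inode() {
    case $1 in
        socket:\[*\])
            inode=${1#socket:\[}          # strip the leading "socket:["
            printf '%s\n' "${inode%\]}"   # strip the trailing "]"
            ;;
        *)
            return 1
            ;;
    esac
}

# Example, as root on the RT host (PID 30310 and fd 1 are from the message):
#   inode=$(socket_inode "$(readlink /proc/30310/fd/1)") &&
#       netstat -antep | grep -w "$inode"
```

Running the same netstat on the database host and comparing the port pairs would show which side dropped the connection.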
Re: [rt-users] Testing 4.0.0rc5 - experiences upgrading from 3.6.4 on RHEL6
On Feb 24, 2011, at 7:04 PM, Darren Nickerson wrote:
>
> That FD is a network connection to our database server:
>
> [root@rt4 Plack-0.9970]# ls -l /proc/30310/fd/1
> lrwx------. 1 root root 64 Feb 24 17:44 /proc/30310/fd/1 -> socket:[281592]
> [root@rt4 Plack-0.9970]# netstat -antep | grep 281592
> tcp        0      5 10.0.12.149:49410    10.0.11.100:3306
> ESTABLISHED 48    281592    30310/httpd
>
> The database server has no record of that tcp connection any longer, and
> mysqladmin processlist shows all threads sleeping.

*bump* any thoughts on this? Should I be looking outside of RT for a fix for this? Is this more likely to be a kernel/networking problem in RHEL6 or with mod_perl or the MySQL DBD/DBI stuff?

Looks like RT is using persistent database connections, but that when the lockup occurs it's usually because apache is waiting to get data back from a MySQL socket that no longer exists on the database server ...

-d
Re: [rt-users] Testing 4.0.0rc5 - experiences upgrading from 3.6.4 on RHEL6
Darren,

On Feb 24, 2011, at 7:04 PM, Darren Nickerson wrote:
> That FD is a network connection to our database server:
>
> [root@rt4 Plack-0.9970]# ls -l /proc/30310/fd/1
> lrwx------. 1 root root 64 Feb 24 17:44 /proc/30310/fd/1 -> socket:[281592]
> [root@rt4 Plack-0.9970]# netstat -antep | grep 281592
> tcp        0      5 10.0.12.149:49410    10.0.11.100:3306    ESTABLISHED 48    281592    30310/httpd
>
> The database server has no record of that tcp connection any longer, and
> mysqladmin processlist shows all threads sleeping.
>
> *bump* any thoughts on this? Should I be looking outside of RT for a fix
> for this? Is this more likely to be a kernel/networking problem in RHEL6
> or with mod_perl or the MySQL DBD/DBI stuff?
>
> Looks like RT is using persistent database connections, but that when the
> lockup occurs it's usually because apache is waiting to get data back from
> a MySQL socket that no longer exists on the database server ...

It seems the issue is with apache2 and plack. I can reproduce the stalled connection (however not always) with the plack-req.psgi example from the plack dist, with apache 2.2.16 on ubuntu maverick. However it's a bit different from yours. strace says:

close(0) = 0
read(5, 0x7f1f67af, 1) = -1 EAGAIN (Resource temporarily unavailable)
accept(4, {sa_family=AF_INET, sin_port=htons(64924), sin_addr=inet_addr("114.44.215.78")}, [16]) = 0
fcntl(0, F_GETFD) = 0
fcntl(0, F_SETFD, FD_CLOEXEC) = 0
getsockname(0, {sa_family=AF_INET, sin_port=htons(6254), sin_addr=inet_addr("10.130.158.43")}, [16]) = 0
fcntl(0, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(0, F_SETFL, O_RDWR|O_NONBLOCK) = 0
read(0, 0x7f8e98b0bf58, 8000) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=0, events=POLLIN}], 1, 30

Cheers,
CLK
Re: [rt-users] Testing 4.0.0rc5 - experiences upgrading from 3.6.4 on RHEL6
On Mar 2, 2011, at 10:19 AM, Chia-liang Kao wrote:
>
> It seems the issue is with apache2 and plack. i can reproduce the stalled
> connection (however not always) with the
> plack-req.psgi example from the plack dist, with apache 2.2.16 on ubuntu
> maverick. however it's a bit different from yours.

You're definitely looking at a different problem. Is there any way I can make the database connections non-persistent? That might help the ones that are getting confused.

-Darren
Re: [rt-users] Testing 4.0.0rc5 - experiences upgrading from 3.6.4 on RHEL6
On 2011/3/4, at 3:13 AM, Darren Nickerson wrote:
> On Mar 2, 2011, at 10:19 AM, Chia-liang Kao wrote:
>> It seems the issue is with apache2 and plack. i can reproduce the
>> stalled connection (however not always) with the plack-req.psgi example
>> from the plack dist, with apache 2.2.16 on ubuntu maverick. however it's
>> a bit different from yours.
>
> You're definitely looking at a different problem. Is there any way I can
> make the database connections non-persistent? That might help the ones
> that are getting confused.

Try removing the Plack::Handler::Apache2->preload line and setting MaxRequestsPerChild to 1?