[OpenAFS] AFS DB server upgrade advice
Greetings,

I'm planning out an upgrade path for our AFS DB servers. A lot of changes need to be made, so I wanted to see if folks know of a safer/better way to do the below.

We have 3 cells, each of which has 3 DB servers running Solaris 8 and OpenAFS 1.4.7. We would like to upgrade to RHEL 6 based servers running OpenAFS 1.6.1. Two DB servers in the EOS cell need new IPs; the other IPs will stay the same. 99% of our clients use --afsdb, except the few remaining Solaris 8 machines.

Our plan for this is below. We go one server at a time, starting with the highest IP so that the sync master is done last.

1 Shut down the AFS server processes
2 Take backups
3 Build/install 1.6.1 on the server
4 Recreate an empty db/ directory, unless we are upgrading the sync master -- it gets a pre-populated db and config files (upclient master)
5 Start the AFS server processes
6 Check for sanity

Once this process has produced a stable DB environment, we had planned on running the process again, with step 3 replaced by shutting down the Solaris machine and replacing it with a RHEL box.

In the EOS cell 2 servers must change IPs. One of these new servers will end up being the sync master as it will have the lowest IP. (The current sync master gets to keep its IP.) So we were thinking about a similar upgrade path.

1 Upgrade all Solaris servers to OpenAFS 1.6.1
2 Create two new RHEL/1.6.1 DB servers on what will be the new IPs and let one become the new sync master. Update DNS records and CellServDB files on the other EOS DB servers as appropriate. 5 DB servers at this point.
3 Check for sanity
4 Upgrade the old sync master to RHEL via the process above
5 Remove the Solaris DBs from the afsdb DNS records and the CellServDBs on the EOS DB servers
6 Monitor the existing Solaris servers to see usage drop off
7 Shut down the 2 existing Solaris servers and surplus them

Any recommendations for making this process any smoother?
Jack -- Jack Neely jjne...@ncsu.edu Linux Czar, OIT Campus Linux Services Office of Information Technology, NC State University GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89 ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
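The ordering in the plan above rests on the rule of thumb stated in the post: the DB server with the lowest IP address ends up as sync master. A minimal sketch of checking that ordering ahead of time might look like the following. The addresses are made-up documentation addresses, not the real cell, and the sketch models only the lowest-IP preference described in the post, not the actual Ubik quorum voting.

```python
import ipaddress


def upgrade_order(servers):
    """Return servers sorted highest IP first, so the expected sync master comes last."""
    # Sorting by parsed address avoids string-order mistakes ("110" sorting before "20").
    return sorted(servers, key=ipaddress.ip_address, reverse=True)


def expected_sync_master(servers):
    """Return the lowest IP address; this models only the preference described in the post."""
    return min(servers, key=ipaddress.ip_address)


# Hypothetical addresses, for illustration only.
old_dbs = ["192.0.2.20", "192.0.2.35", "192.0.2.110"]
new_dbs = ["192.0.2.11", "192.0.2.12"]

print("upgrade order:", upgrade_order(old_dbs))
print("expected sync master with 5 servers:", expected_sync_master(old_dbs + new_dbs))
```

Running the second check against the combined list is a cheap way to confirm, before touching DNS or CellServDB, which of the five servers would be expected to take over.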
Re: [OpenAFS] 1.4.x, select() and recent RHEL kernels beware
On Thu, Nov 08, 2012 at 06:20:18PM +0100, Stephan Wiesand wrote:
> Hi Dan,
>
> On Nov 8, 2012, at 16:41 , Dan Van Der Ster wrote:
> [...]
> > All of the nasty details of this incident here:
> > https://afs.web.cern.ch/afs/reports/html/afs200SegFaults.html
> >
> > We're now running with a workaround,
> >   ulimit -Hn 1024; ulimit -Sn 1024
> > in our init scripts until we manage to upgrade to 1.6. Hope this saves someone the effort of troubleshooting this again.
>
> Great work (again). Thanks a lot for sharing this!
>
> Cheers,
> Stephan

We've had this issue occur at NCSU as well. I'm trying to figure out whether 1.6.2 will be out soon enough to wait for it, or whether we should take multiple outages: one to install the ulimits and another to upgrade to 1.6.2 when it's available. (Or spend weeks moving volumes.) There is another fix or two in 1.6.2 that I'd like to apply to our servers.

Jack Neely
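The workaround quoted above caps the open-file limit at 1024, presumably because select() cannot track descriptor numbers at or above FD_SETSIZE, which is typically 1024. As a rough sketch, the same cap can be computed with Python's standard resource module before deciding whether a server needs the workaround. The function names here are illustrative and not part of OpenAFS, and the sketch only reports values; it does not change any limit.

```python
import resource

SELECT_FD_LIMIT = 1024  # the value used in the ulimit workaround quoted above


def capped(value, cap=SELECT_FD_LIMIT):
    """Return value lowered to cap; an unlimited setting counts as above the cap."""
    if value == resource.RLIM_INFINITY or value > cap:
        return cap
    return value


def capped_nofile_limits():
    """Compute the (soft, hard) pair that 'ulimit -Hn 1024; ulimit -Sn 1024' aims for."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return capped(soft), capped(hard)


if __name__ == "__main__":
    print("current:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("capped: ", capped_nofile_limits())
```

If the current and capped pairs already match, the process is running with the limits the workaround would set.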
Re: [OpenAFS] Re: OpenAFS kernel panic
On Wed, Oct 31, 2012 at 11:15:43AM -0500, Andrew Deason wrote:
> On Mon, 29 Oct 2012 14:44:15 -0400
> Jack Neely jjne...@pams.ncsu.edu wrote:
>
> > > You can save us a little time by providing the disassembly of afs_Conn. You can get this by running
> > > objdump -d -r /path/to/libafs.ko > /some/file
> >
> > Attached.
>
> This looks like 'objdump -d', not 'objdump -d -r', but okay.
>
> This is issue 130714. Were you moving fileservers around at the time, or trying to change their addresses or something?

Yes, this was in the middle of lots of volume moves so that we could upgrade the storage space allocated to each AFS server.

Jack

> A fix is already in the tree and will be in the next 1.6 release.
>
> --
> Andrew Deason
> adea...@sinenomine.net
Re: [OpenAFS] Re: OpenAFS kernel panic
On Fri, Oct 26, 2012 at 01:52:24PM -0500, Andrew Deason wrote:
> On Fri, 26 Oct 2012 09:30:30 -0400
> Jack Neely jjne...@pams.ncsu.edu wrote:
>
> > Had an OpenAFS 1.6.1 client crash last night and I received the following screenshot of the kernel panic.
> >
> > https://lh6.googleusercontent.com/-LgYESh-n4zI/UIpsIQv1UPI/DTQ/DTbWGpa1L1w/s869/uni01ftp-20121026.jpg
>
> You can save us a little time by providing the disassembly of afs_Conn. You can get this by running
> objdump -d -r /path/to/libafs.ko > /some/file

Attached.

> And trimming the output to just contain the section that starts with <afs_Conn>:.
>
> It's also better to get more of the output, a little above that cutoff. If the screenshot is all you have, obviously there's nothing you can do, but if that stuff was logged anywhere, it'd be good to see.

Alas, the screenshot is all I have, nothing was present in the logs.

Jack

> I think CR2 gives the access address, though? 0x30 seems plausible...

00029600 <afs_Conn>:
   29600: 55                      push   %rbp
   29601: 48 89 e5                mov    %rsp,%rbp
   29604: 41 57                   push   %r15
   29606: 41 56                   push   %r14
   29608: 41 55                   push   %r13
   2960a: 41 54                   push   %r12
   2960c: 53                      push   %rbx
   2960d: 48 83 ec 38             sub    $0x38,%rsp
   29611: e8 00 00 00 00          callq  29616 <afs_Conn+0x16>
   29616: 48 c7 01 00 00 00 00    movq   $0x0,(%rcx)
   2961d: 41 89 d6                mov    %edx,%r14d
   29620: ba 01 00 00 00          mov    $0x1,%edx
   29625: 83 05 00 00 00 00 01    addl   $0x1,0x0(%rip)        # 2962c <afs_Conn+0x2c>
   2962c: 48 89 4d c8             mov    %rcx,-0x38(%rbp)
   29630: 49 89 fc                mov    %rdi,%r12
   29633: 49 89 f5                mov    %rsi,%r13
   29636: e8 00 00 00 00          callq  2963b <afs_Conn+0x3b>
   2963b: 48 85 c0                test   %rax,%rax
   2963e: 48 8b 4d c8             mov    -0x38(%rbp),%rcx
   29642: 0f 84 74 02 00 00       je     298bc <afs_Conn+0x2bc>
   29648: 4c 8b 78 48             mov    0x48(%rax),%r15
   2964c: 4d 85 ff                test   %r15,%r15
   2964f: 0f 84 41 02 00 00       je     29896 <afs_Conn+0x296>
   29655: 49 8b 57 40             mov    0x40(%r15),%rdx
   29659: 48 85 d2                test   %rdx,%rdx
   2965c: 0f 84 34 02 00 00       je     29896 <afs_Conn+0x296>
   29662: 44 0f b7 42 68          movzwl 0x68(%rdx),%r8d
   29667: 8b 90 b0 00 00 00       mov    0xb0(%rax),%edx
   2966d: 85 d2                   test   %edx,%edx
   2966f: 75 5f                   jne    296d0 <afs_Conn+0xd0>
   29671: 4d 85 ff                test   %r15,%r15
   29674: 74 5a                   je     296d0 <afs_Conn+0xd0>
   29676: 49 8b 5f 60             mov    0x60(%r15),%rbx
   2967a: f6 43 30 20             testb  $0x20,0x30(%rbx)
   2967e: 75 50                   jne    296d0 <afs_Conn+0xd0>
   29680: 41 80 7d 12 00          cmpb   $0x0,0x12(%r13)
   29685: 0f 8e 21 02 00 00       jle    298ac <afs_Conn+0x2ac>
   2968b: 41 80 7d 13 01          cmpb   $0x1,0x13(%r13)
   29690: 74 3e                   je     296d0 <afs_Conn+0xd0>
   29692: 48 85 db                test   %rbx,%rbx
   29695: 74 39                   je     296d0 <afs_Conn+0xd0>
   29697: f6 80 2a 01 00 00 01    testb  $0x1,0x12a(%rax)
   2969e: 48 89 c2                mov    %rax,%rdx
   296a1: 41 bf ff ff ff ff       mov    $0xffffffff,%r15d
   296a7: 0f 84 55 01 00 00       je     29802 <afs_Conn+0x202>
   296ad: 48 83 7a 48 00          cmpq   $0x0,0x48(%rdx)
   296b2: 0f 84 4d 01 00 00       je     29805 <afs_Conn+0x205>
   296b8: 41 83 c7 01             add    $0x1,%r15d
   296bc: 48 83 c2 08             add    $0x8,%rdx
   296c0: 41 83 ff 0c             cmp    $0xc,%r15d
   296c4: 75 e7                   jne    296ad <afs_Conn+0xad>
   296c6: e9 3a 01 00 00          jmpq   29805 <afs_Conn+0x205>
   296cb: 0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
   296d0: 4d 8d 5d 13             lea    0x13(%r13),%r11
   296d4: 45 31 d2                xor    %r10d,%r10d
   296d7: 41 b9 ff ff ff ff       mov    $0xffffffff,%r9d
   296dd: 0f 1f 00                nopl   (%rax)
   296e0: 4c 89 df                mov    %r11,%rdi
   296e3: 31 f6                   xor    %esi,%esi
   296e5: 31 db                   xor    %ebx,%ebx
   296e7: eb 46                   jmp    2972f <afs_Conn+0x12f>
   296e9: 0f 1f 80 00 00 00 00
[OpenAFS] OpenAFS kernel panic
Folks,

Had an OpenAFS 1.6.1 client crash last night and I received the following screenshot of the kernel panic.

https://lh6.googleusercontent.com/-LgYESh-n4zI/UIpsIQv1UPI/DTQ/DTbWGpa1L1w/s869/uni01ftp-20121026.jpg

This one is new to me. Of course, no kdump, alas...

Jack
[OpenAFS] DAFS Salvager failure
Folks,

One of our AFS file servers crashed this afternoon: OpenAFS 1.6.1 on RHEL 6 with kernel 2.6.32-279.9.1.el6.x86_64. It looks like the salvager hung and eventually the dafileserver stopped responding to clients. We've rebooted, fsck'd the ext4 partitions, and finally ran dasalvager -force by hand to attempt to correctly salvage the server.

In all cases, once the dafs instance starts up it serves requests, it dispatches a volume salvage or 4, all the salvager processes get stuck, and we start all over again. We've salvaged the server multiple times at this point -- our next hope is that we can restart the file server with the traditional file server process. (BTW, 2 and 3 GiB cores from dafileserver and dasalvager abound.)

SalsrvLog messages are usually along the following lines:

10/18/2012 17:55:08 SYNC_ask: No response on circuit 'FSSYNC'
10/18/2012 17:55:08 SYNC_ask: protocol communications failure on circuit 'FSSYNC'; attempting reconnect to server
10/18/2012 17:55:08 SYNC_ask: No response on circuit 'FSSYNC'
10/18/2012 17:55:08 SYNC_ask: protocol communications failure on circuit 'FSSYNC'; attempting reconnect to server
10/18/2012 17:55:11 SYNC_ask: too many / too latent fatal protocol errors on circuit 'FSSYNC'; giving up (tries 1 timeout 1350597266)
10/18/2012 17:55:11 FSYNC_askfs: internal FSSYNC protocol error 2
10/18/2012 17:55:11 AskOffline: request for fileserver to take volume offline failed; trying again...
10/18/2012 17:55:08 SYNC_ask: No response on circuit 'FSSYNC'
10/18/2012 17:55:08 SYNC_ask: protocol communications failure on circuit 'FSSYNC'; attempting reconnect to server
10/18/2012 17:55:11 SYNC_ask: too many / too latent fatal protocol errors on circuit 'FSSYNC'; giving up (tries 1 timeout 1350597265)
10/18/2012 17:55:11 FSYNC_askfs: internal FSSYNC protocol error 2
10/18/2012 17:55:11 AskOffline: request for fileserver to take volume offline failed; trying again...
10/18/2012 17:55:08 SYNC_ask: No response on circuit 'FSSYNC'

or

10/18/2012 22:20:49 dispatching child to salvage volume 540007729...
10/18/2012 22:19:33 SYNC_ask: No response on circuit 'FSSYNC'
10/18/2012 22:19:33 SYNC_ask: protocol communications failure on circuit 'FSSYNC'; attempting reconnect to server

and from FileLog (this looks like I'm restoring from backups)

Thu Oct 18 22:25:30 2012 FSYNC_com: invalid protocol version (2574739029)
Thu Oct 18 22:25:30 2012 FSYNC_com: invalid protocol version (3774863615)
Thu Oct 18 22:25:30 2012 FSYNC_com: invalid protocol version (944130375)
Thu Oct 18 22:25:30 2012 Volume 539458481 now offline, must be salvaged.
Thu Oct 18 22:25:30 2012 Scheduling salvage for volume 539458481 on part /vicepb over SALVSYNC
Thu Oct 18 22:25:31 2012 nUsers == 0, but header not on LRU
Thu Oct 18 22:25:31 2012 SYNC_getCom: error receiving command
Thu Oct 18 22:25:31 2012 Scheduling salvage for volume 539894230 on part /vicepb over SALVSYNC
Thu Oct 18 22:25:31 2012 FSYNC_com: read failed; dropping connection (cnt=103291)
Thu Oct 18 22:25:37 2012 FSYNC_com: invalid protocol version (2023862981)

I've checked, all my binaries are from my 1.6.1 build. What's going on?

Jack Neely
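One detail in the SalsrvLog excerpt that may help when reading it: the number after "timeout" in the "giving up" lines has the size of a Unix epoch timestamp rather than a duration. The sketch below decodes it under that assumption. The -4 hour offset (US Eastern daylight time) is also an assumption about the server's local time zone, not something stated in the post.

```python
from datetime import datetime, timedelta, timezone


def decode_timeout(epoch_seconds, utc_offset_hours=0):
    """Interpret the logged timeout value as seconds since the Unix epoch (an assumption)."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch_seconds, tz=tz)


# Value copied from the 'giving up (tries 1 timeout 1350597266)' line above.
print(decode_timeout(1350597266).isoformat())
print(decode_timeout(1350597266, utc_offset_hours=-4).isoformat())
```

Read this way, the value falls on 18 October 2012, the same day as the surrounding log entries, which suggests it is an absolute deadline and not a number of seconds to wait.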
Re: [OpenAFS] Distro vs. @sys. Round 1: FIGHT!
On Thu, Aug 23, 2012 at 08:16:37AM -0600, Ken Dreyer wrote:
> On Thu, Aug 23, 2012 at 8:02 AM, Jeff Blaine jbla...@kickflop.net wrote:
> > Any other options, or is the standard thing everyone does?
>
> The last time this was brought up[1], it sounded like there was rough consensus for the following default upstream @sysname list on Linux:
>
>   ${arch}_linux ${arch}_linux26
>
> ... and then whatever else your distro or site wanted to put in front of that.
>
> - Ken
>
> [1] http://lists.openafs.org/pipermail/openafs-info/2012-March/037784.html

For a few years now I've used a bit of code in /etc/sysconfig/openafs that sets the sysname to

  arch_dist_major-version

as the first entry in the list, with the OpenAFS default as the second. So,

  Current sysname list is 'amd64_redhat_6' 'amd64_linux26'

or

  Current sysname list is 'i386_redhat_6' 'i386_linux26'

Jack
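The actual snippet from /etc/sysconfig/openafs is not shown in the message, so the following is only a sketch of the naming scheme it describes: a site-specific arch_dist_major-version entry first, then the upstream-style default. The function names are hypothetical, and the command builder assumes that the fs utility accepts several names after -newsys.

```python
def sysname_list(arch, dist, major_version):
    """Build the two-entry sysname list described in the message."""
    site_specific = f"{arch}_{dist}_{major_version}"
    upstream_default = f"{arch}_linux26"
    return [site_specific, upstream_default]


def fs_sysname_command(names):
    """Return an argv list for setting the list (assumes 'fs sysname -newsys' takes several names)."""
    return ["fs", "sysname", "-newsys", *names]


names = sysname_list("amd64", "redhat", "6")
print(names)
print(" ".join(fs_sysname_command(names)))
```

Keeping the generic name last means that paths built with @sys fall back to the shared linux26 tree when no distribution-specific directory exists.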
Re: [OpenAFS] idle dead timeout issue?
On Tue, Apr 03, 2012 at 11:21:10AM -0400, Jeffrey Altman wrote:
> On 4/3/2012 11:13 AM, Jack Neely wrote:
> > Folks,
> >
> > What's the status of the idle dead timeout issue? We are continuing to have issues with 1.6.1pre2. I've seen a lot of git activity and am wondering if the idle dead issue has been resolved at this point.
> >
> > Thanks,
> > Jack
>
> What are the issues and why do you believe they are idle dead related?

Because Russ told me so. ;-)

https://lists.openafs.org/pipermail/openafs-info/2012-January/037524.html

See the bottom of the email.

I've grabbed 1.6.1, read the release notes, and saw some notes that probably apply to this situation. I'm still unclear whether the OpenAFS folks believe this issue is solved or just better. In any case there's nothing like tossing it on one of the web servers and giving her a spin.

Performance appears slightly better than on our other web servers. However, we are still getting periods of time where AFS takes anywhere from multiple seconds to 30 seconds to respond. Then suddenly, all hanging AFS transactions return at the same time. See the graph. The Y-axis is the number of httpd processes, the X-axis is the number of seconds past 13:00 today. (Data gathered from the http logs of how long each request took.)

http://www4.ncsu.edu/~jjneely/web-apr4-1325.pdf

Servers are still on 1.6.1pre2 and we are making plans to do more testing and then upgrade the servers to 1.6.1 final.

Jack
[OpenAFS] idle dead timeout issue?
Folks,

What's the status of the idle dead timeout issue? We are continuing to have issues with 1.6.1pre2. I've seen a lot of git activity and am wondering if the idle dead issue has been resolved at this point.

Thanks,
Jack
Re: [OpenAFS] Timeouts and odd behavior with 1.6.0 file servers
Folks,

We've pushed patches to our servers to bring us up to 1.6.1pre2. However, we are still seeing a lot of reader_wait states and load spikes on our web servers. I know this is the idledead issue now. Is there anything else we can do to reduce the issue here? (Web server AFS clients are 1.6.0.)

Thanks,
Jack Neely
Re: [OpenAFS] Timeouts and odd behavior with 1.6.0 file servers
On Wed, Jan 25, 2012 at 02:22:26PM -0800, Russ Allbery wrote:
> Jack Neely jjne...@pams.ncsu.edu writes:
>
> > We are working our way through a migration from old Sun AFS hardware running Openafs 1.4.11 to HP Blades running RHEL 6 with OpenAFS 1.6.0. At this point we've completed most of our file servers.
>
> Don't use the 1.6.0 file server. It has a data corruption problem when you have an inode clone (such as a backup volume or a migration clone) and directories are moved with mv. These are fixed in 1.6.1pre1 and in the Debian 1.6.0-3 packages. This may be what you're running into.
>
> > RHEL 6 / 1.6.0 clients wired into network occasionally have long pauses when doing AFS operations, such as running ls. It may take 30 seconds to a minute for the AFS server (the datacenter is downstairs) to respond. We are not seeing high load or any signs on the server that something is wrong.
> >
> > The above applies as well to our web servers that are RHEL 6 / 1.6.0. Several times a week load on the web servers will suddenly spike and rxdebug tells us that RX calls to one of the AFS servers are all/mostly in the reader_wait state. Just as suddenly as it starts, its over with.
> >
> >   call 0: # 5231, state active, mode: receiving, flags: reader_wait
> >
> > Our cron job that mirrors CPAN to AFS space now often fails with time out errors.
> >
> >   readlink_stat(/afs/...) failed: Connection timed out (110)
>
> Yes, this is consistent with the problems that we're seeing on our web servers with OpenAFS 1.4 as well, which are probably due at least in part to the pathological idledead interactions with the way that server threads can back up waiting for vnode locks. 1.6.1pre2 (coming shortly) has both client and server fixes for the idledead part of this.
>
> --
> Russ Allbery (r...@stanford.edu) http://www.eyrie.org/~eagle/

So, I grabbed the current HEAD of the openafs-stable-1_6_x branch, which looks to be prepped for 1.6.1pre2. I built that and deployed it to a server I could do some testing on. I'm seeing good results, but we haven't finished our testing yet.

Jack
[OpenAFS] Timeouts and odd behavior with 1.6.0 file servers
Folks,

We are working our way through a migration from old Sun AFS hardware running OpenAFS 1.4.11 to HP Blades running RHEL 6 with OpenAFS 1.6.0. At this point we've completed most of our file servers. With most of our volumes on 1.6.0 DAFS servers we have started to see some odd behavior.

Our Subversion servers have kept SVN repos in AFS for years, and we've not upgraded the Subversion software. (SVN servers are RHEL 5 with OpenAFS 1.4.11.) But now SVN often tells us:

  Transmitting file data ...svn: Commit failed (details follow):
  svn: database disk image is malformed

At this point we know that the SQLite databases in Subversion's fsfs backend have become corrupt.

RHEL 6 / 1.6.0 clients wired into the network occasionally have long pauses when doing AFS operations, such as running ls. It may take 30 seconds to a minute for the AFS server (the datacenter is downstairs) to respond. We are not seeing high load or any signs on the server that something is wrong.

The above applies as well to our web servers that are RHEL 6 / 1.6.0. Several times a week load on the web servers will suddenly spike and rxdebug tells us that RX calls to one of the AFS servers are all/mostly in the reader_wait state. Just as suddenly as it starts, it's over.

  call 0: # 5231, state active, mode: receiving, flags: reader_wait

Our cron job that mirrors CPAN to AFS space now often fails with timeout errors.

  readlink_stat(/afs/...) failed: Connection timed out (110)

Any one of these by itself could be just a fluke or a network glitch. But as time progresses we are starting to see a pattern emerge. Any clues as to what may be happening?

Jack Neely
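When load spikes like the ones described above come and go quickly, it can help to count how many RX calls are in reader_wait at each sample instead of eyeballing rxdebug output. The sketch below parses lines shaped like the single sample quoted in the message. The regular expression is an assumption based on that one line, so other rxdebug output formats may need adjustments, and the helper names are hypothetical.

```python
import re

# Pattern derived from the one sample line in the message; treat it as an assumption.
CALL_LINE = re.compile(
    r"call\s+(?P<index>\d+):\s+#\s*(?P<number>\d+),\s+state\s+(?P<state>\w+)"
    r"(?:,\s+mode:\s+(?P<mode>\w+))?(?:,\s+flags:\s+(?P<flags>.+))?"
)


def parse_call(line):
    """Return a dict describing one 'call N:' line, or None if the line does not match."""
    match = CALL_LINE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    raw_flags = info["flags"] or ""
    info["flags"] = [flag.strip(",") for flag in raw_flags.split()]
    return info


def count_reader_wait(lines):
    """Count calls whose flags include reader_wait."""
    total = 0
    for line in lines:
        call = parse_call(line)
        if call is not None and "reader_wait" in call["flags"]:
            total += 1
    return total


sample = "call 0: # 5231, state active, mode: receiving, flags: reader_wait"
print(parse_call(sample))
print(count_reader_wait([sample]))
```

Logging this count once per minute alongside web server load would show whether the spikes line up with the reader_wait backlog.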
Re: [OpenAFS] Tracking down AFS Fileserver corruption
Folks,

To follow up, I was able to solve or work around this particular issue. It turns out the emcpower devices do need some sort of abstraction layer to work properly. This could be Linux's LVM system or a partition table with one large partition. Those configurations do not exhibit the corruption issues I was having when attempting to use the raw block device.

Jack Neely
[OpenAFS] Tracking down AFS Fileserver corruption
Folks,

I'm deploying new OpenAFS 1.6.0 DAFS file servers on fully updated RHEL 6.1 servers and I've stumbled across a data corruption problem. The ext4 filesystems on the vice mounts are not getting corrupted, just the AFS volume data.

Our /vicep[ab] mounts are provided by an EMC Clariion SAN array using the PowerPath driver. Each of the two vice mounts has 4 paths and is not partitioned. I've directly formatted the /dev/emcpower[ab] block devices as ext4. Of course, the /dev/emcpowerX device is mounted on /vicepX.

Every hour our OCS Inventory agent runs, which eventually runs fdisk -l to get statistics for the storage on the server. When I was moving test volumes to the new server and the agent ran fdisk -l, the kernel would print:

Nov 28 13:01:39 xxx kernel: sdc: unknown partition table
Nov 28 13:01:39 xxx kernel: sde: unknown partition table
Nov 28 13:01:49 xxx kernel: sdc: unknown partition table
Nov 28 13:01:49 xxx kernel: sde: unknown partition table

and the volume being moved at that exact time would be corrupt. Usually the server would soon detect this and salvage the volume, but the level of corruption has varied.

The above messages and corruption only seem to happen when volume moves are in progress. Running fdisk -l on an idle server produces no messages.

Other things cause the above messages to be re-printed, such as running fsck -yf /dev/emcpowera. They occur during the early hours of the morning as well, from something that appears to be related to a cron job I've not tracked down yet.

I need some help in figuring out what is causing the corruption and, more importantly, how to fix things.

Thanks,
Jack Neely
Re: [OpenAFS] Tracking down AFS Fileserver corruption
On Mon, Nov 28, 2011 at 08:34:00PM +0100, Stephan Wiesand wrote:
> Hi Jack,
>
> no help, just a few dumb questions inline:
>
> On Nov 28, 2011, at 19:13 , Jack Neely wrote:
> > Folks,
> >
> > I'm deploying new OpenAFS 1.6.0 DAFS file servers on fully update RHEL 6.1 servers and I've stumbled across a data corruption problem. My ext4 filesystem on the vice mounts are not getting corrupted, just the AFS volume data.
> >
> > Our /vicep[ab] mounts are provided by an EMC Clariion SAN array using the PowerPath driver. Each of the two vice mounts have 4 paths and are not partitioned. I've directly formatted the /dev/emcpower[ab] block device as ext4. Of course, the /dev/emcpowerX device is mounted on /vicepX.
>
> emcpower{a,b} map to sdc{c,e} ?

emcpowera is made of the paths: sdc sde sdg sdi
emcpowerb is made of the paths: sdb sdd sdf sdh

Here's the information from the powermt tool:

http://pastebin.com/sfmJX5Kc

> > Every hour our OCS Inventory agent runs which eventually runs fdisk -l to get statistics for the storage on the server. When I was moving test volumes to the new server and the agent ran fdisk -l the kernel would print:
> >
> > Nov 28 13:01:39 xxx kernel: sdc: unknown partition table
> > Nov 28 13:01:39 xxx kernel: sde: unknown partition table
> > Nov 28 13:01:49 xxx kernel: sdc: unknown partition table
> > Nov 28 13:01:49 xxx kernel: sde: unknown partition table
>
> If the devices aren't partitioned, why would it ever find a partition table?

It shouldn't. But why does it keep looking (and cause corruption)? Before I figured out that the corruption was happening at the same time as these messages I didn't think that there was any connection.

> This may have changed, but Red Hat used to not support setups with filesystems on unpartitioned block devices, I believe.

I have a support case open with Red Hat as well and they have not indicated this. In fact, not partitioning SAN devices (especially large ones) seems to be accepted practice nowadays.

> > and the volume being moved at that exact time would be corrupt. Usually the server would soon detect this and salvage the volume, but the level of corruptions has varied.
>
> I don't have experience with running 1.6 servers in production yet, but since the AFS fileserver is entirely running in userland, it should not cause this kind of corruption. That being said, there's an open BZ regarding ext4 corruption due to Ceph userland processes...

The ext4 file system is not corrupted...so I think the afs daemons are somehow being disturbed and not writing complete data.

> > The above messages and corruption only seem to happen when volume moves are in progress. Running fdisk -l on an idle server produces no messages.
>
> Any messages if you run bonnie++ or iozone on the filesystem when the agent runs?

Haven't tried yet. Good idea though.

> > Other things cause the above messages to be re-printed, such as running fsck -yf /dev/emcpowera.
>
> Is this safe to do on a mounted ext4 filesystem?

I ran fsck on the unmounted SAN LUN to make sure I didn't have file system corruption. I was surprised that it seemed to trigger partition rescans again.

Jack

> > They occur during the early hours of the morning as well from something that appears to be related to a cron job I've not tracked down yet.
> >
> > I need some help in figuring out what is causing the corruption and, more importantly, how to fix things.
>
> If the AFS fileserver could be run under a different account than root, one could be completely confident it's not the culprit. As things are, I'm only 99% confident...
>
> Best regards,
> Stephan
>
> > Thanks,
> > Jack Neely
>
> --
> Stephan Wiesand
> DESY -DV-
> Platanenenallee 6
> 15738 Zeuthen, Germany
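The path lists given in the reply above make it possible to tie each "unknown partition table" kernel message back to a PowerPath pseudo-device. A small sketch of that lookup follows. The mapping is copied from the message, while the regular expression and function name are illustrative assumptions.

```python
import re

# Path membership as reported in the reply above.
PATHS = {
    "emcpowera": ["sdc", "sde", "sdg", "sdi"],
    "emcpowerb": ["sdb", "sdd", "sdf", "sdh"],
}
PATH_TO_PSEUDO = {path: pseudo for pseudo, paths in PATHS.items() for path in paths}

# Matches syslog lines shaped like the ones quoted in the thread (an assumption).
KERNEL_MSG = re.compile(r"kernel:\s+(?P<dev>sd[a-z]+): unknown partition table")


def pseudo_devices_rescanned(log_lines):
    """Return the sorted pseudo-devices whose underlying paths were rescanned."""
    hits = set()
    for line in log_lines:
        match = KERNEL_MSG.search(line)
        if match:
            hits.add(PATH_TO_PSEUDO.get(match.group("dev"), "unknown"))
    return sorted(hits)


log = [
    "Nov 28 13:01:39 xxx kernel: sdc: unknown partition table",
    "Nov 28 13:01:39 xxx kernel: sde: unknown partition table",
    "Nov 28 13:01:49 xxx kernel: sdc: unknown partition table",
    "Nov 28 13:01:49 xxx kernel: sde: unknown partition table",
]
print(pseudo_devices_rescanned(log))
```

With the quoted messages, both reported paths (sdc and sde) belong to emcpowera, so only that pseudo-device shows up in the result.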
Re: [OpenAFS] maximum volume size in OpenAFS 1.6
On Mon, Nov 28, 2011 at 03:20:53PM -0500, Rich Sudlow wrote:
> What's the maximum volume size with OpenAFS 1.6?
>
> I don't believe that quota's are enforced when going over 2 TB volumes - is that correct?
>
> Does any site run this routinely?
>
> Thanks,
> Rich
>
> --
> Rich Sudlow
> University of Notre Dame
> Center for Research Computing - Union Station
> 310 West South St
> South Bend, In 46601
> (574) 631-7258 (office)
> (574) 807-1046 (cell)

Volume sizes can grow to a max of 16EiB on 64 bit systems. Partition sizes also have the same limit.

The quota size is still a signed 32 bit int IIRC. So the max quota you can set is 2TiB. However, setting the quota to 0 or unlimited will allow you to use any available space on the partition that volume is on.

I've seen 10TiB AFS partitions. Things worked normally, although the output from fs lq does show some integers that have overflowed.

Jack Neely
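The 2TiB and 16EiB figures above follow from simple arithmetic. The worked sketch below assumes that quotas are counted in 1 KiB blocks and that the 16EiB limit corresponds to a 64-bit byte count. Both are assumptions used for illustration; the message itself states only the limits.

```python
KIB_PER_TIB = 2**30    # 1 TiB = 2**40 bytes = 2**30 KiB
BYTES_PER_EIB = 2**60

# Largest value of a signed 32-bit integer, taken as a quota in KiB (assumed unit).
MAX_QUOTA_KIB = 2**31 - 1
max_quota_tib = MAX_QUOTA_KIB / KIB_PER_TIB

# Largest byte count addressable with 64 bits, expressed in EiB.
max_volume_eib = 2**64 / BYTES_PER_EIB

print(MAX_QUOTA_KIB, "KiB is about", round(max_quota_tib, 3), "TiB")
print("64-bit byte count:", max_volume_eib, "EiB")
```

The same arithmetic hints at why fs lq shows overflowed integers on a 10TiB partition: 10 TiB is 10 * 2**30 KiB, which does not fit in a signed 32-bit field.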
[OpenAFS] Fedora 15 bugs
Folks,

I'm building OpenAFS packages on Fedora 15 and having some trouble getting 1.6.0-pre6 working well. Consistently, afsd is segfaulting to some degree. Dmesg reveals:

afsd[1319]: segfault at 0 ip 4e4476de sp bf912864 error 4 in libc-2.14.so[4e3df000+185000]

Although the AFS service seems to start. I've gathered all the debug information, core dumps, and the packages I'm using here:

http://callandor.unity.ncsu.edu/~slack/openafs/20110608/

The coredump-abrt-1301.tar.bz2 contains the coredump and other information recorded by ABRT.

Jack
Re: [OpenAFS] Fedora 15 bugs
I do. I'd very much like the package to work equally well with and without SELinux. Indeed, things do behave much better without SELinux enabled. I do not see any AVC denials, so this doesn't appear to be an SELinux policy related matter.

Jack

On Wed, Jun 08, 2011 at 01:34:40PM -0400, omall...@msu.edu wrote:
> I assume no, but I will ask anyway. Do you have selinux enabled?
>
> Quoting Jack Neely jjne...@pams.ncsu.edu:
>
> > Folks,
> >
> > I'm building OpenAFS packages on Fedora 15 and having some trouble getting 1.6.0-pre6 working well. Consistently, afsd is segfaulting to some degree. Dmesg reveals:
> >
> > afsd[1319]: segfault at 0 ip 4e4476de sp bf912864 error 4 in libc-2.14.so[4e3df000+185000]
> >
> > Although the AFS service seems to start. I've gathered all the debug information, core dumps, and the packages I'm using here:
> >
> > http://callandor.unity.ncsu.edu/~slack/openafs/20110608/
> >
> > The coredump-abrt-1301.tar.bz2 contains the coredump and other information recorded by ABRT.
> >
> > Jack
[OpenAFS] Packaging OpenAFS 1.6 for Fedora 15
Folks, I'm updating the Fedora 15 packages of OpenAFS in rpmFusion. I'm using 1.6.0pre4 and I've applied the 2 patches from git relating to changes in the 2.6.39 kernel. Specifically: a8aa6f4221309f44f49cdd00acce88122f1753f6 1e322b883e036fe0bd5468fe60a0431545fe2376 At this point the kernel module compiles but complains about GPL only symbols: FATAL: modpost: GPL-incompatible module libafs.ko uses GPL-only symbol '__init_work' I don't see anything in git that addresses this. Any ideas? Jack
Re: [OpenAFS] openafs packages for Fedora 13?
On Thu, Jun 10, 2010 at 11:29:07PM +0100, Simon Wilkinson wrote: On 10 Jun 2010, at 22:52, Derrick Brashear wrote: Does anyone know when the yum repository for Fedora 13 would come out? It looks like RPM fusion has openafs rpms available already. But I think I read somewhere before that they are not directly from openafs.org. Indeed, the RPM Fusion RPMs are not the same as the openafs.org ones. I've ranted on this subject to this list too many times before [1], but the summary is that the RPMFusion RPMS install things in such a way that you are likely to find much of the OpenAFS documentation misleading, and you are unlikely to be able to get community support as readily as when you use an OpenAFS build. Cheers, Simon. [1] - https://lists.openafs.org/pipermail/openafs-info/2010-March/033162.html I'll be glad to help out folks with the RPMFusion RPMs as I maintain them. You're welcome to file a bug there: http://bugzilla.rpmfusion.org/ Or email me or the list. My motivations for doing so are twofold. First, I needed OpenAFS packages that conformed to the Filesystem Hierarchy Standard (FHS). Second, the current OpenAFS packages use an older kmod standard for supporting kernel modules in Fedora/RHEL, and I needed the more recent kmod standard to integrate with the other kernel modules I deal with. Jack
Re: [OpenAFS] Re: openafs packages for Fedora 13?
On Tue, Jun 15, 2010 at 12:35:34AM +0200, Angel Marin wrote: A couple days ago I tried them, the worst offenders for us were: https://bugzilla.rpmfusion.org/show_bug.cgi?id=1274 https://bugzilla.rpmfusion.org/show_bug.cgi?id=1275 Well, that and the fact that there are no man pages on the packages ... WTF? -- Angel Marin http://anmar.eu.org/ Thanks a bunch for the bug reports. Stick one in there about the man pages too. I'm about to leave for vacation outside my country, so if I don't have these fixed in the next few days it will be a couple weeks. Jack
Re: [OpenAFS] kernel panics in afs_GetDCache
On Mon, Feb 15, 2010 at 11:07:00PM +, Simon Wilkinson wrote: I suspect we should probably move this into RT, but I thought recording the steps taken so far might be of use to others. Thanks Simon! Do I need to open the RT ticket on this? Jack
[OpenAFS] kernel panics in afs_GetDCache
Folks, I'm seeing regular kernel panics on our apache web servers. The servers are RHEL 5.4 with kernel version 2.6.18-164.11.1.el5 or at least 2.6.18-164.*. They are running OpenAFS 1.4.11. It appears to always be afs_GetDCache: EIP: [f8bebfd9] afs_GetDCache+0x1c0a/0x2d6d [libafs] SS:ESP 0068:f78f2e68 http://www4.ncsu.edu/~jjneely/getdcache.jpg I'm using the options: -stat 2000 -dcache 800 -daemons 3 -volumes 70 -nosettime -memcache -afsdb and I'm thinking that the high volume apache server is busting its cache. (Most of those arguments have been used...historically at this point.) Am I going in the right direction with this, or is this panic being caused by something else? Jack
Re: [OpenAFS] kernel panics in afs_GetDCache
On Mon, Feb 15, 2010 at 08:23:11PM +, Simon Wilkinson wrote: On 15 Feb 2010, at 20:11, Jack Neely wrote: Am I going in the right direction with this, or is this panic being caused by something else? Are you using the kmod's from openafs.org (if you are, I can turn that stack trace into something understandable, if you aren't it's going to be harder) S. No, I'm using my own kmods. But sounds like you are looking for the debuginfo package? http://install.linux.ncsu.edu/pub/yum/CLS/RealmLinux/5/base/i386/debuginfo/openafs-kmod-debuginfo-1.4.11-1.2.6.18_164.11.1.el5.i686.rpm Jack
Re: [OpenAFS] kernel panics in afs_GetDCache
On Mon, Feb 15, 2010 at 10:09:59PM +, Simon Wilkinson wrote: On 15 Feb 2010, at 20:11, Jack Neely wrote: I'm seeing regular kernel panics on our apache web servers. The servers are RHEL 5.4 with kernel version 2.6.18-164.11.1.el5 or at least 2.6.18-164.*. They are running OpenAFS 1.4.11. It appears to always be afs_GetDCache: EIP: [f8bebfd9] afs_GetDCache+0x1c0a/0x2d6d [libafs] SS:ESP 0068:f78f2e68 http://www4.ncsu.edu/~jjneely/getdcache.jpg The stacktrace in that image, and the version details you report, don't match. Can you confirm that the crash you're seeing on 2.6.18-164.11.1.el5 is at afs_GetDCache+0x1c0a? S. Gah... My reading skills need help, apparently. That was 2.6.18-128.7.1.el5 with openafs 1.4.11. http://install.linux.ncsu.edu/pub/yum/CLS/RealmLinux/5/base/i386/debuginfo/openafs-kmod-debuginfo-1.4.11-1.2.6.18_128.7.1.el5.i686.rpm Jack
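The symbol+offset/size notation in the oops line can be pulled apart mechanically, which makes it easier to check that a trace and a reported build agree. The following is a minimal sketch, not something from the thread; parse_oops_location is a hypothetical helper name.

```python
import re

# Matches e.g. "afs_GetDCache+0x1c0a/0x2d6d [libafs]".
LOCATION_RE = re.compile(
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_oops_location(text):
    """Hypothetical helper: return (symbol, offset, size, module)."""
    m = LOCATION_RE.search(text)
    if m is None:
        raise ValueError("no symbol+offset/size location found")
    return (
        m["symbol"],
        int(m["offset"], 16),   # bytes into the function
        int(m["size"], 16),     # total size of the function
        m["module"],            # None when no [module] tag follows
    )


eip = "EIP: [f8bebfd9] afs_GetDCache+0x1c0a/0x2d6d [libafs] SS:ESP 0068:f78f2e68"
symbol, offset, size, module = parse_oops_location(eip)
print(symbol, hex(offset), hex(size), module)
```

The offset only means something against the exact module binary that crashed, which is why the kernel version and the matching debuginfo package matter in this exchange.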
Re: [OpenAFS] Building 1.4.11 on fedora 12 ppc64
On Wed, Jan 06, 2010 at 07:40:18PM +, Simon Wilkinson wrote: So, I think the problem here is that your RPM is telling OpenAFS to build for the 'ppc_linux26' architecture, whereas you should actually be building for the ppc64_linux26 architecture. From the output, I don't think you're using the OpenAFS RPM specfile, so you're probably best asking whoever produced your RPMs how they're generating the sysname that they pass to configure. Thanks Simon. The package maintainer is me and that was exactly what I was trying to figure out. Thanks, Jack
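As an illustration of the mismatch Simon describes, a packaging script might choose the sysname from the machine type along these lines. This is only a sketch under stated assumptions: the table holds just the sysnames mentioned in these threads, the i686 entry is an assumption, and SYSNAME_BY_MACHINE and sysname_for are hypothetical names, not anything from the OpenAFS spec file or configure script.

```python
# Hypothetical lookup table; only sysnames named in these threads appear.
SYSNAME_BY_MACHINE = {
    "i386": "i386_linux26",
    "i686": "i386_linux26",   # assumption: 32-bit x86 shares one sysname
    "ppc": "ppc_linux26",
    "ppc64": "ppc64_linux26",
}


def sysname_for(machine):
    """Return the sysname to pass to configure for a machine type."""
    try:
        return SYSNAME_BY_MACHINE[machine]
    except KeyError:
        # Failing loudly beats silently falling back to a 32-bit sysname.
        raise ValueError(f"no sysname known for machine type {machine!r}") from None


print(sysname_for("ppc64"))
```

Raising on an unknown machine type, rather than defaulting, is the design choice that would have surfaced the ppc versus ppc64 confusion at packaging time.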
[OpenAFS] Building 1.4.11 on fedora 12 ppc64
Folks, I'm trying to build some packages on Fedora 12. The package works well for the Intel arches that I'm familiar with. But it fails on ppc64, which I'm not very familiar with. The full logs are here: http://www4.ncsu.edu/~jjneely/build.log Things look to go off the wall around here, right when the kernel build starts: /builddir/build/BUILD/openafs-kmod-1.4.11/_kmod_build_2.6.30.10-105.fc11.ppc64/src/config/Makefile.config:137: warning: ignoring old commands for target `.c.o' make[5]: Entering directory `/usr/src/kernels/2.6.30.10-105.fc11.ppc64' CC [M] /builddir/build/BUILD/openafs-kmod-1.4.11/_kmod_build_2.6.30.10-105.fc11.ppc64/src/libafs/MODLOAD-2.6.30.10-105.fc11.ppc64-MP/afs_atomlist.o CC [M] /builddir/build/BUILD/openafs-kmod-1.4.11/_kmod_build_2.6.30.10-105.fc11.ppc64/src/libafs/MODLOAD-2.6.30.10-105.fc11.ppc64-MP/afs_lhash.o CC [M] /builddir/build/BUILD/openafs-kmod-1.4.11/_kmod_build_2.6.30.10-105.fc11.ppc64/src/libafs/MODLOAD-2.6.30.10-105.fc11.ppc64-MP/afs_analyze.o In file included from /builddir/build/BUILD/openafs-kmod-1.4.11/_kmod_build_2.6.30.10-105.fc11.ppc64/src/afs/afsincludes.h:54, from /builddir/build/BUILD/openafs-kmod-1.4.11/_kmod_build_2.6.30.10-105.fc11.ppc64/src/libafs/MODLOAD-2.6.30.10-105.fc11.ppc64-MP/afs_analyze.c:37: /builddir/build/BUILD/openafs-kmod-1.4.11/_kmod_build_2.6.30.10-105.fc11.ppc64/src/afs/afs_prototypes.h:950: warning: 'struct flock64' declared inside parameter list /builddir/build/BUILD/openafs-kmod-1.4.11/_kmod_build_2.6.30.10-105.fc11.ppc64/src/afs/afs_prototypes.h:950: warning: its scope is only this definition or declaration, which is probably not what you want I'd appreciate some help in figuring out what's not right. Thanks, Jack Neely
Re: [OpenAFS] dynroot question
On Wed, Aug 05, 2009 at 07:08:30PM +0100, Simon Wilkinson wrote: On 5 Aug 2009, at 19:03, Russ Allbery wrote: Apache recursively ascends the file hierarchy looking for .htaccess files even if that directory itself is not being served, so it will attempt to read /afs/.htaccess if you are serving any directory anywhere under /afs. I haven't looked at the code, so we may be already doing this, but it seems to me that we could just bounce requests for /afs/.htaccess immediately. In fact, there's probably a range of things that it makes no sense to do DNS lookups for. S. I agree here. Turns out I am sending out DNS queries from each of the web servers for the htaccess cell 20 or so times a second. Jack
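To make Russ's point concrete, here is a minimal sketch of the set of .htaccess paths that get probed for one served directory. It is an illustration only: htaccess_candidates is a hypothetical helper, example.edu is a placeholder cell name, and the exact lookup order Apache uses is not taken from the thread.

```python
from pathlib import PurePosixPath


def htaccess_candidates(served_dir):
    """List the .htaccess paths at and above served_dir, root first."""
    path = PurePosixPath(served_dir)
    ancestors = [path, *path.parents]          # deepest directory first
    return [str(p / ".htaccess") for p in reversed(ancestors)]


candidates = htaccess_candidates("/afs/example.edu/www/site")
for candidate in candidates:
    print(candidate)
```

The second entry in that list is /afs/.htaccess. With dynroot and -afsdb, a name directly under /afs is treated as a possible cell name, which matches the DNS queries for the "htaccess cell" described in this message.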
Re: [OpenAFS] dynroot question
On Wed, Aug 05, 2009 at 10:02:21AM -0400, Derrick Brashear wrote: On Wed, Aug 5, 2009 at 10:00 AM, Jack Neely jjne...@pams.ncsu.edu wrote: Folks, I'm having an issue with the dynroot functionality on my web servers. I've straced the httpd process and discovered that it is attempting to stat() /afs/.htaccess which, of course, doesn't exist. The problem being that AFS takes 10 to 20 seconds or more to return the stat call. The problem comes and goes. I'm not exactly sure what is triggering it and would like some help figuring that out. We're running 1.4.10 client side and the servers with the web volumes are 1.4.7 and 1.4.10. We are in the process of moving everything to 1.4.11 but wanted to try to track down this issue. tcpdump. i assume you see no afs traffic, but you do see dns traffic. yes? what's the round trip time on that? fstrace. what do you see? We did find the cause for this specific problem. Our DNS servers' firewall was dropping packets when the ip_conntrack table became full. Thanks for the help! Jack
[OpenAFS] dynroot question
Folks, I'm having an issue with the dynroot functionality on my web servers. I've straced the httpd process and discovered that it is attempting to stat() /afs/.htaccess which, of course, doesn't exist. The problem being that AFS takes 10 to 20 seconds or more to return the stat call. The problem comes and goes. I'm not exactly sure what is triggering it and would like some help figuring that out. We're running 1.4.10 client side and the servers with the web volumes are 1.4.7 and 1.4.10. We are in the process of moving everything to 1.4.11 but wanted to try to track down this issue. Turning dynroot off, of course, fixes the issue. Jack
[OpenAFS] com_err issues building afs-krb5
Folks, I'm attempting to build the AFS krb5 migration kit. I haven't done so in a while and ran into some problems with our friend com_err. I'm using the patches offered on the OpenAFS website and an additional patch to handle the /usr/lib vs. /usr/lib64 issues. The patches, spec and such can be found here: https://svn.linux.ncsu.edu/svn/clspackages/rpms/afs-krb5/EL5/ The afs-krb5 stuff finds the system's libcom_err.so but AFS builds and links against its own and always complains about missing symbols of afs_com_err and friends. Is there a better way to correct this than just adding /usr/lib64/afs/libcom_err.a to the LIBS variable in the proper place in configure.in? Thanks, Jack Neely gcc -o aklog aklog.o aklog_main.o aklog_param.o krb_util.o linked_list.o adderrtable.o -lkrb5 -lk5crypto -lcom_err -L/usr/lib64 -L/usr/lib -L/usr/lib64/afs -L/usr/lib/afs -lsys -lprot -lubik -lauth -lrxkad -lrx -llwp -ldes -lsys /usr/lib64/afs/util.a -lresolv aklog_param.o: In function `aklog_init_params': /home/slack/RPM/BUILD/afs-krb5/src/aklog_param.c:213: warning: the `getwd' function is dangerous and should not be used.
/usr/lib64/afs/libprot.a(ptuser.o): In function `pr_Initialize': (.text+0xea6): undefined reference to `afs_com_err' /usr/lib64/afs/libprot.a(ptuser.o): In function `pr_Initialize': (.text+0x101c): undefined reference to `afs_com_err' /usr/lib64/afs/libprot.a(ptuser.o): In function `pr_Initialize': (.text+0x11f7): undefined reference to `afs_error_message' /usr/lib64/afs/libprot.a(pterror.o): In function `initialize_PT_error_table': (.text+0x8): undefined reference to `afs_add_to_error_table' /usr/lib64/libubik.a(uerrors.o): In function `initialize_U_error_table': (.text+0x8): undefined reference to `afs_add_to_error_table' /usr/lib64/afs/libauth.a(acfg_errors.o): In function `initialize_ACFG_error_table': (.text+0x8): undefined reference to `afs_add_to_error_table' /usr/lib64/afs/libauth.a(ktc_errors.o): In function `initialize_KTC_error_table': (.text+0x8): undefined reference to `afs_add_to_error_table' /usr/lib64/librxkad.a(rxkad_errs.o): In function `initialize_RXK_error_table': (.text+0x8): undefined reference to `afs_add_to_error_table' collect2: ld returned 1 exit status make: *** [aklog] Error 1 error: Bad exit status from /var/tmp/rpm-tmp.8046 (%build)
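When linker output is this repetitive, it can help to reduce it to the distinct missing symbols. The sketch below is an illustration, not part of the original post; missing_symbols is a hypothetical helper, and the sample text is a shortened excerpt of the errors quoted in this message.

```python
import re

# GNU ld reports missing symbols as: undefined reference to `name'
UNDEFINED_RE = re.compile(r"undefined reference to `([^']+)'")


def missing_symbols(linker_output):
    """Return the distinct undefined symbols in first-seen order."""
    seen = {}
    for name in UNDEFINED_RE.findall(linker_output):
        seen.setdefault(name, None)   # dict keeps insertion order
    return list(seen)


sample = """\
(.text+0xea6): undefined reference to `afs_com_err'
(.text+0x101c): undefined reference to `afs_com_err'
(.text+0x11f7): undefined reference to `afs_error_message'
(.text+0x8): undefined reference to `afs_add_to_error_table'
(.text+0x8): undefined reference to `afs_add_to_error_table'
"""
symbols = missing_symbols(sample)
print(symbols)
```

Every missing name carries the afs_ prefix, which fits the description in the post: the -lcom_err on the command line resolves to the system library, while the AFS static libraries expect AFS's own com_err.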
Re: [OpenAFS] 1.4.7pre3 client success on EL45 - and a question
On Wed, Apr 09, 2008 at 05:45:59PM +0200, Stephan Wiesand wrote: 1.4.7pre3 builds, and the client works, for us on SL4 and SL5, i386 and x86_64. And here's the question: We're trying to do something that doesn't work in AFS space under certain circumstances. We don't know yet what makes it fail or work, but it consistently either fails or works on any client, and all clients have a very similar setup. All clients are SL4, amd64, latest kernel (2.6.9-67.0.7.ELsmp). The failing procedure is a bit convoluted, and I don't know in every detail what it's doing. But the part that fails on some clients is that RPMs get installed, with the RPMDB in AFS, and if it fails we get three messages afs: failed to store file (13) and a wedged RPMDB. And, with 1.4.7pre3 but not 1.4.6, we see two more messages: WARNING: afs_ufswr vcp=10396e494c0, exOrW=0 WARNING: afs_ufswr vcp=10396e49140, exOrW=0 Any hint what these are about would probably be very helpful. Thanks, Stephan PS On SL3, inserting the module from 1.4.7pre3 fails with the message that hlist_unhashed is GPLONLY. I'll file a bug in RT. -- Stephan Wiesand DESY - DV - Platanenallee 6 15738 Zeuthen, Germany Well, the RPM db is a Sleepycat (Berkeley DB) database, which is not known to work well in AFS space. Jack Neely
[OpenAFS] fakestat-all weirdness
Folks, With my RHEL 5 deployment I've begun using the -fakestat-all flag, which I'd hoped would make things work better and faster while producing less load on our AFS servers. However, we have discovered that if AFS client B alters files used by AFS client A, and client A runs with the fakestat-all flag, client A exhibits some weird behavior. When we removed the fakestat-all option and rebooted, the normal behavior resumed. We noticed this when folks would update their web content and the web server would either not see the updated files or the updated files would produce read errors. We have also discovered that our Solaris 10 machines with OpenAFS 1.4.6 using the fakestat-all option do not exhibit the broken behavior. To my understanding the fakestat option only affects mount points. Anyone know what's going on here? I've included the tests that we've been using to reproduce. Jack Neely Login to my home directory: /afs/xxx/brabec users.brabec 538563526 RW 759295 K On-line uni10f.unity.ncsu.edu /vicepa RWrite 538563526 ROnly 0 Backup 538563561 MaxQuota150 K CreationWed Jan 8 15:33:44 2003 Last Update Thu Mar 13 17:18:56 2008 15582 accesses in the past day (i.e., vnode references) RWrite: 538563526 Backup: 538563561 number of sites - 1 server uni10f.unity.ncsu.edu partition /vicepa RW Site SunOS uni10f 5.8 Generic_117350-04 sun4u sparc SUNW,Sun-Fire-280R @(#) OpenAFS 1.2.13 built 2004-11-03 on these hosts: - mosa = my desktop rh9 machine with working AFS openafs-1.2.10-3.9.1 openafs-client-1.2.10-3.9.1 openafs-devel-1.2.10-3.9.1 openafs-kernel-1.2.10-3.9.1 openafs-kernel-source-1.2.10-3.9.1 - web03rmw = a rhel5 server exhibiting the problem openafs-1.4.6-2.EL5 openafs-client-1.4.6-2.EL5 - also tried uni42ws, which is a rhel3 box without this problem openafs-1.2.11-20.EL
openafs-client-1.2.11-20.EL openafs-kernel-1.2.11-20.EL # creating a file works mosa% vi afstest1 # add some text content mosa% ls -al afs* -rw-r--r--1 brabec ncsu 31 Mar 13 16:48 afstest1 web03rmw% ls -al afs* -rw-r--r-- 1 brabec ncsu 31 Mar 13 16:48 afstest1 # editing a file on mosa does not mosa% vi afstest1 # add more content mosa% ls -al afs* -rw-r--r--1 brabec ncsu 60 Mar 13 16:50 afstest1 web03rmw% ls -al afs* -rw-r--r-- 1 brabec ncsu 31 Mar 13 16:48 afstest1 # note the old timestamp and unchanged file size # attempted to edit same file on web03rmw... this time I got # a read error, and an empty file in vim. ls now shows the correct # file size and subsequent reads work correctly. # on other tries, I have gotten the cached copy of the file, and I could # make changes and overwrite the file in AFS, quietly losing the # changes made on mosa. # creating a file on rhel5 works web03rmw% vi afstest2 # add some content web03rmw% ls -al afs* -rw-r--r-- 1 brabec ncsu 60 Mar 13 16:50 afstest1 -rw-r--r-- 1 brabec ncsu 24 Mar 13 16:52 afstest2 mosa% ls -al afs* -rw-r--r--1 brabec ncsu 60 Mar 13 16:50 afstest1 -rw-r--r--1 brabec ncsu 24 Mar 13 16:52 afstest2 # editing the file on rhel5 works web03rmw% vi afstest2 # add some more content web03rmw% ls -al afs* -rw-r--r-- 1 brabec ncsu 60 Mar 13 16:50 afstest1 -rw-r--r-- 1 brabec ncsu 36 Mar 13 16:55 afstest2 mosa% ls -al afs* -rw-r--r--1 brabec ncsu 60 Mar 13 16:50 afstest1 -rw-r--r--1 brabec ncsu 36 Mar 13 16:55 afstest2 # edit the second file on mosa first, web03rmw second mosa% vi afstest2 # found 2 comments, added a third web03rmw% vi afstest2 # found only 2 comments, added a different third web03rmw% ls -al afs* -rw-r--r-- 1 brabec ncsu 60 Mar 13 16:50 afstest1 -rw-r--r-- 1 brabec ncsu 53 Mar 13 16:57 afstest2 mosa% ls -al afs* -rw-r--r--1 brabec ncsu 60 Mar 13 16:50 afstest1 -rw-r--r--1 brabec ncsu 53 Mar 13 16:57 afstest2 # cat on both machines shows the same content... that added second by the # rhel5/rmw host. 
the changes made on mosa were lost. # timing... # made edits to afstest2 and afstest3 on mosa, the ls looks like this -rw-r--r--1 brabec ncsu 60 Mar 13 16:50 afstest1 -rw-r--r--1 brabec ncsu 79 Mar 13 16:59 afstest2 -rw-r--r--1 brabec ncsu 31 Mar 13 17:01 afstest3 # and the second edit was made at precisely Thu Mar 13 17:01:27 EDT 2008 # on web03rmw... Thu Mar 13 17:07:46 EDT 2008 -rw-r--r-- 1 brabec ncsu 60 Mar 13 16:50 afstest1 -rw-r--r-- 1 brabec ncsu 53 Mar 13 16:57 afstest2 -rw-r--r-- 1 brabec ncsu 12 Mar 13 17:01 afstest3 % cat afstest3 cat: afstest3: No such file
Re: [OpenAFS] why linux sysnames are different
I've been using my own sysname scheme ever since we've been able to provide a sysname list. So an i386 machine normally has ia32_rhel4 and i386_linux26. For RHEL 5 it has ia32_rhel5 and i386_linux26. Jack Neely
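A sysname list of this kind works as an ordered fallback: the first entry that yields an existing path wins. The sketch below illustrates that idea in user space and is a simplification, not the cache manager's actual code; resolve_at_sys is a hypothetical helper, the paths under example.edu are placeholders, and in-order lookup of the list is stated here as an assumption.

```python
def resolve_at_sys(template, sysnames, exists):
    """Return the first expansion of @sys that exists, or None.

    template -- path containing the literal component "@sys"
    sysnames -- sysname list, most specific first
    exists   -- callable reporting whether a path is present
    """
    for sysname in sysnames:
        candidate = template.replace("@sys", sysname)
        if exists(candidate):
            return candidate
    return None


# Pretend only the generic Linux 2.6 tree has been populated.
present = {"/afs/example.edu/local/i386_linux26/bin"}
found = resolve_at_sys(
    "/afs/example.edu/local/@sys/bin",
    ["ia32_rhel4", "i386_linux26"],
    present.__contains__,
)
print(found)
```

Putting the distribution-specific name first and the stock i386_linux26 name second lets site-specific builds take precedence while still falling back to binaries built for the generic sysname.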
Re: [OpenAFS] get-time errors from working afs server
On Fri, Dec 07, 2007 at 03:07:42PM -0500, Derrick Brashear wrote: On Dec 7, 2007 2:56 PM, Jack Neely wrote: Folks, I have a specific client workstation that refuses to talk to an AFS server (or vice versa). Of course it's the server that contains the home directory of the user who normally uses this workstation. Other identical clients are able to work with this server and the server appears to be working normally. The workstations are RHEL 5 with OpenAFS 1.4.4 and kernel 2.6.18-53.1.4.el5. And the server? Solaris 8. OpenAFS 1.2.13
Re: [OpenAFS] gnome-vfs and AFS home directories
On Tue, Aug 21, 2007 at 05:24:57PM -0700, Russ Allbery wrote: Jack Neely writes: We are starting to deploy RHEL 5 with OpenAFS 1.4.4. All users have their home directories in AFS. The Trash can on the gnome desktop doesn't work. I found out that the Trash functionality in gnome-vfs was turned off for AFS filesystems. I patched this in Red Hat Bug #253090. Now, the Trash works for some users and not for others. I can sit and watch the opened Trash folder full of items. Suddenly all the files vanish...then reappear. If my research is right, gnome-vfs/Nautilus use the kernel inotify stuff to monitor files. It appears that AFS isn't properly supporting this. Can anyone clarify what's happening here? Does it help to uninstall fam and install gamin instead? RHEL 5 (and RHEL 4 as well) uses gamin by default. Although, to confirm that I did discover gamin's config file (or lack thereof in RHEL). Tomorrow I'll see if I can tweak its options and get better results. Thanks, Jack
[OpenAFS] gnome-vfs and AFS home directories
Folks, We are starting to deploy RHEL 5 with OpenAFS 1.4.4. All users have their home directories in AFS. The Trash can on the gnome desktop doesn't work. I found out that the Trash functionality in gnome-vfs was turned off for AFS filesystems. I patched this in Red Hat Bug #253090. Now, the Trash works for some users and not for others. I can sit and watch the opened Trash folder full of items. Suddenly all the files vanish...then reappear. If my research is right, gnome-vfs/Nautilus use the kernel inotify stuff to monitor files. It appears that AFS isn't properly supporting this. Can anyone clarify what's happening here? Jack Neely
[OpenAFS] Fun kernel panics on RHEL 4
Greetings, While poking around to figure out how to get the afsd daemon with the -verbose and -debug flags to log someplace useful (*grumbles at initlog*) I discovered a neat kernel panic. With AFS already running typing afsd -debug as root results in a kernel panic and a frozen machine. I have attached the output of the afsd -debug command and the resulting kernel panic. This is OpenAFS 1.4.2 on RHEL 4, kernel version 2.6.9-42.0.8.ELsmp. I know that's not the proper way to run the daemon in debug mode...but I shouldn't be able to trick fellow sysadmins to kernel panic their machine so easily. Although, it is amusing. :-) Jack Feb 14 18:32:40 tweety kernel: openafs: afs_InitCacheInfo --- called for non-ufs cache![ cut here ] Feb 14 18:32:40 tweety kernel: kernel BUG at /home/slack/RPM/BUILD/openafs-1.4.2/src/libafs/MODLOAD-2.6.9-42.0.8.ELsmp-MP/afs_init.c:337!
Feb 14 18:32:40 tweety kernel: invalid operand: [#1] Feb 14 18:32:40 tweety kernel: SMP Feb 14 18:32:40 tweety kernel: Modules linked in: i915 parport_pc lp parport autofs4 i2c_dev i2c_core libafs(U) sunrpc ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mirror dm_mod button battery ac md5 ipv6 uhci_hcd ehci_hcd hw_random snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore 3c59x mii floppy ext3 jbd ata_piix libata sd_mod scsi_mod Feb 14 18:32:40 tweety kernel: CPU:0 Feb 14 18:32:40 tweety kernel: EIP:0060:[e042d237]Tainted: PF VLI Feb 14 18:32:40 tweety kernel: EFLAGS: 00010216 (2.6.9-42.0.8.ELsmp) Feb 14 18:32:40 tweety kernel: EIP is at afs_InitCacheInfo+0x1f/0xd7 [libafs] Feb 14 18:32:40 tweety kernel: eax: 003b ebx: c1622980 ecx: d3755e24 edx: e0478290 Feb 14 18:32:40 tweety kernel: esi: edi: 0007 ebp: c1622980 esp: d3755e20 Feb 14 18:32:40 tweety kernel: ds: 007b es: 007b ss: 0068 Feb 14 18:32:40 tweety kernel: Process afsd (pid: 4872, threadinfo=d3755000 task=d39c39b0) Feb 14 18:32:40 tweety kernel: Stack: e0478290 d3755e2c c02d2e82 0246 e04905e0 d3755e44 Feb 14 18:32:40 tweety kernel:c02d2e82 e04905e0 e046b2a8 d46ad8b0 2d29ad35 c016728c d5342580 Feb 14 18:32:40 tweety kernel:d3755ebc d3755ef8 0001 0101 de9b401a dee9d180 d400c804 Feb 14 18:32:40 tweety kernel: Call Trace: Feb 14 18:32:40 tweety kernel: [c02d2e82] __cond_resched+0x14/0x39 Feb 14 18:32:40 tweety kernel: [c02d2e82] __cond_resched+0x14/0x39 Feb 14 18:32:40 tweety kernel: [e046b2a8] afs_syscall_call+0xe6e/0x1b2a [libafs] Feb 14 18:32:40 tweety kernel: [c016728c] __link_path_walk+0xafd/0xbb5 Feb 14 18:32:40 tweety kernel: [c01673d8] link_path_walk+0x94/0xbe Feb 14 18:32:40 tweety kernel: [c01c32da] memmove+0xe/0x24 Feb 14 18:32:40 tweety kernel: [e046c244] afs_syscall+0x1c1/0x4c4 [libafs] Feb 14 18:32:40 tweety kernel: [c01663a7] permission+0x4a/0x4f Feb 14 18:32:40 tweety kernel: [e0465026] 
afs_ioctl+0x82/0x8c [libafs] Feb 14 18:32:40 tweety kernel: [c016ab45] file_ioctl+0x19d/0x1af Feb 14 18:32:40 tweety kernel: [c016ad5f] sys_ioctl+0x208/0x269 Feb 14 18:32:40 tweety kernel: [c02d48d7] syscall_call+0x7/0xb Feb 14 18:32:40 tweety kernel: Code: 38 49 e0 89 d8 ff 52 10 31 c0 5b c3 53 83 ec 20 89 c3 ff 05 7c 8c 49 e0 83 3d b0 38 49 e0 00 74 13 68 90 82 47 e0 e8 67 56 cf df 0f 0b 51 01 cc 82 47 e0 59 89 d8 e8 65 75 03 00 85 c0 0f 85 9b Feb 14 18:32:40 tweety kernel: 0Fatal exception: panic in 5 seconds afsdebug.out.bz2 Description: BZip2 compressed data
Re: [OpenAFS] Re: 1.4.2 client on RHEL5 beta 2
On Wed, Jan 31, 2007 at 11:43:51AM -0500, Jeffrey Hutzelman wrote: On Tuesday, January 23, 2007 02:14:55 PM -0500 Derrick J Brashear wrote: On Tue, 23 Jan 2007, Rainer Laatsch wrote: I circumvented the MODPOST issue by patching /usr/src/kernels/2.6.18-1.2747.el5-i686/scripts/mod/modpost.c around line 1103 ; replacing 'fatal' by 'warn' We can't reasonably do that. The problem is the loose binding isn't loose enough for this check. No, but with the new AC_TRY_KBUILD test, we should be able to reliably determine at build time whether tasklist_lock is exported -- or at least, whether a weak reference will cause the build to fail. Jeff, is there a patch I can use to try the AC_TRY_KBUILD test with the 1.4.3 RCs? A pointer into CVS? I really need to work on some RHEL5-ish stuff for work. :-) Thanks! Jack
Re: [OpenAFS] Re: 1.4.2 client on RHEL5 beta 2
On Mon, Feb 12, 2007 at 01:46:04PM -0500, Derrick J Brashear wrote:
> It's being tracked at 53441 in RT, incidentally.

Thanks! That helps.

Jack

> On Mon, 12 Feb 2007, Derrick J Brashear wrote:
> > On Mon, 12 Feb 2007, Jack Neely wrote:
> > > > No, but with the new AC_TRY_KBUILD test, we should be able to reliably
> > > > determine at build time whether tasklist_lock is exported -- or at least,
> > > > whether a weak reference will cause the build to fail.
> > >
> > > Jeff, is there a patch I can use to try the AC_TRY_KBUILD test with the
> > > 1.4.3 RCs? A pointer into CVS? I really need to work on some RHEL5-ish
> > > stuff for work. :-)
> >
> > I'm not Jeff, but, there is such a test, and it doesn't work correctly yet.

--
Jack Neely [EMAIL PROTECTED]
Campus Linux Services Project Lead
Information Technology Division, NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89
Re: [OpenAFS] Re: 1.4.2 client on RHEL5 beta 2
Folks,

I still see this bug with OpenAFS 1.4.3rc1 on RHEL5 Beta 2 (2.6.18-1.2747.el5).

Derek, was there a patch I can use to test out your suggestion?

  LD [M] /home/slack/RPM/BUILD/openafs-kmod-1.4.3/_kmod_build_/src/libafs/MODLOAD-2.6.18-1.2747.el5-MP/libafs.o
  Building modules, stage 2.
  MODPOST
FATAL: modpost: GPL-incompatible module libafs.ko uses GPL-only symbol 'tasklist_lock'

Jack Neely

--
Jack Neely [EMAIL PROTECTED]
Campus Linux Services Project Lead
Information Technology Division, NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89
Re: [OpenAFS] New problem regarding ROCKS client
On Mon, Nov 06, 2006 at 02:49:40PM -0500, Paul Mitchell wrote:
> Having solved the openAFS problems on solaris10 (x86) platform, I'm now
> turning to a ROCKS client. Here's the client:
>
> uname -a
> Linux compute-0-1.local 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 00:17:26 CDT 2006 i686 i686 i386 GNU/Linux
>
> And I've a set of RPM's which I downloaded:
>
> ls
> openafs-1.4.0-rhel4.1.i386.rpm
> openafs-client-1.4.0-rhel4.1.i386.rpm
> openafs-compat-1.4.0-rhel4.1.i386.rpm
> openafs-devel-1.4.0-rhel4.1.i386.rpm
> openafs-docs-1.4.0-rhel4.1.i386.rpm
> openafs-kernel-1.4.0-2.6.9_22.EL_1.i686.rpm
> openafs-kernel-smp-1.4.0-2.6.9_22.ELsmp_1.i686.rpm
> openafs-kpasswd-1.4.0-rhel4.1.i386.rpm
> openafs-krb5-1.4.0-rhel4.1.i386.rpm
>
> I'm sure this is a function of my newness to Linux, however, the
> following is totally perplexing:
>
> rpm -i openafs-1.4.0-rhel4.1.i386.rpm
> package openafs-1.4.0-rhel4.1 is already installed
> [EMAIL PROTECTED] RPMS]# rpm -q openafs-1.4.0-rhel4.1.i386.rpm
> package openafs-1.4.0-rhel4.1.i386.rpm is not installed

Paul,

You need to give rpm the package name to query, not the name of the .rpm file. You want to do something like

    rpm -q openafs

This command

    rpm -ql openafs

will list the files that were installed by that package and where they are.

Jack

> How can it be installed and not installed? I actually don't believe it
> is installed as I can find no evidence of the package's contents on the
> system.
>
> Any advice will be appreciated.
>
> Paul Mitchell
> ==
> Paul Mitchell email: [EMAIL PROTECTED] phone: (919) 962-9778 office: I have an office, room 14, Phillips Hall
> ==

--
Jack Neely [EMAIL PROTECTED]
Campus Linux Services Project Lead
Information Technology Division, NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89
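The name-versus-file-name distinction in the reply above can be made concrete with a small sketch. It assumes the conventional name-version-release.arch.rpm file naming, which the files listed in the thread follow; split_rpm_filename is a hypothetical helper written for illustration and is not part of rpm. The installed-package database is queried by the name part only, which is why passing the file name to rpm -q finds nothing (rpm -qp is the form that inspects a package file directly).

```python
def split_rpm_filename(filename):
    """Split 'name-version-release.arch.rpm' into (name, version, release, arch).

    Hypothetical helper: it relies only on the usual RPM file naming
    convention and does not read the package header.
    """
    if not filename.endswith(".rpm"):
        raise ValueError("not an RPM file name: %r" % filename)
    stem = filename[: -len(".rpm")]
    # The architecture is the last dot-separated field.
    nvr, arch = stem.rsplit(".", 1)
    # Version and release are the last two dash-separated fields;
    # everything before them is the package name (it may contain dashes).
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release, arch


if __name__ == "__main__":
    for rpm_file in (
        "openafs-1.4.0-rhel4.1.i386.rpm",
        "openafs-kernel-smp-1.4.0-2.6.9_22.ELsmp_1.i686.rpm",
    ):
        name = split_rpm_filename(rpm_file)[0]
        print("%s: query with 'rpm -q %s'" % (rpm_file, name))
```

The name returned here is what rpm -q and rpm -ql expect as their argument.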
Re: [OpenAFS] New FC4 Kernel == Can't Find System Call Table
On Mon, Mar 13, 2006 at 11:18:45AM -0500, Matthew Miller wrote:
> On Sun, Mar 05, 2006 at 09:10:04PM -0500, Derrick J Brashear wrote:
> > If this is the bit where the syscall table has been moved into the
> > .rodata section, the resolution is to lose.
> >
> > Maybe someday we will get to use kernel keyrings.
> >
> > We could actually do that today if we were willing to break backward
> > compatibility with every userland tool that did pags that was compiled
> > before today. For many sites this would simply be a different form of
> > sadness.
>
> How hard would it be to make this an option, so we can pick our sadness? :)

We've all complained and flamed about the sys_call_table hook sadness. I think it's definitely time to look at using the kernel keyrings. It's a much better solution and will lead toward much happiness in the end.

I use several third party tools with OpenAFS that this would break. I'm willing to work with maintainers and fix them to use the kernel keyrings.

I've actually been trying to find some time to see what would be involved in porting OpenAFS. Any pointers to what would be involved?

Jack Neely

--
Jack Neely [EMAIL PROTECTED]
Campus Linux Services Project Lead
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89
Re: [OpenAFS] Slow Logins on RHEL 4
Folks,

Thanks for the advice. It looks like there were some issues in the version of pam_krb5 I was using after all. I thought that I had eliminated that possibility but...alas...

Thanks,
Jack

--
Jack Neely [EMAIL PROTECTED]
Realm Linux Administration and Development
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89
[OpenAFS] Slow Logins on RHEL 4
Folks,

I'm seeing slow logins on RHEL 4. My current configuration is OpenAFS 1.3.84 and kernel version 2.6.9-11.ELsmp. I also see the same problem on earlier RHEL 4 kernels, non-smp machines, and earlier releases of OpenAFS.

If I shut down AFS (login scripts create users a temporary home directory) then logins are quite quick. With AFS they take a good 5 to 15 seconds.

I've checked my PAM configuration. No slowdowns there. It auths me and opens my session pretty quickly. I also don't have any problems getting tokens.

I get the same slow performance with either memcache or diskcache. Console logins, GDM, SSH...all slow.

Any suggestions for locating the cause of this?

Thanks!
Jack Neely

--
Jack Neely [EMAIL PROTECTED]
Realm Linux Administration and Development
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89
Re: [OpenAFS] unable to stop openafs
Jeffrey,

I can confirm this. My test machine is RHEL 4 Beta 2 running OpenAFS 1.3.78. After a fresh boot, cat'ing a web page, I can shut down AFS cleanly. However, after I reboot again and then run fs la on that same web page, AFS will not shut down.

umount: /afs: device is busy
afsd: Shutting down all afs processes and afs state
AFS isn't unmounted yet! Call aborted
afsd: AFS still mounted; Not shutting down

The kernel module cannot be removed either.

Jack Neely

On Tue, Feb 08, 2005 at 12:31:10PM -0500, Jeffrey Hutzelman wrote:
> On Tuesday, February 08, 2005 11:13:20 AM +0100 Vladimir Nadvornik [EMAIL PROTECTED] wrote:
> > On Sunday 06 February 2005 19:07, Guillaume Rousse wrote:
> > > I'm unable to umount an afs partition: umount /mnt/afs fails because
> > > peripheral is still occupied, however I can't see any process using
> > > any file located under /mnt/afs using lsof.
> >
> > I have the same problem with 1.3.78 and vanilla linux kernel 2.6.10.
> > It seems to happen only if afs syscall is used, for example fs listacl.
>
> Hm. This is a problem we haven't been able to track down so far. Can you
> confirm that if you start AFS then read a file, you can unmount /afs, but
> if you do something like 'fs la' on the same file, you stop being able to
> unmount it? If that's the case, we may be well on the road to finding the
> problem.
>
> -- Jeffrey T. Hutzelman (N3NHS) [EMAIL PROTECTED]
> Sr. Research Systems Programmer
> School of Computer Science - Research Computing Facility
> Carnegie Mellon University - Pittsburgh, PA

--
Jack Neely [EMAIL PROTECTED]
Realm Linux Administration and Development
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89
Re: [OpenAFS] 1.3.77 Brokenness
Derrick,

That patch works. OpenAFS seems to work fairly well in memcache mode. I'm not seeing file corruption or missing files.

The attached version of the patch (mainly for the list) applies to 1.3.77. So, I'm using this + Matthew's patches for a working 1.3.77.

Jack

--
Jack Neely [EMAIL PROTECTED]
Realm Linux Administration and Development
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89

--- src/afsd/afsd.c.old 2005-01-14 13:27:10.557882648 -0500
+++ src/afsd/afsd.c 2005-01-14 13:28:06.141432664 -0500
@@ -1932,7 +1932,11 @@
     if (afsd_debug)
         printf("%s: Calling AFSOP_VOLUMEINFO: volume info file is '%s'\n", rn,
                fullpn_VolInfoFile);
-    call_syscall(AFSOP_VOLUMEINFO, fullpn_VolInfoFile);
+
+    /* once again, meaningless for a memory-based cache. */
+    if (!(cacheFlags & AFSCALL_INIT_MEMCACHE))
+        call_syscall(AFSOP_VOLUMEINFO, fullpn_VolInfoFile);
+
 
     /*
      * Pass the kernel the name of the afs logging file holding the volume
Re: [OpenAFS] 1.3.77 Brokenness
I was starting to wonder what that other patch actually did. :-)

I also did s/printk/printf/ in your patch. Attached is dmesg output.

Jack Neely

On Wed, Jan 12, 2005 at 04:45:32PM -0500, Derrick J Brashear wrote:
> correct diff. and no, once you get an oops, you can't start afs again.
>
> ? .gdb_history
> ? diff
> ? ppc_darwin_70
> ? src/libafs/afs.ppc_darwin_70.plist
> ? src/packaging/MacOS/OpenAFS.pkg
> ? src/packaging/MacOS/OpenAFS.pkg.tar.gz
> ? src/venus/kdump.c.
> Index: src/afs/afs_memcache.c
> ===================================================================
> RCS file: /cvs/openafs/src/afs/afs_memcache.c,v
> retrieving revision 1.16
> diff -u -r1.16 afs_memcache.c
> --- src/afs/afs_memcache.c 1 Dec 2004 23:38:56 -0000 1.16
> +++ src/afs/afs_memcache.c 11 Jan 2005 22:13:19 -0000
> @@ -45,6 +45,7 @@
>  
>      memCacheBlkSize = blkSize;
>      memMaxBlkNumber = blkCount;
> +    printk("memMaxBlkNumber %d\n", memMaxBlkNumber);
>      memCache = (struct memCacheEntry *)
>          afs_osi_Alloc(memMaxBlkNumber * sizeof(struct memCacheEntry));
>      if (flags & AFSCALL_INIT_MEMCACHE_SLEEP) {
> @@ -89,7 +90,7 @@
>      return 0;
>  }
>  
> -#if defined(AFS_SUN57_64BIT_ENV) || defined(AFS_SGI62_ENV)
> +#if defined(AFS_SUN57_64BIT_ENV) || defined(AFS_SGI62_ENV) || defined(AFS_LINUX26_ENV)
>  void *
>  afs_MemCacheOpen(ino_t blkno)
>  #else
> @@ -100,6 +101,7 @@
>      struct memCacheEntry *mep;
>  
>      if (blkno < 0 || blkno > memMaxBlkNumber) {
> +        printk("blkno %d\n", blkno);
>          osi_Panic("afs_MemCacheOpen: invalid block #");
>      }
>      mep = (memCache + blkno);
> Index: src/afs/afs_prototypes.h
> ===================================================================
> RCS file: /cvs/openafs/src/afs/afs_prototypes.h,v
> retrieving revision 1.57
> diff -u -r1.57 afs_prototypes.h
> --- src/afs/afs_prototypes.h 1 Dec 2004 23:38:56 -0000 1.57
> +++ src/afs/afs_prototypes.h 11 Jan 2005 22:13:19 -0000
> @@ -427,7 +427,7 @@
>  /* afs_memcache.c */
>  extern int afs_InitMemCache(int blkCount, int blkSize, int flags);
>  extern int afs_MemCacheClose(struct osi_file *file);
> -#if defined(AFS_SUN57_64BIT_ENV) || defined(AFS_SGI62_ENV)
> +#if defined(AFS_SUN57_64BIT_ENV) || defined(AFS_SGI62_ENV) || defined(AFS_LINUX26_ENV)
>  extern void *afs_MemCacheOpen(ino_t blkno);
>  #else
>  extern void *afs_MemCacheOpen(afs_int32 blkno);
> @@ -590,7 +590,7 @@
>  
>  /* ARCH/osi_file.c */
>  extern int afs_osicred_initialized;
> -#if defined(AFS_SUN57_64BIT_ENV) || defined(AFS_SGI62_ENV)
> +#if defined(AFS_SUN57_64BIT_ENV) || defined(AFS_SGI62_ENV) || defined(AFS_LINUX26_ENV)
>  extern void *osi_UFSOpen(ino_t ainode);
>  #else
>  extern void *osi_UFSOpen(afs_int32 ainode);
> Index: src/afs/LINUX/osi_file.c
> ===================================================================
> RCS file: /cvs/openafs/src/afs/LINUX/osi_file.c,v
> retrieving revision 1.22
> diff -u -r1.22 osi_file.c
> --- src/afs/LINUX/osi_file.c 8 Dec 2004 17:21:04 -0000 1.22
> +++ src/afs/LINUX/osi_file.c 11 Jan 2005 22:13:19 -0000
> @@ -27,7 +27,11 @@
>  extern struct super_block *afs_cacheSBp;
>  
>  void *
> +#ifdef AFS_LINUX26_ENV
> +osi_UFSOpen(ino_t ainode)
> +#else
>  osi_UFSOpen(afs_int32 ainode)
> +#endif
>  {
>      register struct osi_file *afile = NULL;
>      extern int cacheDiskType;

--
Jack Neely [EMAIL PROTECTED]
Realm Linux Administration and Development
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89

ng delay loop... 3923.96 BogoMIPS (lpj=1961984)
Security Scaffold v1.0.0 initialized
SELinux: Initializing.
SELinux: Starting in permissive mode
There is already a security framework initialized, register_security failed.
selinux_register_security: Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
CPU: After generic identify, caps: 078bfbff e1d3fbff
CPU: After vendor identify, caps: 078bfbff e1d3fbff
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU: After all inits, caps:078bf3ff e1d3fbff 0010
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: AMD Opteron(tm) Processor 146 stepping 08
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
ACPI: IRQ9 SCI: Edge set to Level Trigger.
checking if image is initramfs... it is
Freeing initrd memory: 476k freed
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xf0031, last bus=1
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20040816
ACPI: Interpreter enabled
ACPI: Using PIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 7 10 *11 14 15)
ACPI: PCI
Re: [OpenAFS] 1.3.77 Brokenness
= aslot;
+    tdc->bucket = 0;
     afs_indexUnique[aslot] = tdc->f.fid.Fid.Unique;
 
     if (existing) {
@@ -2840,6 +2939,17 @@
 #endif
         lasterrtime = osi_Time();
         afs_indexUnique[aslot] = tdc->f.fid.Fid.Unique;
+        tdc->bucket = 0;
+    } else {
+        /* this should be functionalized better. GetVCache on fid? */
+        struct volume *tv = afs_FindVolume(&tdc->f.fid, READ_LOCK);
+        if (tv->states & VRO) {
+            tdc->bucket = 2;
+        } else if (tv->states & VBackup) {
+            tdc->bucket = 1;
+        } else {
+            tdc->bucket = 1;
+        }
     }
     tdc->refCount = 1;
     tdc->index = aslot;
@@ -3235,6 +3345,7 @@
     afs_dcentries = aDentries;
     afs_blocksUsed = 0;
+    afs_DCSizeInit();
     QInit(&afs_DLRU);
 }
Index: src/afs/afs_vcache.c
===================================================================
RCS file: /cvs/openafs/src/afs/afs_vcache.c,v
retrieving revision 1.69
diff -u -r1.69 afs_vcache.c
--- src/afs/afs_vcache.c 13 Oct 2004 00:36:59 -0000 1.69
+++ src/afs/afs_vcache.c 18 Oct 2004 10:24:55 -0000
@@ -1096,7 +1096,7 @@
     tvc->vmh = tvc->segid = NULL;
     tvc->credp = NULL;
 #endif
-#if defined(AFS_SUN_ENV) || defined(AFS_ALPHA_ENV) || defined(AFS_SUN5_ENV)
+#if defined(AFS_DARWIN_ENV) || defined(AFS_ALPHA_ENV) || defined(AFS_SUN5_ENV)
 #if defined(AFS_SUN5_ENV)
     rw_init(&tvc->rwlock, "vcache rwlock", RW_DEFAULT, NULL);
 
Index: src/lwp/lwp.h
===================================================================
RCS file: /cvs/openafs/src/lwp/lwp.h,v
retrieving revision 1.14
diff -u -r1.14 lwp.h
--- src/lwp/lwp.h 15 Jul 2003 23:15:45 -0000 1.14
+++ src/lwp/lwp.h 18 Oct 2004 10:25:00 -0000
@@ -299,7 +299,7 @@
 #if defined(USE_UCONTEXT) && defined(HAVE_UCONTEXT_H)
 #define AFS_LWP_MINSTACKSIZE (288 * 1024)
 #else
-#if defined(AFS_LINUX22_ENV)
+#if defined(AFS_LINUX22_ENV) || defined(AFS_SUN5_ENV)
 #define AFS_LWP_MINSTACKSIZE (192 * 1024)
 #else
 #define AFS_LWP_MINSTACKSIZE (48 * 1024)
Index: src/rx/rx.c
===================================================================
RCS file: /cvs/openafs/src/rx/rx.c,v
retrieving revision 1.65
diff -u -r1.65 rx.c
--- src/rx/rx.c 15 Oct 2004 06:01:35 -0000 1.65
+++ src/rx/rx.c 18 Oct 2004 10:25:10 -0000
@@ -5742,7 +5742,7 @@
             rx_interface_stat_p rpc_stat, nrpc_stat;
             size_t space;
             MUTEX_EXIT(&peer->peer_lock);
-            MUTEX_DESTROY(&peer->peer_lock);
+            /*MUTEX_DESTROY(&peer->peer_lock);*/
             for (queue_Scan
                  (&peer->rpcStats, rpc_stat, nrpc_stat,
                   rx_interface_stat)) {

--
Jack Neely [EMAIL PROTECTED]
Realm Linux Administration and Development
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89

g ACPI for IRQ routing
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 11
ACPI: PCI interrupt :00:07.0[A] - GSI 11 (level, low) - IRQ 11
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 5
ACPI: PCI interrupt :00:08.0[A] - GSI 5 (level, low) - IRQ 5
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
ACPI: PCI interrupt :00:0a.0[A] - GSI 10 (level, low) - IRQ 10
ACPI: PCI interrupt :00:0b.0[A] - GSI 11 (level, low) - IRQ 11
ACPI: PCI interrupt :00:0f.0[B] - GSI 10 (level, low) - IRQ 10
ACPI: PCI interrupt :00:0f.1[A] - GSI 11 (level, low) - IRQ 11
ACPI: PCI interrupt :00:10.0[A] - GSI 11 (level, low) - IRQ 11
ACPI: PCI interrupt :00:10.1[A] - GSI 11 (level, low) - IRQ 11
ACPI: PCI interrupt :00:10.2[B] - GSI 10 (level, low) - IRQ 10
ACPI: PCI interrupt :00:10.3[B] - GSI 10 (level, low) - IRQ 10
ACPI: PCI interrupt :00:10.4[C] - GSI 5 (level, low) - IRQ 5
ACPI: PCI interrupt :00:11.5[C] - GSI 5 (level, low) - IRQ 5
ACPI: PCI interrupt :01:00.0[A] - GSI 11 (level, low) - IRQ 11
apm: BIOS version 1.2 Flags 0x03 (Driver version 1.16ac)
apm: overridden by ACPI.
audit: initializing netlink socket (disabled)
audit(1105546100.521:0): initialized
highmem bounce pool size: 64 pages
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
SELinux: Registering netfilter hooks
Initializing Cryptographic API
ksign: Installing public key data
Loading keyring
- Added public key E07BC3E85BE30CFD
- User ID: Red Hat, Inc. (Kernel Module GPG key)
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
vesafb: probe of vesafb0 failed with error -6
ACPI: Processor [CPU1] (supports C1)
Real Time Clock Driver v1.12
Linux agpgart interface v0.100 (c) Dave Jones
agpgart: Detected AGP bridge 0
agpgart: Maximum main memory to use for agp memory: 941M
agpgart: AGP aperture is 64M @ 0xf800
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing enabled
ttyS0 at I/O
Re: [OpenAFS] 1.3.77 Brokenness
On Tue, Jan 11, 2005 at 01:05:16AM -0500, Matthew Miller wrote:
> On Mon, Jan 10, 2005 at 06:43:09PM -0500, Jack Neely wrote:
> > I keep RHEL/FC install trees in AFS which are served out via HTTP. I
> > build the tree on a test machine and use rsync to move it out to AFS
> > land. In this case my server is a RHEL3 linux box running OpenAFS
> > 1.2.11. After moving the tree with rsync (a little over 2G worth) I
> > see that the cache gets out of sync on large files (20MB, 70MB, etc.)
> > Rsync complains it cannot delete its dot-files (the file it creates
> > before moving it to the real filename). These . files show in an 'ls'
> > and when I try to rm them I get No file or directory yet they still
> > appear in AFS. Many of my RPM packages are corrupt as well.
>
> This is *exactly* the problem I've seen. It's reported at
> http://rt.central.org/rt/Ticket/Display.html?id=16965.

Indeed. This is going to be a showstopper for us.

Jack

> --
> Matthew Miller [EMAIL PROTECTED] http://www.mattdm.org/
> Boston University Linux -- http://linux.bu.edu/

--
Jack Neely [EMAIL PROTECTED]
Realm Linux Administration and Development
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89
Re: [OpenAFS] 1.3.77 Brokenness
And in reply to my own post...

Has anyone tried opening up FC3/RHEL4 beta's Nautilus window on a directory in AFS, such as a home directory?

Really, really slow. The window has been scanning my home directory for 15 minutes now and is still not complete. Load has gone to 6. Lots of small files and some larger ones...only about 145MB worth...

Jack Neely

--
Jack Neely [EMAIL PROTECTED]
Realm Linux Administration and Development
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89
Re: [OpenAFS] 1.3.77 Brokenness
And this is what happens when I try to use memcache...

Jack

--
Jack Neely [EMAIL PROTECTED]
Realm Linux Administration and Development
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89

15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 7 *10 11 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 7 10 11 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 7 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 7 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 7 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 7 10 11 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 7 10 11 14 15) *0, disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 11
ACPI: PCI interrupt :00:07.0[A] - GSI 11 (level, low) - IRQ 11
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 5
ACPI: PCI interrupt :00:08.0[A] - GSI 5 (level, low) - IRQ 5
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
ACPI: PCI interrupt :00:0a.0[A] - GSI 10 (level, low) - IRQ 10
ACPI: PCI interrupt :00:0b.0[A] - GSI 11 (level, low) - IRQ 11
ACPI: PCI interrupt :00:0f.0[B] - GSI 10 (level, low) - IRQ 10
ACPI: PCI interrupt :00:0f.1[A] - GSI 11 (level, low) - IRQ 11
ACPI: PCI interrupt :00:10.0[A] - GSI 11 (level, low) - IRQ 11
ACPI: PCI interrupt :00:10.1[A] - GSI 11 (level, low) - IRQ 11
ACPI: PCI interrupt :00:10.2[B] - GSI 10 (level, low) - IRQ 10
ACPI: PCI interrupt :00:10.3[B] - GSI 10 (level, low) - IRQ 10
ACPI: PCI interrupt :00:10.4[C] - GSI 5 (level, low) - IRQ 5
ACPI: PCI interrupt :00:11.5[C] - GSI 5 (level, low) - IRQ 5
ACPI: PCI interrupt :01:00.0[A] - GSI 11 (level, low) - IRQ 11
apm: BIOS version 1.2 Flags 0x03 (Driver version 1.16ac)
apm: overridden by ACPI.
audit: initializing netlink socket (disabled)
audit(1105455518.017:0): initialized
highmem bounce pool size: 64 pages
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
SELinux: Registering netfilter hooks
Initializing Cryptographic API
ksign: Installing public key data
Loading keyring
- Added public key E07BC3E85BE30CFD
- User ID: Red Hat, Inc. (Kernel Module GPG key)
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
vesafb: probe of vesafb0 failed with error -6
ACPI: Processor [CPU1] (supports C1)
Real Time Clock Driver v1.12
Linux agpgart interface v0.100 (c) Dave Jones
agpgart: Detected AGP bridge 0
agpgart: Maximum main memory to use for agp memory: 941M
agpgart: AGP aperture is 64M @ 0xf800
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
divert: not allocating divert_blk for non-ethernet device lo
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot :00:0f.1
ACPI: PCI interrupt :00:0f.1[A] - GSI 11 (level, low) - IRQ 11
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt8237 (rev 00) IDE UDMA133 controller on pci:00:0f.1
ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide0...
hda: WDC WD1200JB-00FUA0, ATA DISK drive
Using cfq io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: SONY CD-RW CRX320E, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
Probing IDE interface ide2...
ide2: Wait for ready failed before probe !
Probing IDE interface ide3...
ide3: Wait for ready failed before probe !
Probing IDE interface ide4...
ide4: Wait for ready failed before probe !
Probing IDE interface ide5...
ide5: Wait for ready failed before probe !
hda: max request size: 1024KiB
hda: 234441648 sectors (120034 MB) w/8192KiB Cache, CHS=16383/255/63, UDMA(100)
hda: cache flushes supported
hda: hda1 hda2 hda3 hda4 hda5 hda6 hda7
hdc: ATAPI 52X DVD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
ide-floppy driver 0.99.newide
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.0:USB HID core driver
mice: PS/2 mouse device common for all mice
input: AT Translated Set 2 keyboard on isa0060/serio0
input: ImPS/2 Generic Wheel Mouse on isa0060/serio1
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
NET: Registered protocol family 2
IP: routing cache hash table of 2048 buckets, 64Kbytes
TCP: Hash tables configured (established 262144 bind
Re: [OpenAFS] 1.3.77 Brokenness
Yes.

--
Jack Neely [EMAIL PROTECTED]
Realm Linux Administration and Development
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89
[OpenAFS] 1.3.77 Brokenness
Folks,

I've been doing some testing with OpenAFS 1.3.77 on the RHEL4 betas. I've just been testing the client against Transarc/OpenAFS 1.2.x servers. Mostly, things work pretty well. However, I'm having problems with large files and have a pretty easy test case.

I keep RHEL/FC install trees in AFS which are served out via HTTP. I build the tree on a test machine and use rsync to move it out to AFS land. In this case my server is a RHEL3 linux box running OpenAFS 1.2.11. After moving the tree with rsync (a little over 2G worth) I see that the cache gets out of sync on large files (20MB, 70MB, etc.) Rsync complains it cannot delete its dot-files (the file it creates before moving it to the real filename). These . files show in an 'ls' and when I try to rm them I get No file or directory yet they still appear in AFS. Many of my RPM packages are corrupt as well.

When I do the same operation with the 2.4 kernel running OpenAFS 1.2.11 things work as expected.

That should be easy enough to reproduce...if you need any more information just let me know.

Jack Neely

--
Jack Neely [EMAIL PROTECTED]
Realm Linux Administration and Development
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89
[OpenAFS] Unknown symbol force_sig on RHEL 4 beta
Folks,

I'm working on building OpenAFS RPMs for the Red Hat Enterprise Linux 4 betas. I'm using the most recent kernel (which I will be glad to provide as I'm not sure you can find it outside of RHN) which is

    kernel-smp-2.6.9-1.675_EL

Build works fine using the patches from Matthew, but on modprobe I get

    libafs: Unknown symbol force_sig

Which appears to be in the above kernel. Any clues?

Matthew: RHEL4 looks as if it will have kernel-devel and kernel-smp-devel packages. They are quite handy. :-)

Jack Neely

--
Jack Neely [EMAIL PROTECTED]
Realm Linux Administration and Development
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89
Re: [OpenAFS] Unknown symbol force_sig on RHEL 4 beta
Folks,

Sorry for any multiple posts...

Also, according to the Red Hat folks, this was a bug in that particular kernel that has been fixed.

Thanks!
Jack Neely

--
Jack Neely [EMAIL PROTECTED]
Realm Linux Administration and Development
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89
Re: [OpenAFS] SQLite?
On Tue, Nov 30, 2004 at 01:40:23PM -0500, John S. Bucy wrote:
> Does anyone here have any experience putting SQLite databases in AFS
> with concurrent access from multiple clients?
>
> john

I believe SQLite (which I think is great, BTW) uses POSIX advisory locking. The documentation suggests that you should avoid network file systems. I actually haven't tried this, but I wouldn't trust SQLite dbs to work in AFS.

http://www.sqlite.org/lockingv3.html

Jack

--
Jack Neely [EMAIL PROTECTED]
Realm Linux Administration and Development
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89
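For readers unfamiliar with the term, here is a minimal sketch of what POSIX advisory locking looks like between two processes. It is only an illustration of the primitive on a local filesystem, using Python's fcntl module; it is not SQLite's actual locking code, and lock_demo and probe_from_child are names made up for this example. Running the same probe against a file in a network filesystem such as AFS is one way to see whether locks taken on one client are visible on another, which is the property the reply above is worried about.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Small program run in a separate process: try to take an exclusive
# advisory lock without blocking and report what happened.
CHILD = r"""
import fcntl, sys
with open(sys.argv[1], "r+b") as f:
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        print("locked")
    except OSError:
        print("busy")
"""


def probe_from_child(path):
    """Ask another process whether it can lock the file right now."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def lock_demo():
    """Return the child's view while the parent holds the lock and after release."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "r+b") as f:
            fcntl.lockf(f, fcntl.LOCK_EX)      # parent takes the lock
            while_held = probe_from_child(path)
            fcntl.lockf(f, fcntl.LOCK_UN)      # parent releases it
            after_release = probe_from_child(path)
        return while_held, after_release
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(lock_demo())
```

The lock is advisory: it only constrains processes that also ask for it, so every writer has to cooperate and the filesystem has to propagate lock state correctly for the scheme to protect a database file.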
Re: [OpenAFS] Status on 2.6 Kernel?
On Tue, Oct 12, 2004 at 07:55:15AM -0700, e r0ck wrote:
> no change. but why would there be? the kernel headers in there are the
> same as in /usr/src/linux.

The 2.6 kernel has a completely different build system from the 2.4 kernel. All the build magic that you need to build third party kernel modules for your running kernel is to be found in

    /lib/modules/`uname -r`/build

/usr/src/linux is the directory that contains the actual source code for your kernel. Very different. However, if your running kernel was compiled in /usr/src/linux then all the build stuff will have been created there as well.

> i'm gonna try to get arla to work.

Arla is much more stable than my patches. However, I do not understand why you would be getting "no space left on device" with my patches and normal behavior without them. I'd have to poke a bit more at it.

Jack Neely

--
Jack Neely [EMAIL PROTECTED]
Realm Linux Administration and Development
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89
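As a small illustration of the path convention described above, the sketch below builds the per-kernel build directory from the running kernel's release string (what `uname -r` prints) and reports whether it exists. kernel_build_dir and describe_build_dir are hypothetical helper names for this example; the only assumption taken from the message is the /lib/modules/<release>/build layout.

```python
import os
import platform


def kernel_build_dir(release=None):
    """Return /lib/modules/<release>/build for the given or running kernel."""
    if release is None:
        # platform.release() reports the same string as `uname -r`.
        release = platform.release()
    return os.path.join("/lib/modules", release, "build")


def describe_build_dir(release=None):
    """Say whether the build directory is present on this machine."""
    path = kernel_build_dir(release)
    state = "present" if os.path.isdir(path) else "missing"
    return "%s (%s)" % (path, state)


if __name__ == "__main__":
    print(describe_build_dir())
```

If the directory is reported missing, the kernel's build files (for example from a kernel development package) are probably not installed for the running kernel, and a module build pointed at that path would be expected to fail.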
Re: [OpenAFS] Status on 2.6 Kernel?
On Mon, Oct 11, 2004 at 09:46:57AM -0700, e r0ck wrote:
> it supposedly works with 1.3.71. but i have not been able to get the
> kernel modules going. see my note from last week, which no one answered.
>
> someone at ncsu supposedly has PAGs going (see link below). i'm actually
> only in need of the client at this time. is there a way to get that
> going without PAGs?
>
> here is the text of my note from last week.:
>
> hi all,
> i'm trying to get just the client going on linux 2.6.8. kernel is
> running fine. (new install)
>
> i patched the openafs source with the hooks for PAGs from here:
> http://www.linux.ncsu.edu/projects/openafs-rpms/
> had to hand apply one of them...
>
> built the openafs source by hand via:
> ../configure --with-afs-sysname=i386_linux26 --with-linux-kernel-headers=/usr/src/linux --enable-transarc-paths --enable-kernel-module

You want

    --with-linux-kernel-headers=/lib/modules/`uname -r`/build

Build with that instead and see if it doesn't work better.

Jack

> make
> make install
>
> /usr/vice/etc/modload is empty so i tried copying:
> cp libafs/MODLOAD-2.6.8.1afs-SP/libafs-2.6.8.1afs.ko /usr/vice/etc/modload/libafs-2.6.8.1afs.o
> and
> cp libafs/MODLOAD-2.6.8.1afs-SP/libafs.ko /lib/modules/2.6.8.1afs/kernel/fs/
>
> but when i try to load the module i get module not found. even if i
> specify the path directly.
>
> /usr/vice/etc/ looks like this:
> cacheinfo CellServDB.rpmsave modload ThisCell.rpmsave cacheinfo.rpmsave libafs-2.6.8.1afs.ko rc_afs CellServDB libafs-2.6.8.1afs.mp.ko ThisCell
>
> in src/libafs/ i have MODLOAD blah blah for SP, and MP. should there be
> a UP in there? this is a uniprocessor i686 P3 coppermine.
>
> any ideas?
> TIA
>
> -----Original Message-----
> Previous posts on this list have said it works other than PAGs (which,
> for desktop use, is likely fine), but I don't know what version of the
> code this was talking about. I'd also be interested in knowing.
>
> On Thu, Sep 30, 2004 at 08:58:05PM -0700, Dark Avenger wrote:
> > Just curious...do we know approximately when OpenAFS (stable) will be
> > released for the 2.6 kernel? I'm considering moving to FC2 sometime
> > soon, but can't unless I have stable AFS capability.
>
> danno
> --
> dan pritts - systems administrator - internet2
> 734/352-4953 office 734/834-7224 mobile

--
Jack Neely [EMAIL PROTECTED]
Realm Linux Administration and Development
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89
Re: [OpenAFS] OpenAFS on Fedora Core 1 [working RPMs]
It's difficult to do that. My RPMs are not a fork of the RPMs from the OpenAFS folks. They have been built up from scratch over the years.

You folks are welcome to take a look. I'll attach the spec file.

Jack Neely

--
Jack Neely [EMAIL PROTECTED]
Realm Linux Administration and Development
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89

%define thiscell unity.ncsu.edu
%{!?ksource_dir: %define ksource_dir /lib/modules/%(uname -r)/build}
%define uts_release %( gcc -E -D__BOOT_KERNEL_H_ -dM %{ksource_dir}/include/linux/version.h | grep UTS | sed 's/^#define UTS_RELEASE //;s/"//g' )

%ifarch i386 i586 i686 athlon
%define sysname i386_linux24
%endif
%ifarch alpha
%define sysname alpha_linux24
%endif
%ifarch ia64
%define sysname ia64_linux24
%endif
%ifarch ppc ppc64
%define sysname ppc_linux24
%endif
%ifarch s390
%define sysname s390_linux24
%endif

%define basearchs i386 alpha ia64 ppc s390
%define build_enterprise 0

Summary: OpenAFS Enterprise Network File System
Name: openafs
Version: 1.2.10
Release: 2
License: IBM Public License
Group: System Environment/Daemons
URL: http://oss.software.ibm.com/developerworks/opensource/afs/downloads.html
Source0: http://www.openafs.org/dl/openafs/%{version}/%{name}-%{version}-src.tar.bz2
Source1: /afs/transarc.com/service/CellServDB
Source2: cacheinfo
Source3: openafs.init
Source4: afs.conf
BuildRoot: %{_tmppath}/%{name}-root
BuildRequires: kernel-source = %{uts_release}, kernel = %{uts_release},
BuildRequires: pam-devel

%description
The AFS distributed filesystem. AFS is a distributed filesystem allowing cross-platform sharing of files among multiple computers. Facilities are provided for access control, authentication, backup and administrative management.

This package provides common files shared across all the various OpenAFS packages but are not necessarily tied to a client or server.
%package client
Summary: OpenAFS Filesystem client
Group: System Environment/Daemons
Prereq: bash, fileutils, chkconfig
Requires: openafs, openafs-kernel = %{PACKAGE_VERSION}
Obsoletes: afs-client
Conflicts: arla

%description client
The AFS distributed filesystem. AFS is a distributed filesystem allowing cross-platform sharing of files among multiple computers. Facilities are provided for access control, authentication, backup and administrative management.

This package provides basic client support to mount and manipulate AFS.

%package devel
Summary: OpenAFS development header files and static libraries
Group: Development/Libraries
Obsoletes: afs-devel
Conflicts: arla-devel
Requires: openafs
Prereq: /sbin/ldconfig

%description devel
The AFS distributed filesystem. AFS is a distributed filesystem allowing cross-platform sharing of files among multiple computers. Facilities are provided for access control, authentication, backup and administrative management.

This package provides static development libraries and headers needed to compile AFS applications. Note: AFS currently does not provide shared libraries.

%package server
Summary: OpenAFS Filesystem Server
Group: System Environment/Daemons
Requires: openafs-client = %{PACKAGE_VERSION}, openafs = %{PACKAGE_VERSION}
Prereq: openafs-client
Obsoletes: afs-server
Conflicts: milko

%description server
The AFS distributed filesystem. AFS is a distributed filesystem allowing cross-platform sharing of files among multiple computers. Facilities are provided for access control, authentication, backup and administrative management.

This package provides basic server support to host files in an AFS Cell.

%package kernel
Summary: OpenAFS Filesystem Kernel Modules
Group: System Environment/Daemons
Requires: openafs = %{PACKAGE_VERSION}, kernel = %{uts_release}
Obsoletes: afs-modules afs-module
Conflicts: arla

%description kernel
The AFS distributed filesystem.
AFS is a distributed filesystem allowing cross-platform sharing of files among multiple computers. Facilities are provided for access control, authentication, backup and administrative management.

This package provides the kernel modules for use with OpenAFS. The package was built for kernel versions %{uts_release} and %{uts_release}smp.

%package kernel-source
Summary: OpenAFS Filesystem Kernel Modules Source
Group: Development/System
Requires: openafs = %{PACKAGE_VERSION}, kernel = %{uts_release}

%description kernel-source
The AFS distributed filesystem. AFS is a distributed filesystem allowing cross-platform sharing of files among multiple computers. Facilities are provided for access control, authentication, backup and administrative management.

This package provides source to build OpenAFS kernel modules for a different kernel. Largely useful if you build custom kernels from scratch; with rpm-based kernels, you are likely better off just rebuilding the source RPM with the command

rpm --rebuild --target=arch openafs-%{PACKAGE_VERSION}.src.rpm

where arch
Re: [OpenAFS] sys_call_table for RH8.0 kernel
On Wed, Oct 16, 2002 at 05:14:14PM +0530, Omkar Sathe wrote:
> Hi - Can someone please point to a tested patch for exporting
> sys_call_table for RH8.0 kernel ?
>
> regards
> omkar sathe
> IBM India, Pune Lab.

Edit the RHL kernel spec file to include this patch...say somewhere around patch # 10050. It needs to be toward the end.

Jack

--
Jack Neely [EMAIL PROTECTED]
Linux Realm Kit Administration and Development
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89

diff -ru kernel-2.4.18-old/linux/kernel/ksyms.c kernel-2.4.18/linux/kernel/ksyms.c
--- kernel-2.4.18-old/linux/kernel/ksyms.c 2002-10-08 15:17:21.0 -0400
+++ kernel-2.4.18/linux/kernel/ksyms.c 2002-10-08 15:25:06.0 -0400
@@ -522,6 +522,9 @@
 EXPORT_SYMBOL(simple_strtoull);
 EXPORT_SYMBOL(system_utsname); /* UTS data */
 EXPORT_SYMBOL(uts_sem);/* UTS semaphore */
+#ifndef __mips__
+EXPORT_SYMBOL(sys_call_table);
+#endif
 EXPORT_SYMBOL(machine_restart);
 EXPORT_SYMBOL(machine_halt);
 EXPORT_SYMBOL(machine_power_off);
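As a quick sanity check after the kernel source has been prepared, something along these lines could confirm that the export from the patch above actually landed in ksyms.c. This is only a sketch: the function name has_sys_call_table_export is invented for illustration, and the path you pass in depends on where your build tree lives.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given ksyms.c already contains the
# EXPORT_SYMBOL(sys_call_table) line that the patch above adds.
#   $1  path to a ksyms.c file (the location is up to your build tree)
# Returns 0 if the export is present, 1 if it is absent, 2 if the file
# cannot be read.
has_sys_call_table_export() {
    ksyms_file="$1"
    [ -r "$ksyms_file" ] || return 2
    # -F treats the pattern as a fixed string, so the parentheses are literal.
    grep -qF 'EXPORT_SYMBOL(sys_call_table);' "$ksyms_file"
}
```

Called as, for example, has_sys_call_table_export kernel-2.4.18/linux/kernel/ksyms.c, it only inspects the source file; it says nothing about whether the kernel you are currently running exports the symbol.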
Re: [OpenAFS] Viewing files through Gnome
On Sun, Jul 14, 2002 at 07:50:20PM -0500, Jack Britton wrote:
> I have afs up and running. I can access /afs/cellname/home/username
> after logging in with klog. I have set my user up with all permissions
> except for administer. When I traverse the file structure all is well
> until I get into my user's home directory on the afs server. When I
> select a file through Nautilus in Gnome the file disappears. Do I have
> something set up wrong???
>
> Jack

This problem actually has nothing to do with AFS. Nautilus uses a utility called fam to monitor files for changes. Fam is quite broken by design, and with AFS it just doesn't work because there's no way to run the daemon with the authentication of the user.

Stop fam by doing a

chkconfig sgi_fam off

and see if that doesn't clear up the problems. Also, it seems to work with the rights for anyuser set to 'l'.

As a side note, are you running Red Hat Linux with the default Gnome user environment? Have you seen the problems with GConf? That /IS/ an AFS problem. GConf depends on proper file locking via fcntl(), which simply doesn't work right.

Are these locking issues being worked on? They are quite a misfeature that is causing me, and others, lots of pain.

Jack Neely

--
Jack Neely [EMAIL PROTECTED]
Linux Realm Kit Administration and Development
PAMS Computer Operations at NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89
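To spell out the two workarounds mentioned above, here is a dry-run sketch that only prints the commands rather than running them. The chkconfig line is taken from the message; the fs setacl line is my reading of "rights for anyuser set to 'l'", assuming the standard AFS group system:anyuser. The function name fam_workaround_commands is made up for illustration.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the two workarounds from the
# message above for a given AFS home directory.
#   $1  AFS directory to adjust, e.g. /afs/cellname/home/username
fam_workaround_commands() {
    homedir="$1"
    # Disable fam, as suggested in the message.
    echo "chkconfig sgi_fam off"
    # Assumption: "anyuser" means the system:anyuser group; 'l' is lookup.
    echo "fs setacl -dir $homedir -acl system:anyuser l"
}

fam_workaround_commands "/afs/cellname/home/username"
```

Printing first leaves the choice to you: the ACL change lets anyone list the directory's contents, so it may not be what you want on a home directory, and the two options are alternatives rather than steps that both have to be applied.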