Firefox profile in NFSv4 directory
Hi, All.

We see periodic file system hangs when a Firefox profile is stored in an NFSv4 directory. Both client and server are fully updated SL6.2. We reliably see this when running Firefox with the profile stored in an NFSv4 file system, and do not see it when the client switches to NFSv3.

To reproduce this, we simply run Firefox with the profile stored in an NFSv4 share (for example, mount your home directory using NFSv4). Eventually the NFSv4 file system will wedge, and all access to that file system from that client will block. When this happens, "umount -f /file/system" will un-wedge the file system and everything continues where it left off, even though the umount itself reports that the device is busy:

[root@cesr3601 ~]# umount -f /home/rf_ctl
umount2: Device or resource busy
umount: /home/rf_ctl: device is busy.
        (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))
umount2: Device or resource busy

Firefox opens a number of SQLite files, and SQLite uses flock to mediate access, so this looks consistent with a problem with flock and NFSv4.
Running strace on a hung Firefox and then issuing the 'umount -f' to un-hang it shows Firefox sitting in a futex wait, which is (presumably) woken by the umount attempt:

futex(0x2b5710c9eab0, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
futex(0x2b5710c9eab0, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
futex(0x2b5710c9eab0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x2b5710d8ca4c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x2b570f606238, 89100) = 1
futex(0x2b5710c9eab0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x2b56fdd0c040, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x2b56fdd0c040, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x2b57083f630c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x2b57083f6308, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1

We have found many reports of problems with Firefox and NFSv4 home directories on FC16 and Ubuntu:

https://bugzilla.redhat.com/show_bug.cgi?id=732748
https://bugzilla.redhat.com/show_bug.cgi?id=811138
http://thread.gmane.org/gmane.linux.nfs/48690/focus=48705
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664

We have now opened a report for RHEL6, but it appears it won't get much traction until it is confirmed by someone with a Red Hat support contract:

https://bugzilla.redhat.com/show_bug.cgi?id=828521

Has anyone else experienced this, or does anyone have any suggestions?

Thanks in advance,
Devin
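To check whether flock() exclusion behaves on a given mount independently of Firefox, a small probe along these lines might help. It is a minimal sketch assuming Python 3 and its standard fcntl module; probe_flock is a hypothetical helper, not an existing tool. It relies on flock() locks belonging to an open file description, so two separate open() calls on the same file in one process contend with each other. It only issues non-blocking lock requests so the lock calls themselves cannot hang, although ordinary file operations on an already-wedged mount still can.

```python
import errno
import fcntl
import os
import tempfile


def probe_flock(directory: str) -> bool:
    """Return True if an exclusive flock() in `directory` refuses a second holder."""
    fd, path = tempfile.mkstemp(dir=directory, prefix="flock-probe-")
    try:
        # Two independent open() calls give two open file descriptions.
        with open(path, "rb") as first, open(path, "rb") as second:
            # LOCK_NB everywhere: fail fast instead of blocking on a bad mount.
            fcntl.flock(first.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
            try:
                fcntl.flock(second.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError as exc:
                if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
                    return True  # second exclusive lock correctly refused
                raise  # any other error (e.g. ENOLCK) is worth seeing
            return False  # both exclusive locks granted: exclusion is broken
    finally:
        # Closing the files above released the locks; now clean up.
        os.close(fd)
        os.unlink(path)
```

For example, running probe_flock("/home/rf_ctl") on the client with the file system mounted as NFSv4 and again as NFSv3 would be expected to return True in both cases if exclusive flock() locks are honoured; False or an exception would point at the locking path.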
Re: opengl on remote SL6 system when local system uses nvidia
Just in case anyone else runs into this issue, here's a link to the bug report, where you can find a patch to Mesa that resolves it:

https://bugzilla.redhat.com/show_bug.cgi?id=820746

Devin
Re: opengl on remote SL6 system when local system uses nvidia
Hi z,

Thank you for your follow-up. Unfortunately, we have this problem regardless of whether a valid xorg.conf exists on the remote system, and regardless of whether an X server is started on the remote system. We have verified the problem using both the latest nvidia driver from ELRepo and the latest driver from nvidia.com. Any other suggestions would be greatly appreciated.

One workaround would be to install the nvidia driver on all systems regardless of their graphics card, but I'm somewhat reluctant to proliferate proprietary drivers where they shouldn't be needed. This certainly wasn't needed with SL5.

It would also be helpful to know whether anyone else with a mixed nvidia / non-nvidia environment can reproduce this problem.

Thanks again,
Devin

On Apr 6, 2012, at 12:12 PM, Devin Bougie wrote:
> Hi, All. We're seeing a problem running opengl on a remote SL6 system when
> the local system uses the proprietary nvidia drivers. This does not seem to
> be a problem with remote SL5 systems.
>
> The problem seems to only be when sitting at a local system with the nvidia
> drivers (tested with local SL5, SL6, and OS X) and running opengl on a remote
> SL6 system that doesn't have the nvidia drivers.
>
> I think the example below probably demonstrates this better than my
> description. Any suggestions would be greatly appreciated, and please let me
> know if there is any more information we can provide.
>
> Many thanks,
> Devin
>
> --
> [dab66@lnx246 ~]% glxinfo |head -n 15
> name of display: :0.0
> display: :0 screen: 0
> direct rendering: Yes
> server glx vendor string: NVIDIA Corporation
> server glx version string: 1.4
> server glx extensions:
>     GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_SGIX_fbconfig,
>     GLX_SGIX_pbuffer, GLX_SGI_video_sync, GLX_SGI_swap_control,
>     GLX_EXT_swap_control, GLX_EXT_texture_from_pixmap, GLX_ARB_create_context,
>     GLX_ARB_create_context_profile, GLX_EXT_create_context_es2_profile,
>     GLX_ARB_create_context_robustness, GLX_ARB_multisample,
>     GLX_NV_float_buffer, GLX_ARB_fbconfig_float, GLX_EXT_framebuffer_sRGB
> client glx vendor string: NVIDIA Corporation
> client glx version string: 1.4
> client glx extensions:
>
> [dab66@lnx246 ~]% ssh sl5-no-nvidia
> [dab66@sl5-no-nvidia ~]% glxinfo |head -n 15
> name of display: localhost:11.0
> display: localhost:11 screen: 0
> direct rendering: No
> server glx vendor string: NVIDIA Corporation
> server glx version string: 1.4
> server glx extensions:
>     GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_SGIX_fbconfig,
>     GLX_SGIX_pbuffer, GLX_SGI_video_sync, GLX_SGI_swap_control,
>     GLX_EXT_swap_control, GLX_EXT_texture_from_pixmap, GLX_ARB_create_context,
>     GLX_ARB_create_context_profile, GLX_EXT_create_context_es2_profile,
>     GLX_ARB_create_context_robustness, GLX_ARB_multisample,
>     GLX_NV_float_buffer, GLX_ARB_fbconfig_float, GLX_EXT_framebuffer_sRGB
> client glx vendor string: SGI
> client glx version string: 1.4
> client glx extensions:
>
> [dab66@lnx246 ~]% ssh sl6-no-nvidia
> [dab66@sl6-no-nvidia ~]% glxinfo
> name of display: localhost:27.0
> Error: couldn't find RGB GLX visual or fbconfig
>
> [dab66@lnx246 ~]% ssh sl6-nvidia
> [dab66@sl6-nvidia ~]% glxinfo |head -n 15
> NVIDIA: could not open the device file /dev/nvidiactl (Permission denied).
> name of display: localhost:10.0
> display: localhost:10 screen: 0
> direct rendering: No (If you want to find out why, try setting LIBGL_DEBUG=verbose)
> server glx vendor string: NVIDIA Corporation
> server glx version string: 1.4
> server glx extensions:
>     GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_SGIX_fbconfig,
>     GLX_SGIX_pbuffer, GLX_SGI_video_sync, GLX_SGI_swap_control,
>     GLX_EXT_swap_control, GLX_EXT_texture_from_pixmap, GLX_ARB_create_context,
>     GLX_ARB_create_context_profile, GLX_EXT_create_context_es2_profile,
>     GLX_ARB_create_context_robustness, GLX_ARB_multisample,
>     GLX_NV_float_buffer, GLX_ARB_fbconfig_float, GLX_EXT_framebuffer_sRGB
> client glx vendor string: NVIDIA Corporation
> client glx version string: 1.4
> client glx extensions:
> --

On Apr 6, 2012, at 6:49 PM, zxq9 wrote:
> Unfortunately I don't have any magical answer for you, but I can confirm that
> I can't reproduce this between Radeon HD 6310s and 4250s.
>
> I do have a suspicion about what might be wrong, though...
>
> Does the nVidia driver setup create an /etc/X11/xorg.conf during a
> post-install step? I SL6 doesn't have one by default but SL5 did. If
> something about the chipset or other detail of the X11 setup you've got is
> not detected correctly by default (as in, not detected correctly during the
> bo
opengl on remote SL6 system when local system uses nvidia
Hi, All.

We're seeing a problem running OpenGL on a remote SL6 system when the local system uses the proprietary nvidia drivers. This does not seem to be a problem with remote SL5 systems.

The problem seems to occur only when sitting at a local system with the nvidia drivers (tested with local SL5, SL6, and OS X) and running OpenGL on a remote SL6 system that doesn't have the nvidia drivers.

I think the example below probably demonstrates this better than my description. Any suggestions would be greatly appreciated, and please let me know if there is any more information we can provide.

Many thanks,
Devin

--
[dab66@lnx246 ~]% glxinfo |head -n 15
name of display: :0.0
display: :0 screen: 0
direct rendering: Yes
server glx vendor string: NVIDIA Corporation
server glx version string: 1.4
server glx extensions:
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_SGIX_fbconfig,
    GLX_SGIX_pbuffer, GLX_SGI_video_sync, GLX_SGI_swap_control,
    GLX_EXT_swap_control, GLX_EXT_texture_from_pixmap, GLX_ARB_create_context,
    GLX_ARB_create_context_profile, GLX_EXT_create_context_es2_profile,
    GLX_ARB_create_context_robustness, GLX_ARB_multisample,
    GLX_NV_float_buffer, GLX_ARB_fbconfig_float, GLX_EXT_framebuffer_sRGB
client glx vendor string: NVIDIA Corporation
client glx version string: 1.4
client glx extensions:

[dab66@lnx246 ~]% ssh sl5-no-nvidia
[dab66@sl5-no-nvidia ~]% glxinfo |head -n 15
name of display: localhost:11.0
display: localhost:11 screen: 0
direct rendering: No
server glx vendor string: NVIDIA Corporation
server glx version string: 1.4
server glx extensions:
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_SGIX_fbconfig,
    GLX_SGIX_pbuffer, GLX_SGI_video_sync, GLX_SGI_swap_control,
    GLX_EXT_swap_control, GLX_EXT_texture_from_pixmap, GLX_ARB_create_context,
    GLX_ARB_create_context_profile, GLX_EXT_create_context_es2_profile,
    GLX_ARB_create_context_robustness, GLX_ARB_multisample,
    GLX_NV_float_buffer, GLX_ARB_fbconfig_float, GLX_EXT_framebuffer_sRGB
client glx vendor string: SGI
client glx version string: 1.4
client glx extensions:

[dab66@lnx246 ~]% ssh sl6-no-nvidia
[dab66@sl6-no-nvidia ~]% glxinfo
name of display: localhost:27.0
Error: couldn't find RGB GLX visual or fbconfig

[dab66@lnx246 ~]% ssh sl6-nvidia
[dab66@sl6-nvidia ~]% glxinfo |head -n 15
NVIDIA: could not open the device file /dev/nvidiactl (Permission denied).
name of display: localhost:10.0
display: localhost:10 screen: 0
direct rendering: No (If you want to find out why, try setting LIBGL_DEBUG=verbose)
server glx vendor string: NVIDIA Corporation
server glx version string: 1.4
server glx extensions:
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_SGIX_fbconfig,
    GLX_SGIX_pbuffer, GLX_SGI_video_sync, GLX_SGI_swap_control,
    GLX_EXT_swap_control, GLX_EXT_texture_from_pixmap, GLX_ARB_create_context,
    GLX_ARB_create_context_profile, GLX_EXT_create_context_es2_profile,
    GLX_ARB_create_context_robustness, GLX_ARB_multisample,
    GLX_NV_float_buffer, GLX_ARB_fbconfig_float, GLX_EXT_framebuffer_sRGB
client glx vendor string: NVIDIA Corporation
client glx version string: 1.4
client glx extensions:
--
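When comparing hosts as in the transcripts above, it can help to pull out only the fields that differ: direct rendering and the server and client GLX vendor strings. The following is a minimal sketch assuming Python 3 and the glxinfo text layout shown above; parse_glxinfo and glx_summary are hypothetical helpers, not part of glxinfo or Mesa.

```python
def parse_glxinfo(text: str) -> dict:
    """Map each top-level 'key: value' line of glxinfo output to its value.

    Indented extension-list lines and keys with an empty value are skipped;
    if a key repeats, the first occurrence wins.
    """
    fields = {}
    for line in text.splitlines():
        if line.startswith((" ", "\t")):
            continue  # continuation lines of an extension list
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and value.strip():
            fields.setdefault(key.strip(), value.strip())
    return fields


def glx_summary(text: str) -> tuple:
    """Return (direct rendering, server GLX vendor, client GLX vendor)."""
    fields = parse_glxinfo(text)
    return (
        fields.get("direct rendering"),
        fields.get("server glx vendor string"),
        fields.get("client glx vendor string"),
    )
```

On each host the text could come from subprocess.run(["glxinfo"], capture_output=True, text=True).stdout. For the SL6 host without the nvidia driver, the summary would be (None, None, None), since glxinfo stops at the error line.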
allow members of group to unlock screen
Hi, All.

Throughout our control system, we have several SL6 terminals that automatically log in with a dedicated control-system account and launch various monitoring and control applications. In general, the passwords for these accounts are not known and never needed.

For some of these systems that are not always in adequately protected areas, we would like to lock the screen after a period of inactivity. We would then like to give a set of users (ideally members of a Unix group) the ability to unlock that screen using their own username and password. We should be able to use PAM, but the authentication dialogs of xscreensaver (and of the KDE screensaver and gnome-screensaver) only let you fill in the password field, not the "user" field.

Before we start hacking the source of one of these screensaver applications, we thought we'd see what solutions are in use at other labs. Any recommendations for achieving this (or suggestions for a different workflow) in SL6 would be greatly appreciated.

Many thanks,
Devin

--
Devin Bougie
Cornell University Laboratory for Elementary-Particle Physics
devin.bou...@cornell.edu
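Whatever locker is used, the decision has two parts: PAM verifies the password of the person who typed their own username, and something then has to decide whether that user may unlock this particular session. The group-membership half might look like the sketch below, assuming Python 3 and its standard pwd and grp modules; user_in_group is a hypothetical helper. Inside PAM itself, the pam_succeed_if module's "user ingroup" test may be able to express the same rule without custom code.

```python
import grp
import pwd


def user_in_group(username: str, group_name: str) -> bool:
    """True if `username` belongs to `group_name` as a primary or supplementary group."""
    try:
        group = grp.getgrnam(group_name)
        user = pwd.getpwnam(username)
    except KeyError:
        return False  # unknown user or unknown group
    # Supplementary members are listed in gr_mem; a user's primary group
    # is usually recorded only in their passwd entry (pw_gid).
    return username in group.gr_mem or user.pw_gid == group.gr_gid
```

For example, with a hypothetical "ctl-unlock" group, user_in_group("dab66", "ctl-unlock") would decide whether dab66 may unlock a control-room screen once PAM has accepted dab66's password.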
Re: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) on scienfic linux 6.1
Hi Eero,

Yes, that's what I meant. In your original post you said you were using the e1000 driver, but I now see that the output you included shows you were actually using e1000e. While I don't have any experience with this card in SL6, we've had no problems with several of them in SL5.6 (using the default e1000e driver provided by SL).

Devin

On Aug 22, 2011, at 10:42 AM, Eero Volotinen wrote:
> 2011/8/22 Devin Bougie :
>> Hi Eero,
>>
>> Have you tried using the e1000e driver that's provided by SL? We haven't
>> had any problems using 82572EI cards in SL5.6 with the e1000e driver.
>>
>> I hope this helps,
>> Devin
>
> You mean the default driver? yes, it cannot detect line.
>
> --
> Eero
Re: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) on scienfic linux 6.1
Hi Eero,

Have you tried using the e1000e driver that's provided by SL? We haven't had any problems using 82572EI cards in SL5.6 with the e1000e driver.

I hope this helps,
Devin

On Aug 22, 2011, at 10:10 AM, Eero Volotinen wrote:
> Hi,
>
> Any ideas how to get Intel Corporation 82572EI Gigabit Ethernet
> Controller (Copper) working on scientific linux 6.1?
> Looks like e1000 driver is too old to support this card?
>
> Info:
>
>
> 02:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
>     Subsystem: Intel Corporation PRO/1000 PT Server Adapter
>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
>     Latency: 0, Cache Line Size: 64 bytes
>     Interrupt: pin A routed to IRQ 36
>     Region 0: Memory at fb64 (32-bit, non-prefetchable) [size=128K]
>     Region 1: Memory at fb62 (32-bit, non-prefetchable) [size=128K]
>     Region 2: I/O ports at d000 [size=32]
>     Expansion ROM at fb60 [disabled] [size=128K]
>     Capabilities: [c8] Power Management version 2
>         Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>         Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
>     Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>         Address: fee0f00c Data: 41e1
>     Capabilities: [e0] Express (v1) Endpoint, MSI 00
>         DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
>         DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
>             MaxPayload 128 bytes, MaxReadReq 512 bytes
>         DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
>         LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Latency L0 <4us, L1 <64us
>             ClockPM- Surprise- LLActRep- BwNot-
>         LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
>             ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>         LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>     Capabilities: [100] Advanced Error Reporting
>         UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
>         UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>         UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>         CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>         CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>         AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
>     Capabilities: [140] Device Serial Number 00-1b-21-ff-ff-b0-a7-7e
>     Kernel driver in use: e1000e
>     Kernel modules: e1000e
>
> What is best way to update driver? Latest source from intel supports
> this card, but is i prefer rpm way ..
>
> --
> Eero
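The "Kernel driver in use" line from lspci can also be read per network interface from sysfs, which makes it easy to confirm whether e1000 or e1000e actually claimed the card. This is a minimal sketch assuming Python 3 on Linux with /sys mounted; bound_driver is a hypothetical helper, and "eth0" in the usage note is an assumed interface name. From the command line, ethtool -i <interface> reports the same driver name.

```python
import os


def bound_driver(interface: str):
    """Return the kernel driver bound to a network interface, or None if there is none."""
    # /sys/class/net/<if>/device/driver is a symlink to the driver's sysfs directory.
    link = os.path.join("/sys/class/net", interface, "device", "driver")
    try:
        return os.path.basename(os.readlink(link))
    except FileNotFoundError:
        # Virtual interfaces such as 'lo', or unknown names, have no backing device.
        return None
```

If eth0 were the 82572EI shown above, bound_driver("eth0") would be expected to return "e1000e".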
MRG Realtime components for SL5
At LEPP we are beginning to look at using the MRG Realtime components for SL5 in the CESR control system. We were happy to find the MRG repository at CERN:

http://linuxsoft.cern.ch/cern/mrg/slc5X/x86_64/RPMS/repoview/

We would be very grateful to hear of any experience others have had with real-time computing (including the Real-Time Specification for Java) on SL(C). Any quick comments on how other labs are using, or are considering using, MRG Realtime would be greatly appreciated.

Many thanks,
Devin

--
Devin Bougie
Cornell University Laboratory for Elementary-Particle Physics
devin.bou...@cornell.edu
Re: KVM on SL5.5 randomly pausing guests
Just to follow up on this posting: the problem was that the file system holding the KVM guests was full. Everything worked properly after freeing a little space.

Devin

On Apr 22, 2011, at 10:22 AM, James M Pulver wrote:
I’ve noticed on one of my servers that KVM is randomly pausing guests. I’m not sure why it’s doing so; virt-manager gives me no real info. The two guests affected so far were a Windows 7 VM I was trying to install SP1 on and, on the same host server, a Server 2008 R2 guest that I wasn’t interacting with at all. Any ideas / troubleshooting?
--
James Pulver
Information Technology Area Supervisor
LEPP Computer Group
Cornell University
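Since the guests paused because the file system holding their images filled up, a periodic free-space check on that file system is a cheap early warning. This is a minimal sketch assuming Python 3; the 10 GiB threshold and the /var/lib/libvirt/images path (libvirt's usual default image directory) are assumptions, not details from this thread, and low_on_space is a hypothetical helper.

```python
import shutil

GIB = 1024 ** 3


def low_on_space(path: str, min_free_bytes: int = 10 * GIB) -> bool:
    """True if the file system holding `path` has less than `min_free_bytes` free.

    shutil.disk_usage(path).free counts only the space available to
    unprivileged users, so root-reserved blocks are not treated as free.
    """
    return shutil.disk_usage(path).free < min_free_bytes
```

For example, low_on_space("/var/lib/libvirt/images") could be run from cron, with an alert sent whenever it returns True.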
End Support Date for SL5
Hi Troy and Connie,

If possible, an update to the End Support Date for SL5 (still listed as "at least until 2012-02-02" on the SL Roadmap) would be greatly appreciated. We have been migrating our control systems and general infrastructure from 32-bit SL4 to 64-bit SL5, and are just beginning to evaluate SL6.

Many thanks,
Devin

------
Devin Bougie
Cornell University Laboratory for Elementary-Particle Physics
devin.bou...@cornell.edu
Re: EVO video with nvidia drivers
On Jan 4, 2011, at 10:57 AM, Akemi Yagi wrote:
> The module-init-tools version on your system has a bug as detailed here:
> http://elrepo.org/tiki/Update
> You need to update module-init-tools to the current version which has
> the bug fixed.

I've now found this updated version in the sl-fastbugs repo.

Thanks,
Devin
Re: EVO video with nvidia drivers
On Jan 4, 2011, at 10:57 AM, Akemi Yagi wrote:
> On Tue, Jan 4, 2011 at 7:45 AM, Devin Bougie <devin.bou...@cornell.edu> wrote:
>> Thanks for the suggestion, Akemi and Mark. When I try to use the ELRepo
>> packages, kmod-nvidia conflicts with module-init-tools.
>
> The module-init-tools version on your system has a bug as detailed here:
> http://elrepo.org/tiki/Update
> You need to update module-init-tools to the current version which has
> the bug fixed. Either that or install the one provided by ELRepo.
>
> yum --disablerepo \* --enablerepo elrepo update module-init-tools kmod-nvidia

Thanks. Once this replaced the module-init-tools from SL5 base with module-init-tools from ELRepo (both 3.3-0.pre3.1.60.el5), I was able to install nvidia-x11-drv and kmod-nvidia from ELRepo. With this configuration, ViEVO works correctly.

I am a little hesitant to push module-init-tools from ELRepo to all of our SL5 systems. Is there any chance of incorporating the patched version of module-init-tools (or an nvidia driver that provides OpenGL 1.5+) into SL5?

Thanks again,
Devin
Re: EVO video with nvidia drivers
Thanks for the suggestion, Akemi and Mark. When I try to use the ELRepo packages, kmod-nvidia conflicts with module-init-tools.

Devin

--
[r...@lnx226 ~]# yum --enablerepo=elrepo install nvidia-x11-drv nvidia-x11-drv-32bit
Loaded plugins: downloadonly, fastestmirror, kernel-module, priorities
Loading mirror speeds from cached hostfile
 * elrepo: elrepo.org
 * sl-base: repos.lepp.cornell.edu
 * sl-security: repos.lepp.cornell.edu
16 packages excluded due to repository priority protections
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package nvidia-x11-drv.x86_64 0:260.19.29-1.el5.elrepo set to be updated
--> Processing Dependency: nvidia-kmod = 260.19.29-1.el5.elrepo for package: nvidia-x11-drv
--> Processing Dependency: nvidia-kmod = 260.19.29-1.el5.elrepo for package: nvidia-x11-drv
---> Package nvidia-x11-drv-32bit.x86_64 0:260.19.29-1.el5.elrepo set to be updated
--> Running transaction check
---> Package kmod-nvidia.x86_64 0:260.19.29-1.el5.elrepo set to be updated
--> Processing Conflict: kmod-nvidia conflicts module-init-tools = 3.3-0.pre3.1.60.el5
--> Finished Dependency Resolution
kmod-nvidia-260.19.29-1.el5.elrepo.x86_64 from elrepo has depsolving problems
  --> kmod-nvidia conflicts with module-init-tools
Beginning Kernel Module Plugin
Finished Kernel Module Plugin
Error: kmod-nvidia conflicts with module-init-tools
 You could try using --skip-broken to work around the problem
 You could try running: package-cleanup --problems
                        package-cleanup --dupes
                        rpm -Va --nofiles --nodigest
--

On Jan 3, 2011, at 5:55 PM, Akemi Yagi wrote:
> On Mon, Jan 3, 2011 at 9:00 AM, Devin Bougie wrote:
>> And, here is the reply we've received from the EVO support folks.
>>
>> --
>> The reason is in OpenGL version. You need at least v.1.5+ to be able to
>> use the most recent ViEVO. As you can see from your logs, you have 3.3
>> in case when you can successfully use it and 1.4 when you cannot.
>> Perhaps RPMs are messed up. Please use drivers from nvidia web site,
>> until SLC5 will resolve this issue.
>> --
>
> I, too, recommend installing ELRepo's nvidia packages (OK, I'm
> biased). They get updated in a timely manner (so far at least). If you
> install them now, it is version 260.19.29 (OpenGL 3.3.0). Unlike the
> Nvidia's installer, you do not need development software and there is
> no need to reinstall upon kernel updates (kABI-tracking kernel
> module).
>
> Akemi
Re: EVO video with nvidia drivers
And, here is the reply we've received from the EVO support folks.

--
The reason is in OpenGL version. You need at least v.1.5+ to be able to use the most recent ViEVO. As you can see from your logs, you have 3.3 in case when you can successfully use it and 1.4 when you cannot. Perhaps RPMs are messed up. Please use drivers from nvidia web site, until SLC5 will resolve this issue.
--

Thanks again,
Devin

On Jan 3, 2011, at 11:39 AM, Devin Bougie wrote:
> Hi, All. We have several SL5 systems using the proprietary nvidia driver
> from sl-contrib. We did not have any trouble with these systems until we
> tried to view video in EVO (ViEVO). We have no problem using EVO with the
> latest 260.19.29 drivers downloaded and installed from nvidia.com. However,
> we are unable to view video when using nvidia-x11-drv from sl-contrib or
> nvidia-graphics260.19.21 from atrpms. In all cases, xorg.conf is essentially
> unchanged.
>
> Please see below for example output from unsuccessful and successful attempts
> at running ViEVO. Any suggestions for fixing this using the sl-contrib (or
> atrpms) drivers would be greatly appreciated.
>
> Many thanks,
> Devin
>
> Here is the output from attempting to run ViEVO while using nvidia-x11-drv
> from sl-contrib:
> --
> [da...@lnx226 ~]% rpm -qa |grep -i nvidia
> nvidia-x11-drv-195.36.24-1.0.x86_64
> nvidia-x11-drv-32bit-195.36.24-1.0.x86_64
> [da...@lnx226 ~]% uname -a
> Linux lnx226.lns.cornell.edu 2.6.18-194.26.1.el5 #1 SMP Tue Nov 9 12:46:16 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
> [da...@lnx226 ViEVO]% ./ViEVO 127.0.0.1/:2224
> Settings error: .preferences.xml: cannot open file
> Got Doublebuffered Visual!
> glX-Version 1.4
> Sorry, no Direct Rendering possible!
> System name - Linux
> Nodename - lnx226.lns.cornell.edu
> Release - 2.6.18-194.26.1.el5
> Version - #1 SMP Tue Nov 9 12:46:16 EST 2010
> Architecture - x86_64
> Error when oppening BMPimage file: earth.bkg
> /// VERSION
> /
>
> Vievo 2.0 (Build: linux 32-bit PROD Date: 13.05.2010 )
>
> /// OpenGL Support Protocol
> ///
> OpenGL version: 1.4 (2.1.2 NVIDIA 195.36.24)
> Vendor: NVIDIA Corporation
> Renderer: Quadro NVS 290/PCI/SSE2
> ---
> YUV_TEXTURES_SUPPORTED : 0
>
> GLEE_VERSION_2_0 : 0
> GLEE_ARB_shader_objects : 0
> GLEE_ARB_vertex_shader :0
> GLEE_ARB_fragment_shader : 0
> GLEE_ARB_shading_language_100 : 0
> -
> PIXEL_BUFFER_OBJECTS_ARB_SUPPORTED : 0
> OPEN_GL_2_1_SUPPORTED : 0
> / END
> /
>
> libDeckLinkAPI.so: cannot open shared object file: No such file or directory
> No DeckLink drivers
> TCP Channel: Not Connected: Connection refused
> X Error of failed request: BadLength (poly request too large or internal Xlib length error)
>   Major opcode of failed request: 143 (GLX)
>   Minor opcode of failed request: 1 (X_GLXRender)
>   Serial number of failed request: 410
>   Current serial number in output stream: 411
> --
>
> And, here is the output from a successful run using 260.19.29 from nvidia.com:
> --
> [da...@lnx226 ~]% uname -a
> Linux lnx226.lns.cornell.edu 2.6.18-194.26.1.el5 #1 SMP Tue Nov 9 12:46:16 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
> [da...@lnx226 ~]% grep GLX /var/log/Xorg.0.log
> (II) NVIDIA GLX Module 260.19.29 Wed Dec 8 12:24:30 PST 2010
> [da...@lnx226 ViEVO]% ./ViEVO 127.0.0.1/:2224
> Got Doublebuffered Visual!
> glX-Version 1.4
> Congrats, you have Direct Rendering!
> System name - Linux
> Nodename - lnx226.lns.cornell.edu
> Release - 2.6.18-194.26.1.el5
> Version - #1 SMP Tue Nov 9 12:46:16 EST 2010
> Architecture - x86_64
> Error when oppening BMPimage file: earth.bkg
> /// VERSION
> /
>
> Vievo 2.0 (Build: linux 32-bit PROD Date: 13.05.2010 )
>
> /// OpenGL Support Protocol
> ///
> OpenGL version: 3.3.0 NVIDIA 260.19.29
> Vendor: NVIDIA Corporation
> Renderer: Quadro NVS 290/PCI/SSE2
> ---
> YUV_TEXTURES_SUPPORTED : 1
>
> GLEE_VERSION_2_0 : 1
> GLEE_ARB_shader_objects : 1
> GLEE_ARB_vertex_shader :1
> GLEE_ARB_fragment_shader : 1
> GLEE_ARB_shad
EVO video with nvidia drivers
Hi, All.

We have several SL5 systems using the proprietary nvidia driver from sl-contrib. We did not have any trouble with these systems until we tried to view video in EVO (ViEVO). We have no problem using EVO with the latest 260.19.29 drivers downloaded and installed from nvidia.com. However, we are unable to view video when using nvidia-x11-drv from sl-contrib or nvidia-graphics260.19.21 from atrpms. In all cases, xorg.conf is essentially unchanged.

Please see below for example output from unsuccessful and successful attempts at running ViEVO. Any suggestions for fixing this using the sl-contrib (or atrpms) drivers would be greatly appreciated.

Many thanks,
Devin

Here is the output from attempting to run ViEVO while using nvidia-x11-drv from sl-contrib:
--
[da...@lnx226 ~]% rpm -qa |grep -i nvidia
nvidia-x11-drv-195.36.24-1.0.x86_64
nvidia-x11-drv-32bit-195.36.24-1.0.x86_64
[da...@lnx226 ~]% uname -a
Linux lnx226.lns.cornell.edu 2.6.18-194.26.1.el5 #1 SMP Tue Nov 9 12:46:16 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
[da...@lnx226 ViEVO]% ./ViEVO 127.0.0.1/:2224
Settings error: .preferences.xml: cannot open file
Got Doublebuffered Visual!
glX-Version 1.4
Sorry, no Direct Rendering possible!
System name - Linux
Nodename - lnx226.lns.cornell.edu
Release - 2.6.18-194.26.1.el5
Version - #1 SMP Tue Nov 9 12:46:16 EST 2010
Architecture - x86_64
Error when oppening BMPimage file: earth.bkg
/// VERSION
/

Vievo 2.0 (Build: linux 32-bit PROD Date: 13.05.2010 )

/// OpenGL Support Protocol
///
OpenGL version: 1.4 (2.1.2 NVIDIA 195.36.24)
Vendor: NVIDIA Corporation
Renderer: Quadro NVS 290/PCI/SSE2
---
YUV_TEXTURES_SUPPORTED : 0

GLEE_VERSION_2_0 : 0
GLEE_ARB_shader_objects : 0
GLEE_ARB_vertex_shader :0
GLEE_ARB_fragment_shader : 0
GLEE_ARB_shading_language_100 : 0
-
PIXEL_BUFFER_OBJECTS_ARB_SUPPORTED : 0
OPEN_GL_2_1_SUPPORTED : 0
/ END
/

libDeckLinkAPI.so: cannot open shared object file: No such file or directory
No DeckLink drivers
TCP Channel: Not Connected: Connection refused
X Error of failed request: BadLength (poly request too large or internal Xlib length error)
  Major opcode of failed request: 143 (GLX)
  Minor opcode of failed request: 1 (X_GLXRender)
  Serial number of failed request: 410
  Current serial number in output stream: 411
--

And, here is the output from a successful run using 260.19.29 from nvidia.com:
--
[da...@lnx226 ~]% uname -a
Linux lnx226.lns.cornell.edu 2.6.18-194.26.1.el5 #1 SMP Tue Nov 9 12:46:16 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
[da...@lnx226 ~]% grep GLX /var/log/Xorg.0.log
(II) NVIDIA GLX Module 260.19.29 Wed Dec 8 12:24:30 PST 2010
[da...@lnx226 ViEVO]% ./ViEVO 127.0.0.1/:2224
Got Doublebuffered Visual!
glX-Version 1.4
Congrats, you have Direct Rendering!
System name - Linux
Nodename - lnx226.lns.cornell.edu
Release - 2.6.18-194.26.1.el5
Version - #1 SMP Tue Nov 9 12:46:16 EST 2010
Architecture - x86_64
Error when oppening BMPimage file: earth.bkg
/// VERSION
/

Vievo 2.0 (Build: linux 32-bit PROD Date: 13.05.2010 )

/// OpenGL Support Protocol
///
OpenGL version: 3.3.0 NVIDIA 260.19.29
Vendor: NVIDIA Corporation
Renderer: Quadro NVS 290/PCI/SSE2
---
YUV_TEXTURES_SUPPORTED : 1

GLEE_VERSION_2_0 : 1
GLEE_ARB_shader_objects : 1
GLEE_ARB_vertex_shader :1
GLEE_ARB_fragment_shader : 1
GLEE_ARB_shading_language_100 : 1
-
PIXEL_BUFFER_OBJECTS_ARB_SUPPORTED : 1
OPEN_GL_2_1_SUPPORTED : 1
/ END
/

libDeckLinkAPI.so: cannot open shared object file: No such file or directory
No DeckLink drivers
TCP Channel: Not Connected: Connection refused
Exiting sanely... 222
--
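The EVO support reply quoted earlier in this thread says the current ViEVO needs OpenGL 1.5 or later, and it compares the leading version number in each log (1.4 versus 3.3). That check can be sketched as follows, assuming Python 3 and the "OpenGL version:" format shown in the logs. gl_version and meets_requirement are hypothetical helpers, and only the leading major.minor number is considered.

```python
import re

# Minimum stated by EVO support in this thread ("at least v.1.5+").
MIN_GL_VERSION = (1, 5)


def gl_version(value: str) -> tuple:
    """Parse the leading major.minor number of an 'OpenGL version:' value."""
    match = re.match(r"\s*(\d+)\.(\d+)", value)
    if match is None:
        raise ValueError(f"no OpenGL version number in {value!r}")
    return int(match.group(1)), int(match.group(2))


def meets_requirement(value: str, minimum: tuple = MIN_GL_VERSION) -> bool:
    """True if the reported version is at least `minimum` (tuple comparison)."""
    return gl_version(value) >= minimum
```

Applied to the two logs, meets_requirement("1.4 (2.1.2 NVIDIA 195.36.24)") returns False for the sl-contrib driver, and meets_requirement("3.3.0 NVIDIA 260.19.29") returns True for the nvidia.com driver.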
Re: Using SAMBA to share CUPS printers to Windows 7 pushed via GPP
Hi, All.

Thank you for the suggestions. We were able to fix our problem after upgrading from "samba" to "samba3x" and making some configuration changes on both the Linux and Windows sides. On Linux, for "cupsaddsmb" to work we needed to map the AD user that runs "cupsaddsmb" to root. I'm not sure exactly what was needed for our GPO to work properly, but we should be able to post more information if others are interested.

Thanks again for the help,
Devin

On Nov 17, 2010, at 1:18 PM, Jon Peatfield wrote:
> On Wed, 17 Nov 2010, James M Pulver wrote:
>
>> I'm not really sure where the right place to ask this question as it
>> touches on so many disparate technologies, but here goes. I'm trying to
>> set up Windows 7 clients to print to printers served from SL5.5 CUPS
>> server using SAMBA to provide windows print sharing. We've got it
>> working with the default CUPS postscript drivers, but it requires Admin
>> on the Windows clients to install the driver.
> ...
>
> Just to add to what Andrew already said... I look after two samba servers
> which support printing and both are using the CUPS Windows printer drivers
> (hence using cupsaddsmb to get samba to offer them to Windows clients).
>
> On one system which is the 'PDC' for a domain we are using samba-3.0.x (it
> is sl4 so we can't run the newer version). The windows clients which are
> part of the domain can add printers without needing admin rights - though
> there may be some magic happening to allow this (I know that there _is_
> some magic I just don't know what it is). In this case all the clients
> are XP atm...
>
> On another server which is for end-users laptops etc we used to run the
> samba-3.0.x version and that certainly had some problems with Win-7
> clients and would not support 64-bit drivers.
>
> A few months ago we updated that to run sl5 with the samba3x-3.3.x
> packages. That does support installing 64-bit drivers - and Win-7 clients
> generally seem to work more reliably.
>
> However for this server I *think* that the clients do need admin rights to
> install the drivers which for us isn't a big problem since these are
> users' own machines so they usually will have rights on them (most of them
> seem to run logged in as an admin anyway)...
>
> BTW the CUPS windows 64-bit drivers have not actually been released but it
> isn't hard to find the binaries (they were attached to the cups bug-report
> about the lack of 64-bit drivers). Google (or other search engines)
> should find them easily enough...
>
> -- Jon
Re: I/O delays
Just in case anyone was interested in the resolution of this problem: it appears that the poor drive performance was caused by vibration from the chassis fans. After replacing nearly every component in this system (hard drive, motherboard, SATA cable, memory, and CPUs) to no avail, we discovered that our chassis had a mix of 6.8W and 2.8W fans. When we corrected this so that all five chassis fans were of the 2.8W variety, our disk performance problems appear to have disappeared.

For what it's worth, a quick search turns up a few reports of vibration degrading disk performance. For example:

http://www.dtc.umn.edu/publications/reports/2005_08.pdf
http://etbe.coker.com.au/2009/04/22/vibration-strange-sata-performance/

Finally, to quantify the improvement in disk speed, here are the results (average and range) of each test from three runs of bonnie++ (throughput in K/sec, seeks per second):

bonnie++ -d /mnt/scratch/test_dab66 -n 0 -s 98304 -f

ORIGINAL FAN CONFIGURATION (two 2.8W and three 6.8W fans)
                        Average      Range
Writes (K/sec)          5348         4327 - 6369
Rewrites (K/sec)        2489.5       2284 - 2695
Reads (K/sec)           9575.5       7778 - 11373
Random seeks (/sec)     68.4         65.3 - 71.5

CORRECT FAN CONFIGURATION (five 2.8W fans)
                        Average      Range
Writes (K/sec)          53901.333    43143 - 60430
Rewrites (K/sec)        27684        20178 - 31942
Reads (K/sec)           75541.333    65704 - 80717
Random seeks (/sec)     181.333      117.9 - 218.6

Many thanks for everyone's time and help.

Sincerely,
Devin
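Dividing the averages from the two fan configurations gives a rough per-test improvement factor. This is a minimal sketch assuming Python 3. The numbers are the averages reported above, and speedups is a hypothetical helper. With only a few runs per configuration, the factors are indicative rather than precise.

```python
# Averages from the bonnie++ runs above: K/sec for throughput, seeks/sec for seeks.
before = {"Writes": 5348, "Rewrites": 2489.5, "Reads": 9575.5, "Random seeks": 68.4}
after = {"Writes": 53901.333, "Rewrites": 27684, "Reads": 75541.333, "Random seeks": 181.333}


def speedups(old: dict, new: dict) -> dict:
    """Ratio new/old for every test present in both configurations."""
    return {test: new[test] / old[test] for test in old if test in new}


for test, factor in speedups(before, after).items():
    print(f"{test:<13} {factor:5.1f}x")
```

This prints factors of about 10.1x for writes, 11.1x for rewrites, 7.9x for reads and 2.7x for random seeks.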
Re: I/O delays
Hi Stijn,

Thank you very much for your reply.

On Sep 9, 2010, at 2:13 AM, Stijn De Weirdt wrote:

> so there is only 1 disk in your system according to dmesg (a 500GB SATA
> disk) and dd reports speeds > 1GB/s. you are either the lucky owner of a
> wonder SATA disk or you are measuring linux caching ;)

Yes, indeed =).

> when you run the tests, can you also run
> watch grep Dirty /proc/meminfo
> and check if it starts increasing, what the maximum is and when it
> starts decreasing.

It appears to start decreasing at about 1GB. With /proc/sys/vm/dirty_ratio set to 40, the maximum I've seen is 1030708 kB. With it set to 2, the maximum I've seen is 991704 kB.

> if you have default SL settings /proc/sys/vm/dirty_ratio is 40, so
> that's 19GB of dirty memory allowed. with your dd you won't reach that,
> but default /proc/sys/vm/dirty_expire_centisecs is 3000, so after 30
> seconds, the system will start to write the dirty memory away, which is
> probably when the "reduced performance" starts to hit you.
>
> you can safely lower /proc/sys/vm/dirty_ratio to 2 (which is still
> almost 1GB of dirty memory) to get a more stable performance (i think
> recent kernel do this automated based on total size of memory)
>
> and if you want to measure your disk performance, use iozone/bonnie/IOR
> with sync options (or add 'sync' to you timing stats)

The system is currently in use and I haven't yet had a chance to run any real disk benchmarks. For what it's worth, here's the dd output with a subsequent "time sync". During the sync, the system as a whole becomes sluggish (as we only have a single drive, used for both the OS and user data) and it takes several seconds for commands like "top" to return. We also see comparable behavior when using ext3 or XFS.
[da...@lnx4103 ~]% time dd if=/dev/zero of=/mnt/scratch/test bs=1M count=1K ; time sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.928643 seconds, 1.2 GB/s

real    0m1.077s
user    0m0.000s
sys     0m1.068s

real    1m35.341s
user    0m0.000s
sys     0m0.182s

> btw, try dstat (sort of iostat and vmstat combined, with colors ;)

Yes, dstat is very nice!

We will let you know whether or not things improve after replacing the disk.

Thanks again,
Devin
Re: I/O delays
Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

On Sep 8, 2010, at 6:46 PM, Konstantin Olchanski wrote:

> On Wed, Sep 08, 2010 at 04:42:43PM -0400, Devin Bougie wrote:
>> Hi, All. We are seeing periodic I/O delays on a new large compute node ...
>
> Hi, you do not say what your disks are (model number as reported by smartctl),
> but I have seen same symptoms as you describe with some WDC 2TB "advanced
> format" disks.
> The problem vanished with the next shipment of these 2TB disks, which also
> happen to have a slightly different model number.
>
> So I say, if you see disk delays, try different disks (different maker,
> different model, different capacity).
>
> --
> Konstantin Olchanski
> Data Acquisition Systems: The Bytes Must Flow!
> Email: olchansk-at-triumf-dot-ca
> Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada
Re: I/O delays
Hi Steve,

On Sep 8, 2010, at 5:03 PM, Steven Timm wrote:

> Devin--what is the output of the command "dmesg"
> Are there any kernel traces or bugs in that output?

Unfortunately I don't see anything unusual in syslog or in the output of dmesg (attached as dmesg.log).

Thanks,
Devin
Re: Note - Firefox 3.6 coming today for SL5
Hi David,

On Jun 25, 2010, at 4:15 PM, Kinzel, David wrote:

>> In our case, home directories are also on NFS shares. The
>> problem described above happened when ~/.mozilla/firefox was a
>> link to a folder in a different NFS share. Sorry for the confusion.
>
> Out of curiosity, why are people doing this? What benefit are you gaining?

The user in question apparently did this to avoid overfilling their home disk quota (thanks to Firefox's urlclassifier3.sqlite).

Devin
Re: Note - Firefox 3.6 coming today for SL5
Hi Chris,

On Jun 25, 2010, at 4:05 PM, Chris Jones wrote:

> On 25 Jun 2010, at 6:05pm, Devin Bougie wrote:
>
>> Just in case it helps, we saw this same problem with the Lazarus extension
>> for one of our users. In this case, the problem appears to have been that
>> ~/.mozilla/firefox was a link to an NFS share. When I removed the link and
>> moved the firefox directory back into ~/.mozilla/, Firefox 3.6 with Lazarus
>> works fine.
>>
>> Firefox 3.0 with Lazarus works fine when ~/.mozilla/firefox is a link to an
>> NFS share, but Firefox 3.6 with Lazarus does not.
>
> In my case my home directory is itself an NFS mount, nothing can be done
> about that. ~/.mozilla/firefox is though a regular directory, not a link.
> Still, $HOME/.mozilla/firefox is ultimately on an NFS share. Is that the
> problem I wonder...

In our case, home directories are also on NFS shares. The problem described above happened when ~/.mozilla/firefox was a link to a folder in a different NFS share. Sorry for the confusion.

Devin
Re: Note - Firefox 3.6 coming today for SL5
Hi Jon and Chris,

On Jun 24, 2010, at 5:26 PM, Jon Peatfield wrote:

> ... But one of our users has reported a problem with a particular extension
> which seems odd - the extension works the first time that 3.6.4 uses it
> but not after that. ...

Just in case it helps, we saw this same problem with the Lazarus extension for one of our users. In this case, the problem appears to have been that ~/.mozilla/firefox was a link to an NFS share. When I removed the link and moved the firefox directory back into ~/.mozilla/, Firefox 3.6 with Lazarus works fine.

Firefox 3.0 with Lazarus works fine when ~/.mozilla/firefox is a link to an NFS share, but Firefox 3.6 with Lazarus does not.

Devin

------
Devin Bougie
Cornell University
Laboratory for Elementary-Particle Physics
devin.bou...@cornell.edu
Re: KVM troubles on SL5.4
Hi Artem,

On Feb 22, 2010, at 4:13 AM, Artem Trunov wrote:

> Have you had a good experience with KVM in the latest SL5.4? Is it production-worthy?

Yes, we have had very positive experiences with KVM for production Windows XP guests and test Windows 7, Windows Server 2008 R2, SL4, and SL5 guests. We recently deployed our first production server, and are preparing additional servers to replace our old VMware Server 1.0.9 systems. Of course we investigated ESXi, etc., and found KVM to be the best-performing (and cheapest) solution for us.

> 1. Boot is only possible from ide bus, and drives are named hda. It seems this is a feature of KVM?

IDE is sufficient for our needs.

> 2. Virtual machine can not use more than 1 host core, even if 2 virtual cpus are configured for it.

We haven't noticed this.

> 3. some times virtual machines lock up and go into a cpu churning loop, consuming 100% of a host core. The solution is to kill qemu process.

We haven't seen this either, but have noticed a tendency for Win7 and Server 2008 R2 guests to bluescreen on KVM occasionally (roughly once a week). We haven't had time to investigate what's causing this, but haven't seen it on test hardware using the same images. We have also noticed significant time drift in our Windows guests that wasn't seen on VMware Server, though this has been sufficiently corrected using NTP.

> 4. virt-manager crashes shortly after first start up and few operations. Subsequent restart of virt-manager doesn't lead to more crashes..

virt-manager has never crashed for us.

> 5. It's slow, presumably because of disk operations. I see some 5MB/s max read/write. I run a qemu native image format, stored on gpfs.

We've been using the RAW format, which performs very well for us. The qcow format was slow (and reportedly buggy). Our images are stored on an ext3 filesystem using 15K SAS drives in software RAID5.

> My host system has all latest kernels and patches.

Our host system is also running a fully updated x86_64 SL5.4.
I hope this helps,
Devin

--
Devin Bougie
Cornell University
Laboratory for Elementary-Particle Physics
devin.bou...@cornell.edu
lam-devel and lam-libs after upgrade from SL4.5 to SL4.8
Hi, All.

For what it's worth, our systems appear to have two versions of lam-devel and lam-libs after upgrading from SL4.5 to SL4.8:

--
[r...@lnx100 ~]# rpm -qi lam-devel
Name        : lam-devel                    Relocations: (not relocatable)
Version     : 7.1.2                             Vendor: Scientific Linux
Release     : 8                             Build Date: Fri 04 May 2007 01:50:22 AM EDT
Install Date: Tue 21 Aug 2007 04:28:45 AM EDT      Build Host: yort.fnal.gov
Group       : Development/Libraries         Source RPM: lam-7.1.2-8.src.rpm
Size        : 5043907                          License: BSD
Signature   : DSA/SHA1, Tue 22 May 2007 04:47:48 PM EDT, Key ID 25dbef78a7048f8d
URL         : http://www.lam-mpi.org/
Summary     : Development files for LAM
Description :
Contains development headers and libraries for LAM
Name        : lam-devel                    Relocations: (not relocatable)
Version     : 7.1.2                             Vendor: Scientific Linux
Release     : 15.el4                        Build Date: Sat 26 Jul 2008 10:59:07 PM EDT
Install Date: Mon 04 Jan 2010 11:36:23 AM EST      Build Host: yort1.fnal.gov
Group       : Development/Libraries         Source RPM: lam-7.1.2-15.el4.src.rpm
Size        : 5376440                          License: BSD
Signature   : DSA/SHA1, Sat 26 Jul 2008 08:25:26 PM EDT, Key ID da6ad00882fd17b2
URL         : http://www.lam-mpi.org/
Summary     : Development files for LAM
Description :
Contains development headers and libraries for LAM
[r...@lnx100 ~]# rpm -qi lam-libs
Name        : lam-libs                     Relocations: (not relocatable)
Version     : 7.1.2                             Vendor: Scientific Linux
Release     : 8                             Build Date: Fri 04 May 2007 01:50:22 AM EDT
Install Date: Tue 21 Aug 2007 04:28:29 AM EDT      Build Host: yort.fnal.gov
Group       : System/Libraries              Source RPM: lam-7.1.2-8.src.rpm
Size        : 1051049                          License: BSD
Signature   : DSA/SHA1, Tue 22 May 2007 04:47:48 PM EDT, Key ID 25dbef78a7048f8d
URL         : http://www.lam-mpi.org/
Summary     : Libraries for LAM
Description :
Runtime libraries for LAM
Name        : lam-libs                     Relocations: (not relocatable)
Version     : 7.1.2                             Vendor: Scientific Linux
Release     : 15.el4                        Build Date: Sat 26 Jul 2008 10:59:07 PM EDT
Install Date: Mon 04 Jan 2010 11:32:16 AM EST      Build Host: yort1.fnal.gov
Group       : System/Libraries              Source RPM: lam-7.1.2-15.el4.src.rpm
Size        : 2495001                          License: BSD
Signature   : DSA/SHA1, Sat 26 Jul 2008 08:25:27 PM EDT, Key ID da6ad00882fd17b2
URL         : http://www.lam-mpi.org/
Summary     : Libraries for LAM
Description :
Runtime libraries for LAM
--

I am unable to remove the older packages via yum, and instead need to use "rpm -e --nopreun".

--
[r...@lnx100 ~]# yum remove lam-devel-7.1.2-8 lam-libs-7.1.2-8
Loading "kernel-module" plugin
Setting up Remove Process
Resolving Dependencies
--> Populating transaction set with selected packages. Please wait.
---> Package lam-devel.i386 2:7.1.2-8 set to be erased
---> Package lam-libs.i386 2:7.1.2-8 set to be erased
--> Running transaction check
Beginning Kernel Module Plugin
Finished Kernel Module Plugin

Dependencies Resolved

=============================================================================
 Package                 Arch       Version          Repository        Size
=============================================================================
Removing:
 lam-devel               i386       2:7.1.2-8        installed         4.8 M
 lam-libs                i386       2:7.1.2-8        installed         1.0 M

Transaction Summary
=============================================================================
Install      0 Package(s)
Update       0 Package(s)
Remove       2 Package(s)

Total download size: 0
Is this ok [y/N]: y
Downloading Packages:
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
error: %preun(lam-libs-7.1.2-8.i386) scriptlet failed, exit status 2
error: %preun(lam-devel-7.1.2-8.i386) scriptlet failed, exit status 2

Removed: lam-devel.i386 2:7.1.2-8 lam-libs.i386 2:7.1.2-8
Complete!
[r...@lnx100 ~]# rpm -qi lam-devel-7.1.2-8
Name        : lam-devel                    Relocations: (not relocatable)
Version     : 7.1.2                             Vendor: Scientific Linux
Release     : 8                             Build Date: Fri 04 May 2007 01:50:22 AM EDT
Install Date: Tue 21 Aug 2007 04:28:45 AM EDT      Build Host: yort.fnal.gov
Group       : Development/Libraries         Source RPM: lam-7.1.2-8.src.rpm
Size        : 5043907                          License: BSD
Signature   : DSA/SHA1, Tue 22 May 2007 04:47:48 PM EDT, Key ID 25dbef78a7048f8d
URL         : http://www.lam-mpi.org/
Summary     : Develo
Infortrend EonStor iSCSI storage systems
Hi, All.

We are researching iSCSI storage devices for use in a new high-availability SL server cluster using the RH Cluster Suite. Our basic requirements are for a reliable and scalable solution with redundant and hot-swappable power supplies and RAID controllers. We are currently looking at Infortrend's EonStor storage systems (most likely the S16E-R1130 or the S16E-R1240).

We would greatly appreciate feedback from anyone who has experience with Infortrend or their EonStor iSCSI systems. Of course, we would also greatly appreciate suggestions of other devices or vendors to consider.

Many thanks,
Devin

--
Devin Bougie
Cornell University
Laboratory for Elementary-Particle Physics
devin.bou...@cornell.edu
unstable XDMCP sessions
Hi, All.

In our control room, we use XDMCP on SL4.4 to log into a variety of control system Alphas running OpenVMS 8.3 (update 8) with Multinet TCP/IP v5.2. This has worked very well for the past few years.

As of about three weeks ago, we started seeing X sessions being killed, typically a few tens of seconds after they connect to the VMS system and present the VMS XDM login screen. When this occurs, the session quietly ends without any error messages, regardless of whether or not you actually log into the VMS system.

While the Linux systems all behave identically, the problem seems to shift around between VMS systems. For example, yesterday morning no Linux systems were able to remain logged into cesr28, while in the afternoon cesr28 was fine but the problem had shifted to cesr27.

The problem has been seen both with Linux display systems and with NCD X terminals. When it happens with the NCDs, they generate a popup menu asking if the session should be killed, giving the user the ability to remain logged in. The Linux display systems, however, simply kill all the windows and redisplay the XDMCP chooser. We have not found any option on the Linux side to prevent this.

For reliability, we have all updates disabled for these Linux systems. As the Linux display systems have not changed since their deployment several years ago and this problem first surfaced roughly three weeks ago, it seems as though the Linux systems are sensitive to something that is happening on the VMS side or on the network.

We have been testing this using the gdm chooser and, for example, the following command:

/usr/X11R6/bin/X -ac -once -query cesr27

If anyone cares to look, I can send the packets captured with tshark during both successful and failed sessions. We have also started a thread in the HP IT Resource Forum:
http://forums13.itrc.hp.com/service/forums/questionanswer.do?threadId=1362175

Any suggestions for fixing this problem would be greatly appreciated.
Many thanks, Devin -- Devin Bougie Laboratory for Elementary-Particle Physics Cornell University devin.bou...@cornell.edu
apr-devel requires gcc 3.4.5 on SL4
Hi, All.

We are unable to update apr on an SL4.5 system because apr-devel requires gcc 3.4.5.

--
[r...@lnx209 ~]# yum update apr
Loading "kernel-module" plugin
Setting up Update Process
Setting up repositories
Reading repository metadata in from local files
Resolving Dependencies
--> Populating transaction set with selected packages. Please wait.
---> Package apr.i386 0:0.9.4-24.9.el4_8.2 set to be updated
--> Running transaction check
--> Processing Dependency: apr = 0.9.4-24.5 for package: apr-devel
--> Restarting Dependency Resolution with new changes.
--> Populating transaction set with selected packages. Please wait.
---> Package apr-devel.i386 0:0.9.4-24.9.el4_8.2 set to be updated
--> Running transaction check
--> Processing Dependency: gcc = 3.4.5 for package: apr-devel
--> Finished Dependency Resolution
Beginning Kernel Module Plugin
Finished Kernel Module Plugin
Error: Missing Dependency: gcc = 3.4.5 is needed by package apr-devel
--

It looks like this has been reported in the past, but I'm not sure whether there was ever any resolution.

Sincerely,
Devin

--
Devin Bougie
Laboratory for Elementary-Particle Physics
Cornell University
da...@cornell.edu
quota on ext3 with SL4
Hi, All.

I recently ran into the following under "Limitations specific to S.L. 4.8." I am considering how to proceed with a few SL4.5 systems that are using quotas on ext3 file systems, and haven't yet found much more about this problem. Any more details or advice would be greatly appreciated.

---
Quota on EXT3 file systems

Scientific Linux discourages the use of quota on EXT3 file systems. This is because in some cases, doing so can cause a deadlock. Testing has revealed that kjournald can sometimes block some EXT3-specific callouts that are used when quota is running. As such, Scientific Linux does not plan to fix this issue in Scientific Linux 4, as the modifications required would be too invasive.

Note that this issue is not present in Scientific Linux 5.
---

Many thanks,
Devin

--
Devin Bougie
Laboratory for Elementary-Particle Physics
Cornell University
da...@cornell.edu
Re: samba on SL5
Hi Stephen,

On Feb 4, 2009, at 3:36 AM, Stephen J. Gowdy wrote:

> Wouldn't it be best to run SAMBA on the machines that actually have the disks?

Our home disk server is currently running Solaris 10. When we moved the home disks from an Alpha running Tru64 to this new Sun box, our Solaris admin attempted to get Samba running on the Solaris box. I don't know the details, but he was never able to make it work. He is no longer employed by the lab and was our sole "Solaris admin" (with many years of experience supporting Solaris), so I am not hopeful about making Samba work when he couldn't. Moreover, until we get the home disks migrated over to a Linux server (who knows when we'll have time for that), we are inclined to touch the current home disk server as little as possible.

After moving our home disks from the aging Alpha to the Sun box, we continued to make the home disks accessible over Samba using the Alpha (which accessed the disks over NFS). This Alpha recently died, prompting us to finally move the Samba service over to Linux.

In addition, we were hoping to limit the proliferation of Samba servers by having a single "samba" server that users could browse to for access to all of the required Unix filesystems.

Unless there are any other suggestions, we will move the Samba service to an SL4 box where everything just works (using the same smb.conf as on SL5).

Thanks,
Devin
Re: samba on SL5
On Feb 3, 2009, at 4:10 PM, Dr Andrew C Aitchison wrote:

> On Tue, 3 Feb 2009, Devin Bougie wrote:
>
>> Hi, All. We are using samba on an SL5.2 server to share filesystems with
>> our Windows systems. Everything works properly when sharing a filesystem
>> that is local to the samba server. When sharing an nfs filesystem,
>> however, we see locking issues with *some* file types.
>
> Which kernel are you using on the *NFS* server? We had NFS locking issues
> with the kernel shipped with SL5.2 and some of the updates. We don't seem
> to get them with 2.6.18-92.1.22.el5.
>
> It may be that your macs don't see the problem because they are being
> less picky with the locks.

We see the same problem with all of the NFS shares we've tried, and none of the NFS servers are running SL5. One is Solaris 10, one is SL4 with the 2.6.9-67.0.1.ELsmp kernel, and several are SL3 with the 2.4.21-37.EL.XFSsmp kernel.

If anyone else is serving nfs shares over samba on SL5 and isn't seeing this problem, I would be very grateful for a glimpse of your smb.conf file. If anyone is curious, I would be happy to provide ours.

Many thanks for the reply,
Devin

> For us the broken locking made mutt very unhappy with an NFS mounted
> /var/spool/mail but we found no problems with pine.
>
> --
> Dr. Andrew C. Aitchison     Computer Officer, DPMMS, Cambridge
> a.c.aitchi...@dpmms.cam.ac.uk     http://www.dpmms.cam.ac.uk/~werdna
samba on SL5
Hi, All.

We are using samba on an SL5.2 server to share filesystems with our Windows systems. Everything works properly when sharing a filesystem that is local to the samba server. When sharing an nfs filesystem, however, we see locking issues with *some* file types. For example, we can only open doc or ppt files read-only using MS Office on Windows, and get spurious errors saying that the file is locked or already being modified. If you save the file to your desktop and then drag it back to the samba share, it copies over without problem. We can also open txt files using Word and edit and save them directly onto the share without incident. We only see this problem on Windows clients, as a Mac can edit doc and ppt files directly on the share.

I am not able to reproduce this on an SL4 samba server using the exact same smb.conf. In addition, the problem remains after updating samba on SL5 to 3.0.33-3.7 from 5rolling. So far the only "solutions" we've found, neither of which I'm comfortable with, are to disable locking ("locking = no") or enable fake oplocks ("fake oplocks = yes").

Has anyone else encountered this? Any suggestions would be greatly appreciated.

Thanks,
Devin

--
Devin Bougie
Laboratory for Elementary-Particle Physics
da...@cornell.edu
kernel panic after receiving openafs-1.4.7-68
Hi, All.

Just in case anyone else runs into this, we had a handful of SL4.5 machines kernel panic after receiving the openafs-1.4.7-68 update. Here are the relevant entries in syslog; please let me know if there is any more information I can provide.

Many thanks,
Devin

--
Nov 25 04:44:16 lnx201 yum: Updated: openafs.i386 1.4.7-68.SL4
Nov 25 04:44:17 lnx201 yum: Updated: openafs-client.i386 1.4.7-68.SL4
Nov 25 04:44:17 lnx201 yum: Updated: openafs-krb5.i386 1.4.7-68.SL4
Nov 25 04:44:23 lnx201 kernel: Failed to invalidate all pages on inode 0xc5987580
Nov 25 04:44:24 lnx201 kernel: WARM shutting down of: CB... afs... BkG... CTrunc... AFSDB... RxEvent... UnmaskRxkSignals... RxListener...
Nov 25 04:44:24 lnx201 kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
Nov 25 04:44:24 lnx201 kernel: slab error in kmem_cache_destroy(): cache `afs_inode_cache': Can't free all objects
Nov 25 04:44:24 lnx201 kernel: [] kmem_cache_destroy+0x99/0x132
Nov 25 04:44:24 lnx201 kernel: [] cleanup_module+0x1e/0x84 [libafs]
Nov 25 04:44:24 lnx201 kernel: [] sys_delete_module+0x13b/0x184
Nov 25 04:44:24 lnx201 kernel: [] unmap_vma_list+0xe/0x17
Nov 25 04:44:24 lnx201 kernel: [] do_munmap+0x108/0x116
Nov 25 04:44:24 lnx201 kernel: [] do_page_fault+0x0/0x5c6
Nov 25 04:44:24 lnx201 kernel: [] syscall_call+0x7/0xb
Nov 25 04:44:26 lnx201 kernel: libafs: Ignoring new-style parameters in presence of obsolete ones
Nov 25 04:44:26 lnx201 kernel: Found system call table at 0xc03289bc (pattern scan)
Nov 25 04:44:26 lnx201 kernel: kmem_cache_create: duplicate cache afs_inode_cache
Nov 25 04:44:26 lnx201 kernel: [ cut here ]
Nov 25 04:44:26 lnx201 kernel: kernel BUG at mm/slab.c:1453!
Nov 25 04:44:26 lnx201 kernel: invalid operand: [#1]
Nov 25 04:44:26 lnx201 kernel: SMP
Nov 25 04:44:26 lnx201 kernel: Modules linked in: libafs(U) nfs lockd nfs_acl parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc md5 ipv6 loop dm_mirror button battery ac uhci_hcd ehci_hcd hw_random pcspkr tg3 sg ext3 jbd raid1 dm_mod ata_piix libata mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Nov 25 04:44:26 lnx201 kernel: CPU:    3
Nov 25 04:44:26 lnx201 kernel: EIP:    0060:[]    Tainted: PF VLI
Nov 25 04:44:26 lnx201 kernel: EFLAGS: 00010202   (2.6.9-67.0.1.ELsmp)
Nov 25 04:44:26 lnx201 kernel: EIP is at kmem_cache_create+0x4b3/0x526
Nov 25 04:44:26 lnx201 kernel: eax: 0033   ebx: f718d874   ecx: c044382c   edx: c02ec519
Nov 25 04:44:26 lnx201 kernel: esi: f8d9ae8c   edi: f8d9ae9c   ebp: f70ae880   esp: e1dcff78
Nov 25 04:44:26 lnx201 kernel: ds: 007b   es: 007b   ss: 0068
Nov 25 04:44:26 lnx201 kernel: Process modprobe (pid: 6553, threadinfo=e1dcf000 task=f73aa0b0)
Nov 25 04:44:26 lnx201 kernel: Stack: f8dacb60 c000 c032dba8 f8d9ae8c 0080 c032dbc8 f8dad380 c032dba8
Nov 25 04:44:26 lnx201 kernel:        e1dcf000 f8d8321b 00022000 f8d831f2 f889f01d c013876d b7dfd008
Nov 25 04:44:26 lnx201 kernel:        09562120 0035e63a c02d8613 b7dfd008 0009bc84 09562120 09562120 0035e63a
Nov 25 04:44:26 lnx201 kernel: Call Trace:
Nov 25 04:44:26 lnx201 kernel: [] afs_init_inodecache+0x1d/0x2e [libafs]
Nov 25 04:44:26 lnx201 kernel: [] init_once+0x0/0xc [libafs]
Nov 25 04:44:26 lnx201 kernel: [] init_module+0x1d/0x3d [libafs]
Nov 25 04:44:26 lnx201 kernel: [] sys_init_module+0xf8/0x21a
Nov 25 04:44:26 lnx201 kernel: [] syscall_call+0x7/0xb
Nov 25 04:44:26 lnx201 kernel: Code: 04 19 c0 0c 01 85 c0 75 2a ff 74 24 0c 68 19 c5 2e c0 e8 9c af fd ff 59 b9 2c 38 44 c0 5e f0 ff 05 2c 38 44 c0 0f 8e 64 15 00 00 <0f> 0b ad 05 96 c4 2e c0 8b 1b eb 84 8b 54 24 04 b8 00 f0 ff ff
Nov 25 04:44:26 lnx201 kernel: <0>Fatal exception: panic in 5 seconds
Re: aklog on login (was Re: openssh-server in sl-contrib has Kerberos enabled?)
On Aug 26, 2008, at 6:13 PM, Devin Bougie wrote:

> On Aug 22, 2008, at 3:27 PM, Troy Dawson wrote:
>
>> Anyway ... the real problem is that annoying message about doing aklog
>> when you log in isn't it? I remember another lab having that problem and
>> we fixed it for them ... I think. It might have been changing the aklog
>> stuff in /etc/krb5.conf ... but let me check.
>
> I would also greatly appreciate hearing if anyone has a solution for this.
> After upgrading from openssh-server-3.9p1-8.SL.4.22 to
> openssh-server-3.9p1-22.SL.4.22, we see "aklog: Can't get information
> about cell ..." when logging in (we need the afs client running but do
> not yet have our own cell).

For now, our workaround is to build our own openssh rpms from ftp://ftp.scientificlinux.org/linux/scientific/47/i386/contrib/SRPMS/openssh/openssh-3.9p1-22.SL.4.22.src.rpm, with "--with-afs-krb5" removed from line 373 of openssh.SL4.spec. Any other suggestions would still be greatly appreciated.

Thanks,
Devin
aklog on login (was Re: openssh-server in sl-contrib has Kerberos enabled?)
Hello, All.

On Aug 22, 2008, at 3:27 PM, Troy Dawson wrote:

> ... Anyway ... the real problem is that annoying message about doing aklog
> when you log in isn't it? I remember another lab having that problem and
> we fixed it for them ... I think. It might have been changing the aklog
> stuff in /etc/krb5.conf ... but let me check.

I would also greatly appreciate hearing if anyone has a solution for this. After upgrading from openssh-server-3.9p1-8.SL.4.22 to openssh-server-3.9p1-22.SL.4.22, we see "aklog: Can't get information about cell ..." when logging in (we need the afs client running but do not yet have our own cell).

Many thanks,
Devin
Re: SL5 kickstart boot CD
On May 18, 2007, at 4:48 PM, Connie Sieh wrote: On Fri, 18 May 2007, Devin Bougie wrote: The .buildstamp(in initrd) and .discinfo (in top level dir) have to "agree" So you probably need to get the .discinfo file from your "install tree" and put it on your new iso image. Unfortunately this doesn't seem to help. The .discinfo file on my iso image does match .discinfo in our install tree, but we still see the same error. Was this true before? Before (and with SL4) there was no .discinfo . If so then you need the one from the cd image. I don't have any more luck using the .discinfo from the cd image. I even tried copying this .discinfo to our install tree, but I still get the "... does not seem to match your boot media." message. Any more suggestions would be greatly appreciated. Many thanks, Devin
Re: SL5 kickstart boot CD
Hi Connie,

On May 18, 2007, at 12:40 PM, Connie Sieh wrote:

> I assume you are installing from a "mirror" of the ftp.scientificlinux.org site.

Yes, we created a local mirror by following your "Mirroring Scientific Linux" documentation. We see the same error regardless of whether we use our local mirror or other public mirrors.

> The .buildstamp (in initrd) and .discinfo (in top level dir) have to "agree"
>
> So you probably need to get the .discinfo file from your "install tree"
> and put it on your new iso image.

Unfortunately this doesn't seem to help. The .discinfo file on my iso image does match .discinfo in our install tree, but we still see the same error.

Many thanks for your reply. Any other suggestions would be greatly appreciated.

Devin
SL5 kickstart boot CD
Hi All,

With SL4, we used the following procedure (for example) to create a kickstart boot CD from the first SL4 installation disk. Our kickstart file installs from our local mirror via http.

mount -o loop /tmp/SL.43.050806.i386.disc1.iso /mnt/tmp
mkdir /tmp/SL43
cp -r /mnt/tmp/isolinux /tmp/SL43
cd /tmp/SL43
cp /tmp/ks.cfg isolinux/ks.cfg

Edit the isolinux.cfg file to automatically start the kickstart installation:
  change "timeout 600" to "timeout 1"
  add "ks=cdrom:/ks.cfg" to the end of the "append" line of the first "label linux" section

mkisofs -o /tmp/SL43.iso -b isolinux.bin -c boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table -R -J -v -T isolinux/

When we do the same procedure with SL5, we get the following error:

"The Scientific Linux installation tree in that directory does not seem to match your boot media."

We get the same error regardless of which SL mirror we use. However, we can install from our local mirror by using "linux askmethod" and the normal SL5 installation CD.

Any suggestions would be greatly appreciated.

Many thanks,
Devin

--
Devin Bougie
Laboratory for Elementary-Particle Physics
[EMAIL PROTECTED]
Re: NFS attribute cache problem in SL4
Hi All, Just to let you know, the 2.6.9-55 kernels released with Update 5 do fix this bug. Regards, Devin
Re: NFS attribute cache problem in SL4
On Apr 12, 2007, at 8:34 PM, Devin Bougie wrote: here's a bugzilla report with TUV: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=236308 This report now contains test kernels that appear to fix this bug. They should be included in 4.5 which will be released "real soon now." Devin
Re: NFS attribute cache problem in SL4
Many thanks for the reply, Miles.

On Apr 12, 2007, at 11:16 PM, Miles O'Neal wrote:

> We've run into serious NFS problems with SL3 with our NetApp filer and NFS.
> We solved the problem (or at least greatly reduced it) with options like these:
>
> foo:/vol/vol1/bar on /export/bar type nfs (rw,vers=3,hard,intr,bg,rsize=32768,wsize=32768,tcp,timeo=10,retrans=5,retry=5,actimeo=3,addr=www.xxx.yyy.zzz)
>
> The low actimeo and big ?size options were key. We also found performance
> unacceptable with noac.

Unfortunately, these options don't seem to help with the bug we're seeing.

Thanks again,
Devin
NFS attribute cache problem in SL4
Hi All,

It looks like we’ve run into an NFS client bug in SL4. We stumbled upon this while trying to check out code from a subversion repository to an nfs directory. Our NFS servers are SL3, and we only see this bug with SL4 clients. Things work when mounting the directory using ‘noac’, but we can’t live with the performance hit.

We get the following error when running an svn checkout on SL4 from an nfs-mounted directory:

REPORT request failed on '/svn/!svn/vcc/default'
svn: REPORT of '/svn/!svn/vcc/default': 400 Bad Request (https://accserv)

In looking at system call traces for SL3 and SL4 clients checking out, here's where we think it's going pear-shaped, about 1800 syscalls in:

open("bmad/.svn/tmp/tempfile.tmp", O_RDWR|O_CREAT|O_EXCL, 0666) = 3
[...]
write(3, "svn code. (We suppose it's possible this is timing related, in which case the second stat might not *reliably* fix the problem...)

Here is a test program to demonstrate the bug.

dsr_lnxcu9% cat svnbug.c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef TESTSIZE
#define TESTSIZE 52
#endif

int main(int argc, char** argv)
{
    char s[TESTSIZE+1];
    struct stat st1, st2;
    int r;
    ssize_t len;

    /* O_EXCL: remove tmpfile.xyzzy before re-running, or open() fails */
    int fd = open("tmpfile.xyzzy", O_RDWR|O_CREAT|O_EXCL, 0666);
    if (fd < 0) { perror("open"); exit(errno); }

    memset(s, 'x', TESTSIZE);
    len = write(fd, s, TESTSIZE);
    if (len < 0) { perror("write"); exit(errno); }

    /* Both fstat() calls should report TESTSIZE bytes after the write. */
    r = fstat(fd, &st1);
    if (0 != r) { perror("fstat"); exit(errno); }
    r = fstat(fd, &st2);
    if (0 != r) { perror("fstat"); exit(errno); }

    /* st_size is an off_t, so print it via long long rather than %zd */
    printf("len = %zd, st1 = %lld, st2 = %lld\n",
           len, (long long)st1.st_size, (long long)st2.st_size);
    close(fd);
    return 0;
}
dsr_lnxcu9% gcc svnbug.c
dsr_lnxcu9% ~/a.out
len = 52, st1 = 52, st2 = 52
dsr_lnxcu9% cd /cdat/tem/dsr
dsr_lnxcu9% ~/a.out
len = 52, st1 = 0, st2 = 52

I have also posted a message to the Subversion developers mailing list, and here's a bugzilla report with TUV:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=236308

Finally, here are others with the same problem:
http://www.centos.org/modules/newbb/viewtopic.php?topic_id=4875

Any suggestions or workarounds would be greatly appreciated.

Devin

--
Devin Bougie
Laboratory for Elementary-Particle Physics
[EMAIL PROTECTED]