Firefox profile in NFSv4 directory

2012-06-22 Thread Devin Bougie
Hi, All.  We see periodic file system hangs when a firefox profile is stored in 
an NFSv4 directory.  Both client and server are fully updated SL6.2.

We reliably see this when running firefox with the profile stored in an NFSv4 
file system, and do not see this when the client switches to NFSv3.

To reproduce this, we simply run firefox with the profile stored in an NFSv4 
share (for example, mount your home directory using NFSv4).  Eventually the 
NFSv4 file system will wedge and all access to that FS from that client will 
block.  When this happens, "umount -f /file/system" un-wedges the file system
(even though the umount itself fails with "device is busy", as shown below) and
everything continues where it left off.

[root@cesr3601 ~]# umount -f /home/rf_ctl 
umount2: Device or resource busy
umount: /home/rf_ctl: device is busy.
   (In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
umount2: Device or resource busy
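Because the hang only appears with NFSv4 and not with NFSv3, it helps to confirm which protocol version each mount actually negotiated. The sketch below reads /proc/mounts (or a file in the same format). The helper name `nfs_versions` is hypothetical, and it assumes that NFSv4 mounts are listed with filesystem type `nfs4` while NFSv2/v3 mounts are type `nfs` with a `vers=` option, which is how Linux kernels of this era report them.

```shell
# Hypothetical helper: report the NFS protocol version of each NFS mount.
# Reads /proc/mounts (or a file in the same format) given as $1.
nfs_versions() {
    awk '
        # NFSv4 mounts are listed with their own filesystem type.
        $3 == "nfs4" { print $2, "NFSv4"; next }
        # NFSv2/v3 mounts are type "nfs"; look for vers= or nfsvers= in the options.
        $3 == "nfs" {
            v = "NFS (no vers= option shown)"
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) {
                if (opts[i] ~ /^(vers|nfsvers)=/) {
                    sub(/^[a-z]*=/, "", opts[i])
                    v = "NFSv" opts[i]
                }
            }
            print $2, v
        }
    ' "${1:-/proc/mounts}"
}
```

Running `nfs_versions` with no argument inspects the live mount table. To test the NFSv3 workaround on a client, the share can be mounted with an explicit version, for example `mount -t nfs -o vers=3 server:/export /mountpoint` (the server, export and mount point here are placeholders).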

Firefox opens a number of sqlite files, and sqlite uses advisory file locks
(POSIX fcntl() locks in its default Unix VFS) to mediate access, so this looks
consistent with a file-locking problem with NFSv4.  Running strace on a hung
firefox and then running 'umount -f' to unhang it shows firefox sitting in a
futex which (presumably) gets woken by the umount attempt:

futex(0x2b5710c9eab0, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
futex(0x2b5710c9eab0, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
futex(0x2b5710c9eab0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x2b5710d8ca4c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x2b570f606238, 89100) = 1
futex(0x2b5710c9eab0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x2b56fdd0c040, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x2b56fdd0c040, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x2b57083f630c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x2b57083f6308, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
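To check whether advisory locking itself misbehaves on a given mount, independently of Firefox, a small probe can hold a lock in one process and request a conflicting non-blocking lock from another. This is a minimal sketch, assuming the util-linux `flock` command and coreutils `timeout` are installed; `lock_probe` is a hypothetical name. Note that flock(1) takes flock(2) locks, not the fcntl() locks sqlite uses; Linux NFS clients emulate flock() with whole-file byte-range locks, so this exercises the same NFS locking path, but it is not an exact reproduction. If the lock call blocks in uninterruptible sleep, `timeout` may be unable to kill it and the probe may hang too.

```shell
# Hypothetical helper: probe advisory locking in DIR (e.g. an NFSv4 mount).
# Usage: lock_probe DIR [TIMEOUT_SECONDS]
lock_probe() {
    dir=$1
    secs=${2:-10}
    f="$dir/.lock_probe.$$"
    : > "$f" || return 2

    # Process 1: hold an exclusive lock for a few seconds in the background.
    flock -x "$f" -c "sleep 3" &
    holder=$!
    sleep 1

    # Process 2: a non-blocking exclusive request should be refused at once.
    # flock -n exits 1 on conflict; timeout exits 124 if the call does not return.
    timeout "$secs" flock -n -x "$f" -c true
    st=$?
    case $st in
        0)   echo "FAIL: conflicting lock was granted"; rc=1 ;;
        1)   echo "OK: conflicting lock refused"; rc=0 ;;
        124) echo "HANG: lock request did not return within ${secs}s"; rc=3 ;;
        *)   echo "ERROR: flock exited with status $st"; rc=2 ;;
    esac

    # Let the holder finish, then clean up the probe file.
    wait "$holder"
    rm -f "$f"
    return $rc
}
```

For example, `lock_probe /home/rf_ctl 10` run on the affected client should print "OK: conflicting lock refused" when locking works normally.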

We have found many reports of problems with Firefox and NFSv4 home directories
on Fedora 16 (FC16) and Ubuntu:
https://bugzilla.redhat.com/show_bug.cgi?id=732748
https://bugzilla.redhat.com/show_bug.cgi?id=811138
http://thread.gmane.org/gmane.linux.nfs/48690/focus=48705
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/974664

We have now opened a report for RHEL6, but it appears it won't get much 
traction until confirmed by someone with a RH Support Contract.
https://bugzilla.redhat.com/show_bug.cgi?id=828521

Has anyone else experienced this or does anyone have any suggestions?

Thanks in advance,
Devin

Re: opengl on remote SL6 system when local system uses nvidia

2012-05-16 Thread Devin Bougie
Just in case anyone else runs into this issue, here's a link to the bug report,
where you can find a patch to Mesa that resolves it.

https://bugzilla.redhat.com/show_bug.cgi?id=820746

Devin

Re: opengl on remote SL6 system when local system uses nvidia

2012-04-10 Thread Devin Bougie
Hi z,

Thank you for your followup.  Unfortunately we have this problem regardless of 
whether a valid xorg.conf exists on the remote system, or whether the X server 
is started on the remote system.

We have verified this problem using both the latest nvidia driver from elrepo 
and the latest driver from nvidia.com.

Any other suggestions would be greatly appreciated.  One workaround would be to 
install the nvidia driver on all systems regardless of what their graphics card 
is, but I'm somewhat reluctant to proliferate proprietary drivers where they 
shouldn't be needed.  This certainly wasn't needed with SL5.

It would also be helpful to know if anyone else with a mixed nvidia / 
non-nvidia environment can reproduce this problem.

Thanks again,
Devin

On Apr 6, 2012, at 6:49 PM, zxq9 wrote:
> Unfortunately I don't have any magical answer for you, but I can confirm that 
> I can't reproduce this between Radeon HD 6310s and 4250s.
> 
> I do have a suspicion about what might be wrong, though...
> 
> Does the nVidia driver setup create an /etc/X11/xorg.conf during a 
> post-install step? I SL6 doesn't have one by default but SL5 did. If 
> something about the chipset or other detail of the X11 setup you've got is 
> not detected correctly by default (as in, not detected correctly during the 
> bo

opengl on remote SL6 system when local system uses nvidia

2012-04-06 Thread Devin Bougie
Hi, All.  We're seeing a problem running opengl on a remote SL6 system when the 
local system uses the proprietary nvidia drivers.  This does not seem to be a 
problem with remote SL5 systems.

The problem seems to only be when sitting at a local system with the nvidia 
drivers (tested with local SL5, SL6, and OS X) and running opengl on a remote 
SL6 system that doesn't have the nvidia drivers.

I think the example below probably demonstrates this better than my 
description.  Any suggestions would be greatly appreciated, and please let me 
know if there is any more information we can provide.

Many thanks,
Devin

--
[dab66@lnx246 ~]% glxinfo |head -n 15
name of display: :0.0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: NVIDIA Corporation
server glx version string: 1.4
server glx extensions:
   GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_SGIX_fbconfig, 
   GLX_SGIX_pbuffer, GLX_SGI_video_sync, GLX_SGI_swap_control, 
   GLX_EXT_swap_control, GLX_EXT_texture_from_pixmap, GLX_ARB_create_context, 
   GLX_ARB_create_context_profile, GLX_EXT_create_context_es2_profile, 
   GLX_ARB_create_context_robustness, GLX_ARB_multisample, 
   GLX_NV_float_buffer, GLX_ARB_fbconfig_float, GLX_EXT_framebuffer_sRGB
client glx vendor string: NVIDIA Corporation
client glx version string: 1.4
client glx extensions:

[dab66@lnx246 ~]% ssh sl5-no-nvidia
[dab66@sl5-no-nvidia ~]% glxinfo |head -n 15
name of display: localhost:11.0
display: localhost:11  screen: 0
direct rendering: No
server glx vendor string: NVIDIA Corporation
server glx version string: 1.4
server glx extensions:
   GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_SGIX_fbconfig, 
   GLX_SGIX_pbuffer, GLX_SGI_video_sync, GLX_SGI_swap_control, 
   GLX_EXT_swap_control, GLX_EXT_texture_from_pixmap, GLX_ARB_create_context, 
   GLX_ARB_create_context_profile, GLX_EXT_create_context_es2_profile, 
   GLX_ARB_create_context_robustness, GLX_ARB_multisample, 
   GLX_NV_float_buffer, GLX_ARB_fbconfig_float, GLX_EXT_framebuffer_sRGB
client glx vendor string: SGI
client glx version string: 1.4
client glx extensions:

[dab66@lnx246 ~]% ssh sl6-no-nvidia
[dab66@sl6-no-nvidia ~]% glxinfo
name of display: localhost:27.0
Error: couldn't find RGB GLX visual or fbconfig

[dab66@lnx246 ~]% ssh sl6-nvidia
[dab66@sl6-nvidia ~]% glxinfo |head -n 15
NVIDIA: could not open the device file /dev/nvidiactl (Permission denied).
name of display: localhost:10.0
display: localhost:10  screen: 0
direct rendering: No (If you want to find out why, try setting 
LIBGL_DEBUG=verbose)
server glx vendor string: NVIDIA Corporation
server glx version string: 1.4
server glx extensions:
   GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_SGIX_fbconfig, 
   GLX_SGIX_pbuffer, GLX_SGI_video_sync, GLX_SGI_swap_control, 
   GLX_EXT_swap_control, GLX_EXT_texture_from_pixmap, GLX_ARB_create_context, 
   GLX_ARB_create_context_profile, GLX_EXT_create_context_es2_profile, 
   GLX_ARB_create_context_robustness, GLX_ARB_multisample, 
   GLX_NV_float_buffer, GLX_ARB_fbconfig_float, GLX_EXT_framebuffer_sRGB
client glx vendor string: NVIDIA Corporation
client glx version string: 1.4
client glx extensions:
--
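When comparing several hosts like the ones above, it can help to reduce each glxinfo run to the handful of lines that differ: the display, whether direct rendering is in use, the server and client GLX vendor strings, and any error. The helper name `glx_summary` is hypothetical; it simply filters glxinfo output read on stdin.

```shell
# Hypothetical helper: keep only the glxinfo lines useful for comparing hosts.
# Reads glxinfo output on stdin.
glx_summary() {
    grep -E '^(name of display|direct rendering|server glx vendor string|client glx vendor string|Error):'
}
```

Usage on each remote host might be `glxinfo 2>&1 | glx_summary`; as the sl6-nvidia output suggests, setting `LIBGL_DEBUG=verbose` before running glxinfo may explain why direct rendering is unavailable.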

allow members of group to unlock screen

2012-01-03 Thread Devin Bougie
Hi, All.  Throughout our control system, we have several SL6 terminals that 
auto-login with a dedicated control system account and launch various 
monitoring and control applications.  In general the passwords for these 
accounts are not known and never needed.

For some of these systems that are not always in adequately protected areas, we 
would like to lock the screen after a period of inactivity.  We would then like 
to give a set of users (ideally members of a unix group) the ability to unlock 
that screen using their own username and password.

We should be able to use PAM, but the xscreensaver (and kde and 
gnome-screensaver) authentication window only lets you modify the password 
field (not the "user" field).  Before we start hacking the source for one of 
these screensaver applications, we thought we'd see what solutions are in use 
at other labs.  
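The authorization half of this workflow (is the person unlocking the screen a member of the allowed group?) can be sketched as a simple group-membership check. This does not solve the screensaver's fixed username field; it only illustrates the test a patched unlock dialog or wrapper would need. The function name `in_group` and the group name `ctlunlock` are hypothetical. Within PAM itself, pam_succeed_if's `user ingroup` condition expresses the same check.

```shell
# Hypothetical helper: succeed (0) if USER is a member of GROUP,
# fail (1) if not, and return 2 if the user does not exist.
in_group() {
    user=$1
    group=$2
    # id -nG prints every group name for the user, space separated.
    groups=$(id -nG "$user" 2>/dev/null) || return 2
    for g in $groups; do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}
```

For example, `in_group alice ctlunlock && echo "alice may unlock"`, where both names are placeholders.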

Any recommendations for achieving this (or suggestions of a different workflow) 
in SL6 would be greatly appreciated.

Many thanks,
Devin

--

Devin Bougie
Cornell University
Laboratory for Elementary-Particle Physics
devin.bou...@cornell.edu

Re: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) on Scientific Linux 6.1

2011-08-23 Thread Devin Bougie
Hi Eero,

Yes, that's what I meant.  In your original post you said you were using the 
e1000 driver, but I now see that the output you included shows you were 
actually using e1000e.  While I don't have any experience with this card in 
SL6, we've had no problems with several of them in SL5.6 (using the default 
e1000e driver provided by SL).

Devin


On Aug 22, 2011, at 10:42 AM, Eero Volotinen wrote:

> 2011/8/22 Devin Bougie :
>> Hi Eero,
>> 
>> Have you tried using the e1000e driver that's provided by SL?  We haven't 
>> had any problems using 82572EI cards in SL5.6 with the e1000e driver.
>> 
>> I hope this helps,
>> Devin
> 
> You mean the default driver? yes, it cannot detect line.
> 
> --
> Eero


Re: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) on Scientific Linux 6.1

2011-08-22 Thread Devin Bougie
Hi Eero,

Have you tried using the e1000e driver that's provided by SL?  We haven't had 
any problems using 82572EI cards in SL5.6 with the e1000e driver.

I hope this helps,
Devin

On Aug 22, 2011, at 10:10 AM, Eero Volotinen wrote:

> Hi,
> 
> Any ideas how to get Intel Corporation 82572EI Gigabit Ethernet
> Controller (Copper) working on scientific linux 6.1?
> Looks like the e1000 driver is too old to support this card?
> 
> Info:
> 
> 
> 02:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit
> Ethernet Controller (Copper) (rev 06)
>Subsystem: Intel Corporation PRO/1000 PT Server Adapter
>Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx+
>Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> SERR- Latency: 0, Cache Line Size: 64 bytes
>Interrupt: pin A routed to IRQ 36
>Region 0: Memory at fb64 (32-bit, non-prefetchable) [size=128K]
>Region 1: Memory at fb62 (32-bit, non-prefetchable) [size=128K]
>Region 2: I/O ports at d000 [size=32]
>Expansion ROM at fb60 [disabled] [size=128K]
>Capabilities: [c8] Power Management version 2
>Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
>Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>Address: fee0f00c  Data: 41e1
>Capabilities: [e0] Express (v1) Endpoint, MSI 00
>DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s
> <512ns, L1 <64us
>ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
>DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+
> Unsupported+
>RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
>MaxPayload 128 bytes, MaxReadReq 512 bytes
>DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+
> AuxPwr+ TransPend-
>LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s,
> Latency L0 <4us, L1 <64us
>ClockPM- Surprise- LLActRep- BwNot-
>LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
>ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train-
> SlotClk+ DLActive- BWMgmt- ABWMgmt-
>Capabilities: [100] Advanced Error Reporting
>UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
> UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
>UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
> UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt-
> UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
>Capabilities: [140] Device Serial Number 00-1b-21-ff-ff-b0-a7-7e
>Kernel driver in use: e1000e
>Kernel modules: e1000e
> 
> What is the best way to update the driver? The latest source from Intel
> supports this card, but I prefer the RPM way.
> 
> --
> Eero
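To confirm which driver is actually bound to the card (the point of confusion between e1000 and e1000e above), the "Kernel driver in use" line can be pulled out of `lspci -k` or `lspci -v` output. `bound_driver` is a hypothetical helper name; it tolerates leading whitespace and the ">" quote markers seen in the mailed output.

```shell
# Hypothetical helper: print the kernel driver bound to each device in
# `lspci -k` / `lspci -v` output read on stdin.
bound_driver() {
    sed -n 's/^[[:space:]>]*Kernel driver in use:[[:space:]]*//p'
}
```

For example, `lspci -k -s 02:00.0 | bound_driver` for the device above. `ethtool -i eth0` reports the driver name and version for an interface, `ethtool eth0` shows a "Link detected:" line, and `modinfo -F version e1000e` shows the version of the installed module (the interface name eth0 is a placeholder).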


MRG Realtime components for SL5

2011-05-04 Thread Devin Bougie
At LEPP we are beginning to look at using the MRG Realtime components for SL5 
in the CESR control system.

We were happy to find the MRG repository at CERN, and would be very grateful to 
hear of any experience others have had with real-time (including the Real-Time 
Specification for Java) on SL(C).
http://linuxsoft.cern.ch/cern/mrg/slc5X/x86_64/RPMS/repoview/

Any quick comments on how other labs are or are considering using MRG Realtime 
would be greatly appreciated.

Many thanks,
Devin



Re: KVM on SL5.5 randomly pausing guests

2011-04-28 Thread Devin Bougie
Just to follow-up on this posting, the problem was that the filesystem the KVM 
guests were on was full.  Everything worked properly after clearing up a little 
space.
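Since the pauses turned out to be caused by a full filesystem, a simple usage check on the filesystem holding the guest images can catch this before guests stop. This is a sketch with a hypothetical function name `fs_full_check`; the directory /var/lib/libvirt/images is libvirt's usual default image location and is an assumption here. Newer libvirt versions can also report why a guest is paused via `virsh domstate --reason`, but that option may not exist in the libvirt shipped with SL5.5.

```shell
# Hypothetical check: warn when the filesystem holding DIR is at or above LIMIT% used.
# Usage: fs_full_check DIR LIMIT
fs_full_check() {
    dir=$1
    limit=$2
    # df -P prints one line per filesystem; column 5 is the percentage used.
    used=$(df -P "$dir" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $dir is ${used}% full"
        return 1
    fi
    echo "OK: $dir is ${used}% full"
    return 0
}
```

For example, `fs_full_check /var/lib/libvirt/images 90` could be run from cron and its warning mailed to the administrators.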

Devin

On Apr 22, 2011, at 10:22 AM, James M Pulver wrote:

I’ve noticed on one of my servers that KVM is randomly pausing guests. I’m not 
sure why it’s doing so; virt-manager gives me no real info. The two guests 
affected so far were a Windows 7 VM I was trying to install SP1 on and, on the 
same host server, a Server 2008 R2 guest that I wasn’t interacting with at all.

Any ideas / troubleshooting?

--
James Pulver
Information Technology Area Supervisor
LEPP Computer Group
Cornell University




End Support Date for SL5

2011-03-28 Thread Devin Bougie
Hi Troy and Connie,

If possible, any update to the End Support Date for SL5 (still listed as "at 
least until 2012-02-02" on the SL Roadmap) would be greatly appreciated.  We 
have been in the process of migrating our control systems and general 
infrastructure from 32-bit SL4 to 64-bit SL5, and are just beginning to 
evaluate SL6.

Many thanks,
Devin



Re: EVO video with nvidia drivers

2011-01-04 Thread Devin Bougie
On Jan 4, 2011, at 10:57 AM, Akemi Yagi wrote:
> The module-init-tools version on your system has a bug as detailed here:
> http://elrepo.org/tiki/Update
> You need to update module-init-tools to the current version which has
> the bug fixed.

I've now found this updated version in the sl-fastbugs repo.

Thanks,
Devin


Re: EVO video with nvidia drivers

2011-01-04 Thread Devin Bougie
On Jan 4, 2011, at 10:57 AM, Akemi Yagi wrote:
> On Tue, Jan 4, 2011 at 7:45 AM, Devin Bougie <devin.bou...@cornell.edu> wrote:
>> Thanks for the suggestion, Akemi and Mark.  When I try to use the ELRepo 
>> packages, kmod-nvidia conflicts with module-init-tools.
> 
> The module-init-tools version on your system has a bug as detailed here:
> http://elrepo.org/tiki/Update
> You need to update module-init-tools to the current version which has
> the bug fixed. Either that or install the one provided by ELRepo.
> yum --disablerepo \* --enablerepo elrepo update module-init-tools kmod-nvidia

Thanks.  Once this replaced the module-init-tools from SL 5 base with 
module-init-tools from ELRepo (both 3.3-0.pre3.1.60.el5), I was able to install 
nvidia-x11-drv and kmod-nvidia from ELRepo.  With this configuration, ViEVO 
works correctly.

I am a little hesitant to push module-init-tools from ELRepo to all of our SL5 
systems.  Is there any chance of incorporating the patched version of 
module-init-tools (or an nvidia driver that provides OpenGL 1.5+) into SL5?

Thanks again,
Devin


Re: EVO video with nvidia drivers

2011-01-04 Thread Devin Bougie
Thanks for the suggestion, Akemi and Mark.  When I try to use the ELRepo 
packages, kmod-nvidia conflicts with module-init-tools.

Devin

--
[r...@lnx226 ~]# yum --enablerepo=elrepo install nvidia-x11-drv nvidia-x11-drv-32bit
Loaded plugins: downloadonly, fastestmirror, kernel-module, priorities
Loading mirror speeds from cached hostfile
 * elrepo: elrepo.org
 * sl-base: repos.lepp.cornell.edu
 * sl-security: repos.lepp.cornell.edu
16 packages excluded due to repository priority protections
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package nvidia-x11-drv.x86_64 0:260.19.29-1.el5.elrepo set to be updated
--> Processing Dependency: nvidia-kmod = 260.19.29-1.el5.elrepo for package: nvidia-x11-drv
--> Processing Dependency: nvidia-kmod = 260.19.29-1.el5.elrepo for package: nvidia-x11-drv
---> Package nvidia-x11-drv-32bit.x86_64 0:260.19.29-1.el5.elrepo set to be updated
--> Running transaction check
---> Package kmod-nvidia.x86_64 0:260.19.29-1.el5.elrepo set to be updated
--> Processing Conflict: kmod-nvidia conflicts module-init-tools = 3.3-0.pre3.1.60.el5
--> Finished Dependency Resolution
kmod-nvidia-260.19.29-1.el5.elrepo.x86_64 from elrepo has depsolving problems
  --> kmod-nvidia conflicts with module-init-tools
Beginning Kernel Module Plugin
Finished Kernel Module Plugin
Error: kmod-nvidia conflicts with module-init-tools
 You could try using --skip-broken to work around the problem
 You could try running: package-cleanup --problems
                        package-cleanup --dupes
                        rpm -Va --nofiles --nodigest
--


On Jan 3, 2011, at 5:55 PM, Akemi Yagi wrote:

> On Mon, Jan 3, 2011 at 9:00 AM, Devin Bougie  wrote:
>> And, here is the reply we've received from the EVO support folks.
>> 
>> --
>> The reason is in OpenGL version. You need at least v.1.5+ to be able to
>> use the most recent ViEVO. As you can see from your logs, you have 3.3
>> in case when you can successfully use it and 1.4 when you cannot.
>> Perhaps RPMs are messed up. Please use drivers from nvidia web site,
>> until SLC5 will resolve this issue.
>> --
> 
> I, too, recommend installing ELRepo's nvidia packages (OK, I'm
> biased). They get updated in a timely manner (so far at least). If you
> install them now, it is version 260.19.29 (OpenGL 3.3.0). Unlike the
> Nvidia's installer, you do not need development software and there is
> no need to reinstall upon kernel updates (kABI-tracking kernel
> module).
> 
> Akemi


Re: EVO video with nvidia drivers

2011-01-03 Thread Devin Bougie
And, here is the reply we've received from the EVO support folks.

--
The reason is in OpenGL version. You need at least v.1.5+ to be able to 
use the most recent ViEVO. As you can see from your logs, you have 3.3 
in case when you can successfully use it and 1.4 when you cannot. 
Perhaps RPMs are messed up. Please use drivers from nvidia web site, 
until SLC5 will resolve this issue.
--
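The EVO support reply reduces to a version threshold: ViEVO needs OpenGL 1.5 or later. The failing run also reports "Sorry, no Direct Rendering possible!", which likely explains why only 1.4 is advertised there. A hypothetical helper, `gl_at_least`, can check the leading MAJOR.MINOR of a version string such as the ones in the ViEVO logs; it is a sketch, not part of ViEVO or glxinfo.

```shell
# Hypothetical helper: succeed if an OpenGL version string is at least MAJOR.MINOR.
# Usage: gl_at_least MAJOR MINOR "VERSION STRING"
# Returns 0 if new enough, 1 if too old, 2 if no version number is found.
gl_at_least() {
    want_major=$1
    want_minor=$2
    ver=$3
    # Take the leading "X.Y" from strings such as "1.4 (2.1.2 NVIDIA 195.36.24)".
    major=$(printf '%s\n' "$ver" | sed -n 's/^\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1/p')
    minor=$(printf '%s\n' "$ver" | sed -n 's/^\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\2/p')
    [ -n "$major" ] || return 2
    [ "$major" -gt "$want_major" ] && return 0
    [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]
}
```

On a workstation this might be used as `gl_at_least 1 5 "$(glxinfo | sed -n 's/^OpenGL version string: //p')"`.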

Thanks again,
Devin


EVO video with nvidia drivers

2011-01-03 Thread Devin Bougie
Hi, All.  We have several SL5 systems using the proprietary nvidia driver from 
sl-contrib.  We did not have any trouble with these systems until we tried to 
view video in EVO (ViEVO).  We have no problem using EVO with the latest 
260.19.29 drivers downloaded and installed from nvidia.com.  However, we are 
unable to view video when using nvidia-x11-drv from sl-contrib or 
nvidia-graphics260.19.21 from atrpms.  In all cases, xorg.conf is essentially 
unchanged.

Please see below for example output from unsuccessful and successful attempts 
at running ViEVO.  Any suggestions for fixing this using the sl-contrib (or 
atrpms) drivers would be greatly appreciated.

Many thanks,
Devin

Here is the output from attempting to run ViEVO while using nvidia-x11-drv from 
sl-contrib:
--
[da...@lnx226 ~]% rpm -qa |grep -i nvidia
nvidia-x11-drv-195.36.24-1.0.x86_64
nvidia-x11-drv-32bit-195.36.24-1.0.x86_64
[da...@lnx226 ~]% uname -a
Linux lnx226.lns.cornell.edu 2.6.18-194.26.1.el5 #1 SMP Tue Nov 9 12:46:16 EST 
2010 x86_64 x86_64 x86_64 GNU/Linux
[da...@lnx226 ViEVO]% ./ViEVO 127.0.0.1/:2224
Settings error: .preferences.xml: cannot open file
Got Doublebuffered Visual!
glX-Version 1.4
Sorry, no Direct Rendering possible!
System name - Linux 
Nodename - lnx226.lns.cornell.edu 
Release - 2.6.18-194.26.1.el5 
Version - #1 SMP Tue Nov 9 12:46:16 EST 2010 
Architecture - x86_64 
Error when oppening BMPimage file: earth.bkg 
/// VERSION 
/

  Vievo 2.0 (Build: linux 32-bit PROD Date: 13.05.2010 )

///  OpenGL Support Protocol  
///
  OpenGL version:  1.4 (2.1.2 NVIDIA 195.36.24)
  Vendor:  NVIDIA Corporation
  Renderer:  Quadro NVS 290/PCI/SSE2
  ---
  YUV_TEXTURES_SUPPORTED : 0 
  
  GLEE_VERSION_2_0 : 0 
  GLEE_ARB_shader_objects :   0 
  GLEE_ARB_vertex_shader :0 
  GLEE_ARB_fragment_shader :  0 
  GLEE_ARB_shading_language_100 : 0 
  -
  PIXEL_BUFFER_OBJECTS_ARB_SUPPORTED : 0 
  OPEN_GL_2_1_SUPPORTED : 0 
/  END  
/

libDeckLinkAPI.so: cannot open shared object file: No such file or directory
No DeckLink drivers
TCP Channel: Not Connected: Connection refused
X Error of failed request:  BadLength (poly request too large or internal Xlib 
length error)
  Major opcode of failed request:  143 (GLX)
  Minor opcode of failed request:  1 (X_GLXRender)
  Serial number of failed request:  410
  Current serial number in output stream:  411
--

And, here is the output from a successful run using 260.19.29 from nvidia.com:
--
[da...@lnx226 ~]% uname -a
Linux lnx226.lns.cornell.edu 2.6.18-194.26.1.el5 #1 SMP Tue Nov 9 12:46:16 EST 
2010 x86_64 x86_64 x86_64 GNU/Linux
[da...@lnx226 ~]% grep GLX /var/log/Xorg.0.log
(II) NVIDIA GLX Module  260.19.29  Wed Dec  8 12:24:30 PST 2010
[da...@lnx226 ViEVO]% ./ViEVO 127.0.0.1/:2224
Got Doublebuffered Visual!
glX-Version 1.4
Congrats, you have Direct Rendering!
System name - Linux 
Nodename - lnx226.lns.cornell.edu 
Release - 2.6.18-194.26.1.el5 
Version - #1 SMP Tue Nov 9 12:46:16 EST 2010 
Architecture - x86_64 
Error when oppening BMPimage file: earth.bkg 
/// VERSION 
/

  Vievo 2.0 (Build: linux 32-bit PROD Date: 13.05.2010 )

///  OpenGL Support Protocol  
///
  OpenGL version:  3.3.0 NVIDIA 260.19.29
  Vendor:  NVIDIA Corporation
  Renderer:  Quadro NVS 290/PCI/SSE2
  ---
  YUV_TEXTURES_SUPPORTED : 1 
  
  GLEE_VERSION_2_0 : 1 
  GLEE_ARB_shader_objects :   1 
  GLEE_ARB_vertex_shader :1 
  GLEE_ARB_fragment_shader :  1 
  GLEE_ARB_shading_language_100 : 1 
  -
  PIXEL_BUFFER_OBJECTS_ARB_SUPPORTED : 1 
  OPEN_GL_2_1_SUPPORTED : 1 
/  END  
/

libDeckLinkAPI.so: cannot open shared object file: No such file or directory
No DeckLink drivers
TCP Channel: Not Connected: Connection refused
Exiting sanely...
222
--


Re: Using SAMBA to share CUPS printers to Windows 7 pushed via GPP

2010-11-23 Thread Devin Bougie
Hi, All.  Thank you for the suggestions.  We were able to fix our problem after 
upgrading from "samba" to "samba3x" and making some configuration changes on 
both the Linux and Windows side.  On Linux, in order for "cupsaddsmb" to work 
we needed to map the AD user that runs "cupsaddsmb" to root.  I'm not sure 
exactly what was needed for our GPO to work properly, but we should be able to 
post more information if others are interested.

Thanks again for the help,
Devin

On Nov 17, 2010, at 1:18 PM, Jon Peatfield wrote:

> On Wed, 17 Nov 2010, James M Pulver wrote:
> 
>> I'm not really sure where the right place to ask this question as it 
>> touches on so many disparate technologies, but here goes. I'm trying to 
>> set up Windows 7 clients to print to printers served from SL5.5 CUPS 
>> server using SAMBA to provide windows print sharing. We've got it 
>> working with the default CUPS postscript drivers, but it requires Admin 
>> on the Windows clients to install the driver.
> ...
> 
> Just to add to what Andrew already said... I look after two samba servers 
> which support printing and both are using the CUPS Windows printer drivers 
> (hence using cupsaddsmb to get samba to offer them to Windows clients).
> 
> On one system which is the 'PDC' for a domain we are using samba-3.0.x (it 
> is sl4 so we can't run the newer version).  The windows clients which are 
> part of the domain can add printers without needing admin rights - though 
> there may be some magic happening to allow this (I know that there _is_ 
> some magic I just don't know what it is).  In this case all the clients 
> are XP atm...
> 
> On another server which is for end-users laptops etc we used to run the 
> samba-3.0.x version and that certainly had some problems with Win-7 
> clients and would not support 64-bit drivers.
> 
> A few months ago we updated that to run sl5 with the samba3x-3.3.x 
> packages.  That does support installing 64-bit drivers - and Win-7 clients 
> generally seem to work more reliably.
> 
> However for this server I *think* that the clients do need admin rights to 
> install the drivers, which for us isn't a big problem since they are
> users' own machines so they usually will have rights on them (most of them 
> seem to run logged in as an admin anyway)...
> 
> BTW the CUPS windows 64-bit drivers have not actually been released but it 
> isn't hard to find the binaries (they were attached to the cups bug-report 
> about the lack of 64-bit drivers).  Google (or other search engines) 
> should find them easily enough...
> 
>  -- Jon


Re: I/O delays

2010-10-20 Thread Devin Bougie
Just in case anyone was interested in the resolution to this problem, it appears 
that the poor drive performance was caused by vibrations from the chassis fans.  
After replacing nearly every component in this system (hard drive, motherboard, 
SATA cable, memory, and CPUs) to no avail, we discovered that our chassis had 
a mix of 6.8W and 2.8W fans.  When we corrected this so that all five chassis 
fans were of the 2.8W variety, our disk performance problems appeared to 
disappear.

For what it's worth, a quick search produces a few reports of vibrations 
degrading disk performance. For example:
http://www.dtc.umn.edu/publications/reports/2005_08.pdf
http://etbe.coker.com.au/2009/04/22/vibration-strange-sata-performance/

Finally, to quantify the improvement in disk speed, here are the results 
(showing the average and range) of each test from three runs of bonnie++:
bonnie++ -d /mnt/scratch/test_dab66 -n 0 -s 98304 -f

ORIGINAL FAN CONFIGURATION (two 2.8W and three 6.8W)
Writes AVG = 5348; 4327 - 6369
Rewrites AVG = 2489.5; 2284 - 2695
Reads AVG = 9575.5; 7778 - 11373
Random Seeks AVG = 68.4; 65.3 - 71.5

CORRECT FAN CONFIGURATION (five 2.8W)
Writes AVG = 53901.333; 43143 - 60430
Rewrites AVG = 27684; 20178 - 31942
Reads AVG = 75541.333; 65704 - 80717
Random Seeks AVG = 181.333; 117.9 - 218.6
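
To put the two fan configurations side by side, the averages above can be turned into improvement factors. This is a small Python sketch of that arithmetic using only the reported averages; the units are whatever bonnie++ printed (presumably K/sec for the block tests and seeks/sec for random seeks), and the dictionary names are illustrative.

```python
# Average bonnie++ results from the post (units as printed by bonnie++).
original = {"Writes": 5348, "Rewrites": 2489.5, "Reads": 9575.5, "Random Seeks": 68.4}
corrected = {"Writes": 53901.333, "Rewrites": 27684, "Reads": 75541.333, "Random Seeks": 181.333}


def speedups(before, after):
    """Return after/before for every test present in `before`."""
    return {test: after[test] / before[test] for test in before}


for test, factor in speedups(original, corrected).items():
    print(f"{test:>12}: {factor:.1f}x")
```

That works out to roughly a tenfold gain for writes and rewrites, close to 8x for reads, and under 3x for random seeks.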

Many thanks for everyone's time and help.

Sincerely,
Devin


Re: I/O delays

2010-09-09 Thread Devin Bougie
Hi Stijn,

Thank you very much for your reply.

On Sep 9, 2010, at 2:13 AM, Stijn De Weirdt wrote:
> so there is only 1 disk in your system according to dmesg (a 500GB SATA
> disk) and dd reports speeds > 1GB/s. you are either the lucky owner of a
> wonder SATA disk or you are measuring linux caching ;) 

Yes, indeed =).

> when you run the tests, can you also run
> watch grep Dirty /proc/meminfo
> and check if it starts increasing, what the maximum is and when it
> starts decreasing.

It appears to start decreasing at 1GB.  With /proc/sys/vm/dirty_ratio set to 
40, the maximum I've seen is 1030708 kB.  With it set to 2, the max I've seen 
is 991704 kB.

> if you have default SL settings, /proc/sys/vm/dirty_ratio is 40, so
> that's 19GB of dirty memory allowed. with your dd you won't reach that,
> but default /proc/sys/vm/dirty_expire_centisecs is 3000, so after 30
> seconds, the system will start to write the dirty memory away, which is
> probably when the "reduced performance" starts to hit you.
> 
> you can safely lower /proc/sys/vm/dirty_ratio to 2 (which is still
> almost 1GB of dirty memory) to get a more stable performance (i think
> recent kernels do this automatically based on total size of memory)
> 
> and if you want to measure your disk performance, use iozone/bonnie/IOR
> with sync options (or add 'sync' to your timing stats)
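
Stijn's figures (about 19GB at dirty_ratio 40, almost 1GB at 2) follow from simple percentages. The Python sketch below is a rough approximation that applies dirty_ratio to MemTotal; depending on kernel version, the kernel applies the ratio to total memory or only to "dirtyable" memory (free plus reclaimable page cache), so the real ceiling can be lower. The 48 GiB machine size is an assumption inferred from the 19GB figure, not a number stated in the thread, and the function names are hypothetical.

```python
def dirty_limit_kb(mem_kb, dirty_ratio_percent):
    """Approximate dirty-page ceiling: dirty_ratio percent of mem_kb.

    Simplification: newer kernels apply the ratio to dirtyable memory
    rather than MemTotal, so this can overestimate the real threshold.
    """
    return mem_kb * dirty_ratio_percent // 100


def read_meminfo_kb(field, path="/proc/meminfo"):
    """Return a /proc/meminfo field such as "MemTotal" or "Dirty", in kB."""
    with open(path) as meminfo:
        for line in meminfo:
            name, _, value = line.partition(":")
            if name == field:
                return int(value.split()[0])
    raise KeyError(field)


# Assumed machine size (inferred, not stated): 48 GiB expressed in kB.
mem_kb = 48 * 1024 * 1024
for ratio in (40, 2):
    limit = dirty_limit_kb(mem_kb, ratio)
    print(f"dirty_ratio={ratio:>2}: ~{limit / 1024 / 1024:.2f} GiB")
```

On the live system, `read_meminfo_kb("MemTotal")` could supply the real total, and polling `read_meminfo_kb("Dirty")` mirrors the `watch grep Dirty /proc/meminfo` suggestion.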

The system is currently in use and I haven't yet had a chance to run any real 
disk benchmarks.  For what it's worth, here's the dd output with a subsequent 
"time sync".  During the sync, the system as a whole becomes sluggish (as we 
only have a single drive used both for the OS and user data) and it takes 
several seconds for commands like "top" to return.  We also see comparable 
behavior when using ext3 or XFS.

[da...@lnx4103 ~]% time dd if=/dev/zero of=/mnt/scratch/test bs=1M count=1K ; time sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.928643 seconds, 1.2 GB/s

real    0m1.077s
user    0m0.000s
sys     0m1.068s

real    1m35.341s
user    0m0.000s
sys     0m0.182s
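
The dd figure only measures how fast data lands in the page cache; the sync that follows is what actually pushes it to disk. Adding the two elapsed times gives a rough effective rate. This Python sketch uses the bytes reported by dd and the two "real" times from the transcript; the function name is illustrative.

```python
def effective_rate_mb_s(nbytes, *elapsed_seconds):
    """Throughput in MB/s (10**6 bytes) over the combined elapsed time."""
    return nbytes / sum(elapsed_seconds) / 1e6


written = 1073741824          # bytes reported by dd
dd_real = 1.077               # "real" time of the dd command
sync_real = 60 + 35.341       # "real" time of sync (1m35.341s)

print(f"dd + sync: {effective_rate_mb_s(written, dd_real, sync_real):.1f} MB/s")
```

This comes to roughly 11 MB/s, far below the 1.2 GB/s dd reports, consistent with the point that the dd number was measuring caching. On GNU dd versions that support it, `conv=fdatasync` is another way to fold the flush into dd's own timing.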

> btw, try dstat (sort of iostat and vmstat combined, with colors ;)

Yes, dstat is very nice!

We will let you know whether or not things improve after replacing the disk.

Thanks again,
Devin


Re: I/O delays

2010-09-08 Thread Devin Bougie
  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


On Sep 8, 2010, at 6:46 PM, Konstantin Olchanski wrote:

> On Wed, Sep 08, 2010 at 04:42:43PM -0400, Devin Bougie wrote:
>> Hi, All.  We are seeing periodic I/O delays on a new large compute node ...
> 
> 
> Hi, you do not say what your disks are (model number as reported by smartctl),
> but I have seen the same symptoms as you describe with some WDC 2TB "advanced
> format" disks.
> The problem vanished with the next shipment of these 2TB disks, which also
> happen to have a slightly different model number.
> 
> So I say, if you see disk delays, try different disks (different maker,
> different model, different capacity).
> 
> 
> -- 
> Konstantin Olchanski
> Data Acquisition Systems: The Bytes Must Flow!
> Email: olchansk-at-triumf-dot-ca
> Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada


Re: I/O delays

2010-09-08 Thread Devin Bougie
Hi Steve,

On Sep 8, 2010, at 5:03 PM, Steven Timm wrote:
> Devin--what is the output of the command "dmesg"
> Are there any kernel traces or bugs in that output?

Unfortunately I don't see anything unusual in syslog or the output of dmesg.

Thanks,
Devin



[Attachment: dmesg.log]


Re: Note - Firefox 3.6 coming today for SL5

2010-06-25 Thread Devin Bougie
Hi David,

On Jun 25, 2010, at 4:15 PM, Kinzel, David wrote:
>> In our case, home directories are also on NFS shares.  The 
>> problem described above happened when ~/.mozilla/firefox was a 
>> link to a folder in a different NFS share.  Sorry for the confusion.
> Out of curiosity, why are people doing this? What benefit are you gaining?

The user in question apparently did this to avoid overfilling their home disk 
quota (thanks to Firefox's urlclassifier3.sqlite).

Devin


Re: Note - Firefox 3.6 coming today for SL5

2010-06-25 Thread Devin Bougie
Hi Chris,

On Jun 25, 2010, at 4:05 PM, Chris Jones wrote:
> On 25 Jun 2010, at 6:05pm, Devin Bougie wrote:
>> Just in case it helps, we saw this same problem with the Lazarus extension 
>> for one of our users.  In this case, the problem appears to have been that 
>> ~/.mozilla/firefox was a link to an NFS share.  When I removed the link and 
>> moved the firefox directory back into ~/.mozilla/, Firefox 3.6 with Lazarus 
>> works fine.  
>> 
>> Firefox 3.0 with Lazarus works fine when ~/.mozilla/firefox is a link to an 
>> NFS share, but Firefox 3.6 with Lazarus does not.
> 
> In my case my home directory is itself an NFS mount, nothing can be done 
> about that. ~/.mozilla/firefox is though a regular directory, not a link. 
> Still, $HOME/.mozilla/firefox is ultimately on an NFS share. Is that the 
> problem I wonder...

In our case, home directories are also on NFS shares.  The problem described 
above happened when ~/.mozilla/firefox was a link to a folder in a different 
NFS share.  Sorry for the confusion.

Devin


Re: Note - Firefox 3.6 coming today for SL5

2010-06-25 Thread Devin Bougie
Hi Jon and Chris,

On Jun 24, 2010, at 5:26 PM, Jon Peatfield wrote:
> ... But one of our users has reported a problem with a particular extension 
> which seems odd - the extension works the first time that 3.6.4 uses it 
> but not after that. ...

Just in case it helps, we saw this same problem with the Lazarus extension for 
one of our users.  In this case, the problem appears to have been that 
~/.mozilla/firefox was a link to an NFS share.  When I removed the link and 
moved the firefox directory back into ~/.mozilla/, Firefox 3.6 with Lazarus 
works fine.  

Firefox 3.0 with Lazarus works fine when ~/.mozilla/firefox is a link to an NFS 
share, but Firefox 3.6 with Lazarus does not.

Devin

------
Devin Bougie
Cornell University
Laboratory for Elementary-Particle Physics
devin.bou...@cornell.edu
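
Because the trouble seemed to depend on where ~/.mozilla/firefox really lives (a symlink into a different NFS share versus a plain directory), one way to check is to resolve the symlink and find the filesystem type of the mount that contains it. The Python sketch below does this by reading /proc/mounts (Linux-specific) and picking the longest matching mount point. The function names are hypothetical, and it only decodes escaped spaces, not every escape /proc/mounts can contain.

```python
import os


def mount_for(resolved_path, mounts_text):
    """Return (mount_point, fstype) for the longest mount point containing
    resolved_path, given text in /proc/mounts format."""
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point = fields[1].replace("\\040", " ")  # /proc/mounts escapes spaces
        fstype = fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if resolved_path == mount_point or resolved_path.startswith(prefix):
            # ">=" lets a later entry for the same mount point win, since
            # later mounts stack on top of earlier ones.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fstype
    return best_mount, best_type


def profile_filesystem(profile_dir="~/.mozilla/firefox", mounts_file="/proc/mounts"):
    """Resolve symlinks in profile_dir and report which mount it lives on."""
    real = os.path.realpath(os.path.expanduser(profile_dir))
    with open(mounts_file) as mounts:
        return (real,) + mount_for(real, mounts.read())
```

A result whose filesystem type is nfs or nfs4 means the profile's sqlite files sit on NFS, even when ~/.mozilla itself is on a different share or a local disk.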


Re: KVM troubles on SL5.4

2010-02-22 Thread Devin Bougie
Hi Artem,

On Feb 22, 2010, at 4:13 AM, Artem Trunov wrote:
> Have you had a good experience with KVM in the latest SL5.4? Is it 
> production-worthy?

Yes, we have had very positive experiences with KVM for production Windows XP 
guests and test Windows 7, Windows Server 2008 R2, SL4, and SL5 guests.  We 
recently deployed our first production server, and are preparing additional 
servers to replace our old VMWare Server 1.0.9 systems.  Of course we 
investigated ESXi, etc. and found KVM to be the best performing (and cheapest) 
solution for us.

> 1. Boot is only possible from ide bus, and drives are named hda. It seems this 
> is a feature of KVM?

IDE is sufficient for our needs.

> 2. Virtual machine can not use more than 1 host core, even if 2 virtual cpus 
> are configured for it.

We haven't noticed this.

> 3. Sometimes virtual machines lock up and spin in a loop, consuming 100% 
> of a host core. The solution is to kill the qemu process.

We haven't seen this either, but have noticed a tendency for Win7 and Server 
2008 R2 to bluescreen on KVM occasionally (roughly once a week).  We haven't 
had time to investigate what's causing this, but haven't seen it on test 
hardware using the same images.  We have also noticed significant time drift in 
our Windows guests that wasn't seen on VMWare Server, though this has been 
sufficiently corrected using NTP.

> 4. virt-manager crashes shortly after first start up and few operations. 
> Subsequent restart of virt-manager doesn't lead to more crashes..

virt-manager has never crashed for us.

> 5. It's slow, presumably because of disk operations. I see some 5MB/s max 
> read/write. I run a qemu native image format, stored on gpfs.

We've been using the RAW format which performs very well for us.  The qcow 
format was slow (and reportedly buggy).  Our images are stored on an ext3 
filesystem using 15K SAS drives in software RAID5.

> My host system has all latest kernels and patches.

Our host system is also running a fully updated x86_64 SL5.4.

I hope this helps,
Devin

--
Devin Bougie
Cornell University
Laboratory for Elementary-Particle Physics
devin.bou...@cornell.edu


lam-devel and lam-libs after upgrade from SL4.5 to SL4.8

2010-01-27 Thread Devin Bougie
Hi, All.  For what it's worth, our systems appear to have two versions of 
lam-devel and lam-libs after upgrading from SL4.5 to SL4.8:

--
[r...@lnx100 ~]# rpm -qi lam-devel
Name        : lam-devel                  Relocations: (not relocatable)
Version     : 7.1.2                           Vendor: Scientific Linux
Release     : 8                           Build Date: Fri 04 May 2007 01:50:22 AM EDT
Install Date: Tue 21 Aug 2007 04:28:45 AM EDT  Build Host: yort.fnal.gov
Group       : Development/Libraries       Source RPM: lam-7.1.2-8.src.rpm
Size        : 5043907                        License: BSD
Signature   : DSA/SHA1, Tue 22 May 2007 04:47:48 PM EDT, Key ID 25dbef78a7048f8d
URL         : http://www.lam-mpi.org/
Summary     : Development files for LAM
Description :
Contains development headers and libraries for LAM
Name        : lam-devel                  Relocations: (not relocatable)
Version     : 7.1.2                           Vendor: Scientific Linux
Release     : 15.el4                      Build Date: Sat 26 Jul 2008 10:59:07 PM EDT
Install Date: Mon 04 Jan 2010 11:36:23 AM EST  Build Host: yort1.fnal.gov
Group       : Development/Libraries       Source RPM: lam-7.1.2-15.el4.src.rpm
Size        : 5376440                        License: BSD
Signature   : DSA/SHA1, Sat 26 Jul 2008 08:25:26 PM EDT, Key ID da6ad00882fd17b2
URL         : http://www.lam-mpi.org/
Summary     : Development files for LAM
Description :
Contains development headers and libraries for LAM
[r...@lnx100 ~]# rpm -qi lam-libs
Name        : lam-libs                   Relocations: (not relocatable)
Version     : 7.1.2                           Vendor: Scientific Linux
Release     : 8                           Build Date: Fri 04 May 2007 01:50:22 AM EDT
Install Date: Tue 21 Aug 2007 04:28:29 AM EDT  Build Host: yort.fnal.gov
Group       : System/Libraries            Source RPM: lam-7.1.2-8.src.rpm
Size        : 1051049                        License: BSD
Signature   : DSA/SHA1, Tue 22 May 2007 04:47:48 PM EDT, Key ID 25dbef78a7048f8d
URL         : http://www.lam-mpi.org/
Summary     : Libraries for LAM
Description :
Runtime libraries for LAM
Name        : lam-libs                   Relocations: (not relocatable)
Version     : 7.1.2                           Vendor: Scientific Linux
Release     : 15.el4                      Build Date: Sat 26 Jul 2008 10:59:07 PM EDT
Install Date: Mon 04 Jan 2010 11:32:16 AM EST  Build Host: yort1.fnal.gov
Group       : System/Libraries            Source RPM: lam-7.1.2-15.el4.src.rpm
Size        : 2495001                        License: BSD
Signature   : DSA/SHA1, Sat 26 Jul 2008 08:25:27 PM EDT, Key ID da6ad00882fd17b2
URL         : http://www.lam-mpi.org/
Summary     : Libraries for LAM
Description :
Runtime libraries for LAM
--
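
To spot other packages left in this state after an upgrade, one option is to list every installed name/arch pair that has more than one version. This is a sketch assuming Python 3.7+ is available on a host that can run rpm (SL4 itself ships a much older Python). Grouping by architecture keeps normal multilib pairs such as i386/x86_64 from being reported; gpg-pubkey entries will still show up, since each imported key is a separate "version". Function names are illustrative.

```python
import subprocess
from collections import defaultdict

# rpm query format: one "name arch version-release" line per installed package.
QUERY_FORMAT = "%{NAME} %{ARCH} %{VERSION}-%{RELEASE}\n"


def find_duplicates(qa_output):
    """Map (name, arch) to its installed version-release strings, keeping
    only pairs with more than one version. Versions are sorted as plain
    strings, not in rpm version order."""
    versions = defaultdict(list)
    for line in qa_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        name, arch, version_release = fields
        versions[(name, arch)].append(version_release)
    return {key: sorted(vrs) for key, vrs in versions.items() if len(vrs) > 1}


def installed_packages():
    """Run rpm -qa with the query format above and return its output."""
    result = subprocess.run(
        ["rpm", "-qa", "--qf", QUERY_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `find_duplicates(installed_packages())` on an affected machine would be expected to list lam-devel and lam-libs with both the 7.1.2-8 and 7.1.2-15.el4 releases.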

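I am unable to remove the older packages via yum, and instead need to use 
"rpm -e --nopreun".  Although yum reports the removal as complete, the %preun 
scriptlets fail and the packages remain installed: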
--
[r...@lnx100 ~]# yum remove lam-devel-7.1.2-8 lam-libs-7.1.2-8
Loading "kernel-module" plugin
Setting up Remove Process
Resolving Dependencies
--> Populating transaction set with selected packages. Please wait.
---> Package lam-devel.i386 2:7.1.2-8 set to be erased
---> Package lam-libs.i386 2:7.1.2-8 set to be erased
--> Running transaction check
Beginning Kernel Module Plugin
Finished Kernel Module Plugin

Dependencies Resolved

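=============================================================================
 Package                 Arch       Version          Repository        Size
=============================================================================
Removing:
 lam-devel               i386       2:7.1.2-8        installed         4.8 M
 lam-libs                i386       2:7.1.2-8        installed         1.0 M

Transaction Summary
=============================================================================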
Install  0 Package(s) 
Update   0 Package(s) 
Remove   2 Package(s) 
Total download size: 0 
Is this ok [y/N]: y
Downloading Packages:
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
error: %preun(lam-libs-7.1.2-8.i386) scriptlet failed, exit status 2
error: %preun(lam-devel-7.1.2-8.i386) scriptlet failed, exit status 2

Removed: lam-devel.i386 2:7.1.2-8 lam-libs.i386 2:7.1.2-8
Complete!
[r...@lnx100 ~]# rpm -qi lam-devel-7.1.2-8
Name        : lam-devel                  Relocations: (not relocatable)
Version     : 7.1.2                           Vendor: Scientific Linux
Release     : 8                           Build Date: Fri 04 May 2007 01:50:22 AM EDT
Install Date: Tue 21 Aug 2007 04:28:45 AM EDT  Build Host: yort.fnal.gov
Group       : Development/Libraries       Source RPM: lam-7.1.2-8.src.rpm
Size        : 5043907                        License: BSD
Signature   : DSA/SHA1, Tue 22 May 2007 04:47:48 PM EDT, Key ID 25dbef78a7048f8d
URL         : http://www.lam-mpi.org/
Summary     : Develo

Infortrend EonStor iSCSI storage systems

2010-01-26 Thread Devin Bougie
Hi, All.  We are researching iSCSI storage devices for use in a new 
high-availability SL server cluster using the RH Cluster Suite.  Our basic 
requirements are for a reliable and scalable solution with redundant and 
hot-swappable power supplies and RAID controllers.  We are currently looking at 
Infortrend's EonStor storage systems (most likely the S16E-R1130 or the 
S16E-R1240).

We would greatly appreciate feedback from anyone who has experience with 
Infortrend or their EonStor iSCSI systems.  Of course, we would also greatly 
appreciate suggestions of other devices or vendors to consider.

Many thanks,
Devin

--
Devin Bougie
Cornell University
Laboratory for Elementary-Particle Physics
devin.bou...@cornell.edu


unstable XDMCP sessions

2009-08-28 Thread Devin Bougie

Hi, All.

In our control room, we use XDMCP on SL4.4 to log into a variety of  
control system Alphas running OpenVMS 8.3 (update 8), with Multinet  
TCP/IP v5.2.  This has worked very well for the past few years.


As of about three weeks ago, we started seeing X sessions being  
killed, typically a few 10s of seconds after they connect to the VMS  
system and present the VMS XDM login screen.  When this occurs, the  
session quietly ends without any error messages, regardless of whether  
or not you actually log into the VMS system.  While the linux systems  
all behave identically, the problem seems to shift around between VMS  
systems.  For example, yesterday morning no linux systems were able to  
remain logged into cesr28, while in the afternoon cesr28 was fine but  
the problem had shifted to cesr27.


The problem has been seen both with Linux display systems and with NCD  
X terminals.  When it happens with the NCDs, they generate a popup  
menu asking if the session should be killed, giving the user the  
ability to remain logged in. The Linux display systems, however,  
simply kill all the windows and redisplay the XDMCP chooser. We have  
not found any option on the linux side to prevent this.


For reliability, we have all updates disabled for these linux  
systems.  As the linux xterms have not changed since their deployment  
several years ago and this problem first surfaced roughly three weeks  
ago, it seems as though the linux systems are sensitive to something  
that is happening on the VMS side or on the network.


We have been testing this using the gdm chooser and, for example, the  
following command:

/usr/X11R6/bin/X -ac -once -query cesr27

If anyone cares to look, I can send the packets captured during both  
successful and failed sessions using tshark.  We have also started a  
thread in the HP IT Resource Forum:

http://forums13.itrc.hp.com/service/forums/questionanswer.do?threadId=1362175

Any suggestions for fixing this problem would be greatly appreciated.

Many thanks,
Devin


--
Devin Bougie
Laboratory for Elementary-Particle Physics
Cornell University
devin.bou...@cornell.edu


apr-devel requires gcc 3.4.5 on SL4

2009-08-18 Thread Devin Bougie
Hi, All.  We are unable to update apr on an SL4.5 system because apr-devel 
requires gcc 3.4.5.


--
[r...@lnx209 ~]# yum update apr
Loading "kernel-module" plugin
Setting up Update Process
Setting up repositories
Reading repository metadata in from local files
Resolving Dependencies
--> Populating transaction set with selected packages. Please wait.
---> Package apr.i386 0:0.9.4-24.9.el4_8.2 set to be updated
--> Running transaction check
--> Processing Dependency: apr = 0.9.4-24.5 for package: apr-devel
--> Restarting Dependency Resolution with new changes.
--> Populating transaction set with selected packages. Please wait.
---> Package apr-devel.i386 0:0.9.4-24.9.el4_8.2 set to be updated
--> Running transaction check
--> Processing Dependency: gcc = 3.4.5 for package: apr-devel
--> Finished Dependency Resolution
Beginning Kernel Module Plugin
Finished Kernel Module Plugin
Error: Missing Dependency: gcc = 3.4.5 is needed by package apr-devel
--

It looks like this has been reported in the past, but I'm not sure if  
there was ever any resolution.


Sincerely,
Devin

--
Devin Bougie
Laboratory for Elementary-Particle Physics
Cornell University
da...@cornell.edu


quota on ext3 with SL4

2009-07-23 Thread Devin Bougie
Hi, All.  I recently ran into the following under "Limitations  
specific to S.L. 4.8."  I am considering how to proceed with a few  
SL4.5 systems that are using quotas on ext3 file systems, and haven't  
yet found much more about this problem.  Any more details or advice  
would be greatly appreciated.


---
Quota on EXT3 file systems
Scientific Linux discourages the use of quota on EXT3 file systems.  
This is because in some cases, doing so can cause a deadlock.
Testing has revealed that kjournald can sometimes block some EXT3-specific 
callouts that are used when quota is running. As such, Scientific Linux does 
not plan to fix this issue in Scientific Linux 4, as the modifications 
required would be too invasive.

Note that this issue is not present in Scientific Linux 5.

---

Many thanks,
Devin

--
Devin Bougie
Laboratory for Elementary-Particle Physics
Cornell University
da...@cornell.edu


Re: samba on SL5

2009-02-04 Thread Devin Bougie

Hi Stephen,

On Feb 4, 2009, at 3:36 AM, Stephen J. Gowdy wrote:
> Wouldn't it be best to run SAMBA on the machines that actually have  
> the disks?


Our home disk server is currently running Solaris 10.  When we moved  
the home disks from an Alpha running Tru64 to this new Sun box, our  
Solaris Admin attempted to get Samba running on the Solaris box.  Of  
course I don't know the details, but he was never able to make it  
work.  He is no longer employed by the lab and was our sole "Solaris  
admin" (with many years of experience supporting Solaris), so I am not  
hopeful about making Samba work when he couldn't.  Moreover, until we  
get the home disks migrated over to a linux server (who knows when  
we'll have time for that), we are inclined to touch the current home  
disk server as little as possible.


After moving our home disks from the aging Alpha to the Sun box, we  
continued to make the home disks accessible over Samba using the Alpha  
(which accessed the disks over NFS).  This Alpha recently died,  
prompting us to finally move the samba service over to linux.


In addition, we were hoping to limit the proliferation of Samba  
servers by having a single "samba" server that users could browse to  
for access to all of the required unix filesystems.


Unless there are any other suggestions, we will move the samba service  
to an SL4 box where everything just works (using the same smb.conf as  
on SL5).


Thanks,
Devin


Re: samba on SL5

2009-02-03 Thread Devin Bougie

On Feb 3, 2009, at 4:10 PM, Dr Andrew C Aitchison wrote:

> On Tue, 3 Feb 2009, Devin Bougie wrote:
>> Hi, All.  We are using samba on an SL5.2 server to share  
>> filesystems with our Windows systems.  Everything works properly  
>> when sharing a filesystem that is local to the samba server.  When  
>> sharing an nfs filesystem, however, we see locking issues with  
>> *some* file types.
>
> Which kernel are you using on the *NFS* server ?
> We had NFS locking issues with the kernel shipped with SL5.2
> and some of the updates.
> We don't seem to get them with 2.6.18-92.1.22.el5
>
> It may be that your macs don't see the problem because they are
> being less picky with the locks.


We see the same problem with all of the NFS shares we've tried, and  
none of the NFS servers are running SL5.  One is Solaris 10, one is  
SL4 with the 2.6.9-67.0.1.ELsmp kernel, and several are SL3 with the  
2.4.21-37.EL.XFSsmp kernel.


If anyone else is serving nfs shares over samba on SL5 and isn't  
seeing this problem, I would be very grateful for a glimpse of your  
smb.conf file.  If anyone is curious, I would be happy to provide ours.


Many thanks for the reply,
Devin



> For us the broken locking made mutt very unhappy with an NFS
> mounted /var/spool/mail but we found no problems with pine.
>
> --
> Dr. Andrew C. Aitchison Computer Officer, DPMMS, Cambridge
> a.c.aitchi...@dpmms.cam.ac.uk   http://www.dpmms.cam.ac.uk/~werdna


samba on SL5

2009-02-03 Thread Devin Bougie
Hi, All.  We are using samba on an SL5.2 server to share filesystems  
with our Windows systems.  Everything works properly when sharing a  
filesystem that is local to the samba server.  When sharing an nfs  
filesystem, however, we see locking issues with *some* file types.


For example, we can only open doc or ppt files read-only using MS  
Office on Windows, and get spurious errors saying that the file is  
locked or already being modified.  If you save the file to your  
desktop and then drag it back to the samba share, it copies over  
without problem.  We can also open txt files using word and edit and  
save them directly onto the share without incident.  We only see this  
problem on Windows clients, as a Mac can edit doc and ppt files  
directly on the share.


I am not able to reproduce this on an SL4 samba server using the exact  
same smb.conf.  In addition, the problem remains after updating samba  
on SL5 to 3.0.33-3.7 from 5rolling.


So far the only "solutions" we've found, neither of which I'm  
comfortable with, are to disable locking ("locking = no") or enable  
fake oplocks ("fake oplocks = yes").
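
By default samba maps client byte-range locks onto POSIX locks on the underlying filesystem (its `posix locking` option), so one way to separate an NFS locking problem from a samba configuration problem is to test POSIX locking on the NFS mount directly from the samba server. The Python sketch below is one such test, assuming Python 3 on a Linux host; the function name is hypothetical. It only checks that a lock held by one local process blocks another, says nothing about oplocks, and on a mount where lockd is not responding the lock call may hang instead of failing.

```python
import errno
import fcntl
import os
import tempfile


def posix_locks_enforced(directory):
    """Return True if an exclusive lockf() lock held by this process makes a
    non-blocking lock attempt from a second process fail, as it should.

    Creates and removes a temporary file in `directory`.
    """
    fd, path = tempfile.mkstemp(dir=directory, prefix=".locktest-")
    try:
        # Exclusive whole-file POSIX (fcntl-style) lock in the parent.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        pid = os.fork()
        if pid == 0:
            # Child: POSIX locks are per-process, so the parent's lock
            # should make this non-blocking attempt fail.
            code = 3  # unexpected error (e.g. open failed)
            try:
                child_fd = os.open(path, os.O_RDWR)
                try:
                    fcntl.lockf(child_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    code = 1  # lock granted: the conflict was not detected
                except OSError as exc:
                    # EACCES or EAGAIN: the conflicting lock was seen.
                    code = 0 if exc.errno in (errno.EACCES, errno.EAGAIN) else 2
            finally:
                os._exit(code)  # never run the parent's cleanup in the child
        _, status = os.waitpid(pid, 0)
        return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    finally:
        os.close(fd)
        os.unlink(path)
```

Running `posix_locks_enforced("/path/to/nfs/share")` (a placeholder path) on the SL5 samba server and on the SL4 box would show whether the two clients differ at the kernel locking level.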


Has anyone else encountered this?  Any suggestions would be greatly  
appreciated.


Thanks,
Devin

--
Devin Bougie
Laboratory for Elementary-Particle Physics
da...@cornell.edu


kernel panic after receiving openafs-1.4.7-68

2008-11-25 Thread Devin Bougie
Hi, All.  Just in case anyone else runs into this, we had a handful of  
SL4.5 machines kernel panic after receiving the openafs-1.4.7-68  
update.  Here are the relevant entries in syslog, and please let me  
know if there is any more information I can provide.


Many thanks,
Devin

--

Nov 25 04:44:16 lnx201 yum: Updated: openafs.i386 1.4.7-68.SL4
Nov 25 04:44:17 lnx201 yum: Updated: openafs-client.i386 1.4.7-68.SL4
Nov 25 04:44:17 lnx201 yum: Updated: openafs-krb5.i386 1.4.7-68.SL4
Nov 25 04:44:23 lnx201 kernel: Failed to invalidate all pages on inode 0xc5987580
Nov 25 04:44:24 lnx201 kernel: WARM shutting down of: CB... afs... BkG... CTrunc... AFSDB... RxEvent... UnmaskRxkSignals... RxListener...
Nov 25 04:44:24 lnx201 kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
Nov 25 04:44:24 lnx201 kernel: slab error in kmem_cache_destroy(): cache `afs_inode_cache': Can't free all objects
Nov 25 04:44:24 lnx201 kernel:  [] kmem_cache_destroy+0x99/0x132
Nov 25 04:44:24 lnx201 kernel:  [] cleanup_module+0x1e/0x84 [libafs]
Nov 25 04:44:24 lnx201 kernel:  [] sys_delete_module+0x13b/0x184
Nov 25 04:44:24 lnx201 kernel:  [] unmap_vma_list+0xe/0x17
Nov 25 04:44:24 lnx201 kernel:  [] do_munmap+0x108/0x116
Nov 25 04:44:24 lnx201 kernel:  [] do_page_fault+0x0/0x5c6
Nov 25 04:44:24 lnx201 kernel:  [] syscall_call+0x7/0xb
Nov 25 04:44:26 lnx201 kernel: libafs: Ignoring new-style parameters in presence of obsolete ones
Nov 25 04:44:26 lnx201 kernel: Found system call table at 0xc03289bc (pattern scan)
Nov 25 04:44:26 lnx201 kernel: kmem_cache_create: duplicate cache afs_inode_cache
Nov 25 04:44:26 lnx201 kernel: [ cut here ]
Nov 25 04:44:26 lnx201 kernel: kernel BUG at mm/slab.c:1453!
Nov 25 04:44:26 lnx201 kernel: invalid operand:  [#1]
Nov 25 04:44:26 lnx201 kernel: SMP
Nov 25 04:44:26 lnx201 kernel: Modules linked in: libafs(U) nfs lockd nfs_acl parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc md5 ipv6 loop dm_mirror button battery ac uhci_hcd ehci_hcd hw_random pcspkr tg3 sg ext3 jbd raid1 dm_mod ata_piix libata mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Nov 25 04:44:26 lnx201 kernel: CPU:3
Nov 25 04:44:26 lnx201 kernel: EIP:0060:[]Tainted: PF VLI
Nov 25 04:44:26 lnx201 kernel: EFLAGS: 00010202   (2.6.9-67.0.1.ELsmp)
Nov 25 04:44:26 lnx201 kernel: EIP is at kmem_cache_create+0x4b3/0x526
Nov 25 04:44:26 lnx201 kernel: eax: 0033   ebx: f718d874   ecx: c044382c   edx: c02ec519
Nov 25 04:44:26 lnx201 kernel: esi: f8d9ae8c   edi: f8d9ae9c   ebp: f70ae880   esp: e1dcff78
Nov 25 04:44:26 lnx201 kernel: ds: 007b   es: 007b   ss: 0068
Nov 25 04:44:26 lnx201 kernel: Process modprobe (pid: 6553, threadinfo=e1dcf000 task=f73aa0b0)
Nov 25 04:44:26 lnx201 kernel: Stack: f8dacb60 c000 c032dba8 f8d9ae8c 0080 c032dbc8 f8dad380 c032dba8
Nov 25 04:44:26 lnx201 kernel:e1dcf000 f8d8321b 00022000 f8d831f2  f889f01d c013876d b7dfd008
Nov 25 04:44:26 lnx201 kernel:09562120 0035e63a c02d8613 b7dfd008 0009bc84 09562120 09562120 0035e63a
Nov 25 04:44:26 lnx201 kernel: Call Trace:
Nov 25 04:44:26 lnx201 kernel:  [] afs_init_inodecache+0x1d/0x2e [libafs]
Nov 25 04:44:26 lnx201 kernel:  [] init_once+0x0/0xc [libafs]
Nov 25 04:44:26 lnx201 kernel:  [] init_module+0x1d/0x3d [libafs]
Nov 25 04:44:26 lnx201 kernel:  [] sys_init_module+0xf8/0x21a
Nov 25 04:44:26 lnx201 kernel:  [] syscall_call+0x7/0xb
Nov 25 04:44:26 lnx201 kernel: Code: 04 19 c0 0c 01 85 c0 75 2a ff 74 24 0c 68 19 c5 2e c0 e8 9c af fd ff 59 b9 2c 38 44 c0 5e f0 ff 05 2c 38 44 c0 0f 8e 64 15 00 00 <0f> 0b ad 05 96 c4 2e c0 8b 1b eb 84 8b 54 24 04 b8 00 f0 ff ff
Nov 25 04:44:26 lnx201 kernel:  <0>Fatal exception: panic in 5 seconds


Re: aklog on login (was Re: openssh-server in sl-contrib has Kerberos enabled?)

2008-09-04 Thread Devin Bougie

On Aug 26, 2008, at 6:13 PM, Devin Bougie wrote:

> On Aug 22, 2008, at 3:27 PM, Troy Dawson wrote:
>> Anyway ... the real problem is that annoying message about doing  
>> aklog when you log in isn't it?
>> I remember another lab having that problem and we fixed it for  
>> them ... I think.  It might have been changing the aklog stuff in  
>> /etc/krb5.conf ... but let me check.
> I would also greatly appreciate hearing if anyone has a solution for  
> this.  After upgrading from openssh-server-3.9p1-8.SL.4.22 to  
> openssh-server-3.9p1-22.SL.4.22, we see "aklog: Can't get  
> information about cell ..." when logging in (we need the afs client  
> running but do not yet have our own cell).


For now, our workaround is to build our own openssh rpms from 
ftp://ftp.scientificlinux.org/linux/scientific/47/i386/contrib/SRPMS/openssh/openssh-3.9p1-22.SL.4.22.src.rpm, 
with "--with-afs-krb5" removed from line 373 of openssh.SL4.spec.


Any other suggestions would still be greatly appreciated.

Thanks,
Devin


aklog on login (was Re: openssh-server in sl-contrib has Kerberos enabled?)

2008-08-26 Thread Devin Bougie

Hello, All.

On Aug 22, 2008, at 3:27 PM, Troy Dawson wrote:

> ...
> Anyway ... the real problem is that annoying message about doing  
> aklog when you log in isn't it?
> I remember another lab having that problem and we fixed it for  
> them ... I think.  It might have been changing the aklog stuff in  
> /etc/krb5.conf ... but let me check.


I would also greatly appreciate hearing if anyone has a solution for  
this.  After upgrading from openssh-server-3.9p1-8.SL.4.22 to 
openssh-server-3.9p1-22.SL.4.22, we see "aklog: Can't get information about  
cell ..." when logging in (we need the afs client running but do not  
yet have our own cell).


Many thanks,
Devin


Re: SL5 kickstart boot CD

2007-05-22 Thread Devin Bougie

On May 18, 2007, at 4:48 PM, Connie Sieh wrote:

> On Fri, 18 May 2007, Devin Bougie wrote:
>
>>> The .buildstamp (in initrd) and .discinfo (in top level dir) have to
>>> "agree"
>>> So you probably need to get the .discinfo file from your "install
>>> tree" and put it on your new iso image.
>>
>> Unfortunately this doesn't seem to help.  The .discinfo file on my
>> iso image does match .discinfo in our install tree, but we still see
>> the same error.
>
> Was this true before?

Before (and with SL4) there was no .discinfo .

> If so then you need the one from the cd image.


I don't have any more luck using the .discinfo from the cd image.  I  
even tried copying this .discinfo to our install tree, but I still  
get the "... does not seem to match your boot media." message.


Any more suggestions would be greatly appreciated.

Many thanks,
Devin


Re: SL5 kickstart boot CD

2007-05-18 Thread Devin Bougie

Hi Connie,

On May 18, 2007, at 12:40 PM, Connie Sieh wrote:
> I assume you are installing from a "mirror" of the  
> ftp.scientificlinux.org site.


Yes, we created a local mirror by following your "Mirroring  
Scientific Linux" documentation.  We see the same error regardless of  
whether we use our local mirror or other public mirrors.


> The .buildstamp (in initrd) and .discinfo (in top level dir) have to  
> "agree"
> So you probably need to get the .discinfo file from your "install  
> tree" and put it on your new iso image.


Unfortunately this doesn't seem to help.  The .discinfo file on my  
iso image does match .discinfo in our install tree, but we still see  
the same error.


Many thanks for your reply.  Any other suggestions would be greatly  
appreciated.


Devin


SL5 kickstart boot CD

2007-05-18 Thread Devin Bougie

Hi All,

With SL4, we used the following procedure (for example) to create  
a kickstart boot CD from the first SL4 installation disk.  Our  
kickstart file installs from our local mirror via http.

mount -o loop /tmp/SL.43.050806.i386.disc1.iso /mnt/tmp
mkdir /tmp/SL43
cp -r /mnt/tmp/isolinux /tmp/SL43
cd /tmp/SL43
cp /tmp/ks.cfg isolinux/ks.cfg
edit isolinux/isolinux.cfg to start the kickstart installation automatically:

	change timeout 600 to timeout 1
	add ks=cdrom:/ks.cfg to the end of the "append" line of the first
	"label linux" section
mkisofs -o /tmp/SL43.iso -b isolinux.bin -c boot.cat -no-emul-boot \
  -boot-load-size 4 -boot-info-table -R -J -v -T isolinux/
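
The two isolinux.cfg edits in the procedure above can also be scripted. The Python sketch below assumes a stock-style isolinux.cfg in which "timeout 600" appears on its own line and the first "label linux" stanza has an "append" line; the function name is illustrative, and the result is worth checking by eye before running mkisofs.

```python
def make_kickstart_cfg(cfg_text, ks_arg="ks=cdrom:/ks.cfg"):
    """Return cfg_text with "timeout 600" changed to "timeout 1" and ks_arg
    appended to the first "append" line of the first "label linux" stanza."""
    lines = cfg_text.splitlines()
    in_linux_stanza = False
    patched = False
    for i, line in enumerate(lines):
        words = line.split()
        if words[:2] == ["timeout", "600"]:
            lines[i] = line.replace("600", "1", 1)
        elif words[:1] == ["label"]:
            # Only the first "label linux" stanza gets the kickstart argument.
            in_linux_stanza = words[1:2] == ["linux"] and not patched
        elif in_linux_stanza and not patched and words[:1] == ["append"]:
            lines[i] = line.rstrip() + " " + ks_arg
            patched = True
    return "\n".join(lines) + "\n"
```

In the /tmp/SL43 working directory, this would be applied by reading isolinux/isolinux.cfg, passing its contents through `make_kickstart_cfg`, and writing the result back before the mkisofs step.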


When we do the same procedure with SL5, we get the following error:
"The Scientific Linux installation tree in that directory does not  
seem to match your boot media."


We get the same error regardless of which SL mirror we use.  However,  
we can install from our local mirror by using "linux askmethod" and  
the normal SL5 installation CD.


Any suggestions would be greatly appreciated.

Many thanks,
Devin

--
Devin Bougie
Laboratory for Elementary-Particle Physics
[EMAIL PROTECTED]


Re: NFS attribute cache problem in SL4

2007-05-03 Thread Devin Bougie

Hi All,

Just to let you know, the 2.6.9-55 kernels released with Update 5 do  
fix this bug.


Regards,
Devin


Re: NFS attribute cache problem in SL4

2007-04-16 Thread Devin Bougie

On Apr 12, 2007, at 8:34 PM, Devin Bougie wrote:

here's a bugzilla report with TUV:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=236308


This report now contains test kernels that appear to fix this bug.
They should be included in 4.5, which will be released "real soon now."


Devin


Re: NFS attribute cache problem in SL4

2007-04-13 Thread Devin Bougie

Many thanks for the reply, Miles.

On Apr 12, 2007, at 11:16 PM, Miles O'Neal wrote:

We've run into serious NFS problems between SL3 and our
NetApp filer. We solved the problem (or at
least greatly reduced it) with options like these:

foo:/vol/vol1/bar on /export/bar type nfs (rw,vers=3,hard,intr,bg,rsize=32768,wsize=32768,tcp,timeo=10,retrans=5,retry=5,actimeo=3,addr=www.xxx.yyy.zzz)


The low actimeo and large rsize/wsize options were key. We
also found performance unacceptable with noac.


Unfortunately, these options don't seem to help with the bug we're  
seeing.


Thanks again,
Devin


NFS attribute cache problem in SL4

2007-04-12 Thread Devin Bougie

Hi All,

It looks like we’ve run into an NFS client bug in SL4. We stumbled
upon this while trying to check out code from a Subversion repository
into an NFS directory. Our NFS servers are SL3, and we only see this
bug with SL4 clients. Things work when mounting the directory with
‘noac’, but we can’t live with the performance hit.


We get the following error when running an svn checkout on SL4 in
an NFS-mounted directory:

REPORT request failed on '/svn/!svn/vcc/default'
svn: REPORT of '/svn/!svn/vcc/default': 400 Bad Request (https://accserv)


Comparing system call traces of SL3 and SL4 clients during the
checkout, here's where we think it goes pear-shaped, about 1800
syscalls in:

open("bmad/.svn/tmp/tempfile.tmp", O_RDWR|O_CREAT|O_EXCL, 0666) = 3
[...]
write(3, "svn code.  (We suppose it's possible this is timing-related, in which
case the second stat might not *reliably* fix the problem...)

Here is a test program to demonstrate the bug.

dsr_lnxcu9% cat svnbug.c
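#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef TESTSIZE
#define TESTSIZE 52
#endif

int main(int argc, char** argv)
{
char s[TESTSIZE+1];
struct stat st1, st2;
int r;
ssize_t len;

/* Create a fresh file the way svn creates its .svn/tmp files.
   O_EXCL makes this fail if tmpfile.xyzzy is left over from a
   previous run, so delete it before rerunning. */
int fd = open("tmpfile.xyzzy", O_RDWR|O_CREAT|O_EXCL, 0666);

if (fd < 0) {
   perror("open");
   exit(errno);
}
/* Write TESTSIZE bytes of 'x'. */
memset(s, 'x', TESTSIZE);
len = write(fd, s, TESTSIZE);
if (len < 0) {
   perror("write");
   exit(errno);
}
/* First fstat() right after the write: on the affected SL4 NFS
   clients this reports a stale st_size of 0. */
r = fstat(fd, &st1);
if (0 != r) {
   perror("fstat");
   exit(errno);
}
/* Second fstat(): reports the expected size. */
r = fstat(fd, &st2);
if (0 != r) {
   perror("fstat");
   exit(errno);
}
/* st_size is an off_t, so print it via long long rather than %zd. */
printf("len = %zd, st1 = %lld, st2 = %lld\n",
   len, (long long)st1.st_size, (long long)st2.st_size);

close(fd);

return 0;
}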
dsr_lnxcu9% gcc svnbug.c
dsr_lnxcu9% ~/a.out
len = 52, st1 = 52, st2 = 52
dsr_lnxcu9% cd /cdat/tem/dsr
dsr_lnxcu9% ~/a.out
len = 52, st1 = 0, st2 = 52

I have also posted a message to the Subversion developers mailing  
list, and here's a bugzilla report with TUV:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=236308

Finally, here are others with the same problem:
http://www.centos.org/modules/newbb/viewtopic.php?topic_id=4875

Any suggestions or workarounds would be greatly appreciated.

Devin

--
Devin Bougie
Laboratory for Elementary-Particle Physics
[EMAIL PROTECTED]