Re: Nvidia woes...
Alan Bartlett wrote:
> On 28 April 2010 14:10, Alan Bartlett wrote:
>> On 27 April 2010 21:54, Mark Stodola wrote:
>>> Hey everyone,
>>>
>>> I currently have deployed a number of SL 5.2 i386 machines.
>
> Mark,
>
> Further to my earlier message, I have re-read your initial sentence above. A SL 5.2 system will be using a kernel from the 2.6.18-92.x.y.el5 series.
>
> Unfortunately the kmod-nvidia[-*] packages that are available from ELRepo, although being kABI tracking, will only weak-link back to the 2.6.18-128.x.y.el5 kernel series (i.e. SL 5.3) and not, like many of the other packages, back to the original 2.6.18-8.x.y.el5 kernels.
>
> So having raised your hopes, I now have to dash them. Sorry.
>
> Regards,
>
> Alan.

That is correct for the current driver (kmod-nvidia) and the 173 series legacy driver (kmod-nvidia-173xx), but the older kmod-nvidia-96xx legacy driver is currently kABI compliant with all current EL5 kernels :)
Re: Nvidia woes...
On 28 April 2010 15:41, Jaroslaw Polok wrote:
> We do use some NVIDIA cards, but mostly with the 96xx legacy series
> driver (however, we may start using the 195 current series
> on our future hardware).
>
> So far we use nvidia packages we maintain ourselves, but it would
> be interesting for us to change this situation...

Hello Jarek,

As long as you are using SL 5.3 or above, you should find that the packages will fulfil your requirements. The current (as distinct from the legacy) package is regularly rebuilt when nVidia releases a new version.

> Speaking of which, I'm a little bit confused about nvidia kernel
> modules and kABI: checking http://dup.et.redhat.com/
> and using the kABI testing script gives the following result for your
> kmod-nvidia-190.53-1.el5.elrepo.x86_64.rpm, nvidia.ko:
>
> ./abi_check.py ./lib/modules/2.6.18-164.el5/extra/nvidia/nvidia.ko
> Red Hat Enterprise Linux 5 ABI Checker
> --
>
> ABI Checker version: 1.2
>
> Module: ./lib/modules/2.6.18-164.el5/extra/nvidia/nvidia.ko
> Kernel: 2.6.18-194.el5
> Whitelist: /usr/src/kernels/2.6.18-194.el5-x86_64/kabi_whitelist
>
> WARNING: The following symbols are used by your module
> WARNING: and are not on the ABI whitelist.
>
> symbol: acpi_walk_namespace
> symbol: agp_bridges
> symbol: acpi_get_handle
> symbol: acpi_os_wait_events_complete
> symbol: acpi_evaluate_object
> symbol: acpi_bus_get_device
> symbol: acpi_install_notify_handler
> symbol: acpi_evaluate_integer
> symbol: acpi_remove_notify_handler

That is a known issue and, as I am not one of the team maintaining the nVidia packages, it would be best if I do not go into great detail but just refer you to the relevant bug tracker entries [1][2]. The whole issue of the kernel ABI whitelist, and of certain kmod packages requiring symbols that are not on it, is something that we have discussed with Jon Masters of Red Hat.

Although I know that other members of the ELRepo Admin Team are subscribed to this list, I think it might be best to transfer this discussion to the ELRepo mailing list [3], where the more appropriate audience can be found.

Regards,

Alan.

[1] http://elrepo.org/bugs/view.php?id=30
[2] https://bugzilla.redhat.com/show_bug.cgi?id=520891
[3] http://lists.elrepo.org/mailman/listinfo/elrepo
Re: Nvidia woes...
On Wed, 28 Apr 2010, Alan Bartlett wrote:
> On 28 April 2010 14:10, Alan Bartlett wrote:
>> On 27 April 2010 21:54, Mark Stodola wrote:
>>> Hey everyone,
>>>
>>> I currently have deployed a number of SL 5.2 i386 machines.
>
> Mark,
>
> Further to my earlier message, I have re-read your initial sentence above. A SL 5.2 system will be using a kernel from the 2.6.18-92.x.y.el5 series.

It is possible to update any SL5.x system to the latest kernel that is available, so you should be able to get around that. For example, I run kernels from the 5.4 series on 5.3 systems all the time, and I did the same with 5.3 kernels on 5.2.

Steve

> Unfortunately the kmod-nvidia[-*] packages that are available from ELRepo, although being kABI tracking, will only weak-link back to the 2.6.18-128.x.y.el5 kernel series (i.e. SL 5.3) and not, like many of the other packages, back to the original 2.6.18-8.x.y.el5 kernels.
>
> So having raised your hopes, I now have to dash them. Sorry.
>
> Regards,
>
> Alan.

--
Steven C. Timm, Ph.D (630) 840-8525
t...@fnal.gov http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.
Re: Nvidia woes...
On 28 April 2010 14:10, Alan Bartlett wrote:
> On 27 April 2010 21:54, Mark Stodola wrote:
>> Hey everyone,
>>
>> I currently have deployed a number of SL 5.2 i386 machines.

Mark,

Further to my earlier message, I have re-read your initial sentence above. A SL 5.2 system will be using a kernel from the 2.6.18-92.x.y.el5 series.

Unfortunately the kmod-nvidia[-*] packages that are available from ELRepo, although being kABI tracking, will only weak-link back to the 2.6.18-128.x.y.el5 kernel series (i.e. SL 5.3) and not, like many of the other packages, back to the original 2.6.18-8.x.y.el5 kernels.

So having raised your hopes, I now have to dash them. Sorry.

Regards,

Alan.
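Alan's point comes down to the base release number N in a 2.6.18-N.el5 kernel string. The Python sketch below encodes the limits stated in this thread, assuming them to be exact: kmod-nvidia and kmod-nvidia-173xx need a 2.6.18-128 or later kernel, while kmod-nvidia-96xx reaches back to 2.6.18-8. The function names and tables are hypothetical illustrations, not ELRepo or SL tooling. The -53, -164 and -194 entries are the stock EL5 point-release kernels and are assumed rather than stated here. The real weak-updates mechanism compares symbol versions; it does not consult a table like this.

```python
import re

# Base release N of the stock kernel (2.6.18-N.el5) for each EL5 point release.
# The thread names -8 (5.0), -92 (5.2) and -128 (5.3); the others are assumed
# from the standard EL5 kernel numbering.
EL5_BASE_RELEASES = {8: 0, 53: 1, 92: 2, 128: 3, 164: 4, 194: 5}

# Oldest base release each ELRepo package weak-links back to, as stated in
# this thread (April 2010). Hypothetical table, not an ELRepo data source.
MIN_BASE_RELEASE = {
    "kmod-nvidia": 128,
    "kmod-nvidia-173xx": 128,
    "kmod-nvidia-96xx": 8,
}


def base_release(kernel_release: str) -> int:
    """Extract N from a kernel string such as '2.6.18-92.1.6.el5'."""
    match = re.match(r"2\.6\.18-(\d+)\b", kernel_release)
    if match is None:
        raise ValueError(f"not an EL5 kernel release: {kernel_release!r}")
    return int(match.group(1))


def el5_minor_release(kernel_release: str) -> int:
    """Map a kernel string to the 5.x point release whose series it belongs to."""
    n = base_release(kernel_release)
    known = [base for base in EL5_BASE_RELEASES if base <= n]
    if not known:
        raise ValueError(f"older than any EL5 release: {kernel_release!r}")
    return EL5_BASE_RELEASES[max(known)]


def can_weak_link(kernel_release: str, package: str) -> bool:
    """True if the package's kABI range (as stated in the thread) covers this kernel."""
    return base_release(kernel_release) >= MIN_BASE_RELEASE[package]


if __name__ == "__main__":
    for kernel in ("2.6.18-92.1.6.el5", "2.6.18-164.el5"):
        for pkg in sorted(MIN_BASE_RELEASE):
            print(kernel, f"SL 5.{el5_minor_release(kernel)}", pkg,
                  can_weak_link(kernel, pkg))
```

Applied to Mark's stock 2.6.18-92.1.6.el5 kernel, the sketch places it in the 5.2 series. It fails the check for kmod-nvidia and passes for kmod-nvidia-96xx. Steve's suggestion of running a newer kernel on an older point release amounts to raising N to 128 or above.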
Re: Nvidia woes...
Alan Bartlett wrote:
[...]
> I don't use nVidia graphics cards and also should mention my
> connection to the ELRepo Project "up front" but have you tried using
> the kernel independent, kABI tracking kmod packages that the ELRepo
> Project provides? [1]

We do use some NVIDIA cards, but mostly with the 96xx legacy series driver (however, we may start using the 195 current series on our future hardware). So far we use nvidia packages we maintain ourselves, but it would be interesting for us to change this situation...

Speaking of which, I'm a little bit confused about nvidia kernel modules and kABI: checking http://dup.et.redhat.com/ and using the kABI testing script gives the following result for your kmod-nvidia-190.53-1.el5.elrepo.x86_64.rpm, nvidia.ko:

    ./abi_check.py ./lib/modules/2.6.18-164.el5/extra/nvidia/nvidia.ko
    Red Hat Enterprise Linux 5 ABI Checker
    --

    ABI Checker version: 1.2

    Module: ./lib/modules/2.6.18-164.el5/extra/nvidia/nvidia.ko
    Kernel: 2.6.18-194.el5
    Whitelist: /usr/src/kernels/2.6.18-194.el5-x86_64/kabi_whitelist

    WARNING: The following symbols are used by your module
    WARNING: and are not on the ABI whitelist.

    symbol: acpi_walk_namespace
    symbol: agp_bridges
    symbol: acpi_get_handle
    symbol: acpi_os_wait_events_complete
    symbol: acpi_evaluate_object
    symbol: acpi_bus_get_device
    symbol: acpi_install_notify_handler
    symbol: acpi_evaluate_integer
    symbol: acpi_remove_notify_handler

Are you using a different ABI checker script? Version 1.2 does not seem to be happy with the nvidia kernel modules here.

Best Regards
Jarek

--
Jaroslaw Polok
CERN - IT/OIS/ODS
http://home.cern.ch/~jpolok
tel +41 22 767 1834
    +41 78 792 0795
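At its core, the warning Jarek quotes is a set difference: the symbols nvidia.ko imports, minus the symbols listed in the kernel's kabi_whitelist. The Python sketch below illustrates only that step. It is not abi_check.py and does not reproduce its output. It assumes the module's imported symbols come from `nm -u` output (lines of the form `U name`) and that the whitelist has one symbol per line under bracketed section headers. Check the actual file under /usr/src/kernels/ before relying on that layout. The function names are hypothetical.

```python
from __future__ import annotations


def parse_whitelist(text: str) -> set[str]:
    """Collect symbol names from a kabi_whitelist-style file.

    Assumed layout: bracketed section headers such as [section] and one
    symbol per line; blank lines are ignored.
    """
    symbols = set()
    for line in text.splitlines():
        name = line.strip()
        if name and not name.startswith("["):
            symbols.add(name)
    return symbols


def parse_nm_undefined(text: str) -> set[str]:
    """Collect undefined ('U') symbols from `nm -u module.ko` output."""
    symbols = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "U":
            symbols.add(fields[1])
    return symbols


def non_whitelisted(module_symbols: set[str], whitelist: set[str]) -> list[str]:
    """Return, sorted, the symbols the module uses that the whitelist lacks."""
    return sorted(module_symbols - whitelist)


if __name__ == "__main__":
    # Tiny illustrative inputs, not real data from nvidia.ko or an EL5 kernel.
    whitelist = parse_whitelist("[example_whitelist]\n\tprintk\n\tkmalloc\n")
    used = parse_nm_undefined(
        "                 U acpi_walk_namespace\n"
        "                 U kmalloc\n"
        "                 U printk\n"
    )
    for name in non_whitelisted(used, whitelist):
        print("symbol:", name)
```

On the sample input, only acpi_walk_namespace is reported. A non-empty result means the module relies on kernel symbols outside the whitelist, which is the situation the checker warns about for these nvidia packages.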
Re: Nvidia woes...
On 27 April 2010 21:54, Mark Stodola wrote:
> Hey everyone,
>
> I currently have deployed a number of SL 5.2 i386 machines. [...]
>
> Getting desperate,
> Mark

Mark,

I don't use nVidia graphics cards and also should mention my connection to the ELRepo Project "up front" but have you tried using the kernel independent, kABI tracking kmod packages that the ELRepo Project provides? [1]

There are three different packages available: kmod-nvidia [2], kmod-nvidia-96xx [3] and kmod-nvidia-173xx [4].

If you would like to discuss the usage before trying any one of them, there is an ELRepo users' mailing list [5] and, if there should be a problem, the ELRepo bug tracker [6].

Regards,

Alan.

[1] http://elrepo.org
[2] http://elrepo.org/tiki/kmod-nvidia
[3] http://elrepo.org/tiki/kmod-nvidia-96xx
[4] http://elrepo.org/tiki/kmod-nvidia-173xx
[5] http://lists.elrepo.org/mailman/listinfo/elrepo
[6] http://elrepo.org/bugs/main_page.php
RE: Nvidia woes...
Mark,

I had a problem like that with a (now decommissioned) SL5.3 box with a GeForce FX5000 series card. I seem to recall that after installing the nVidia 190.53 drivers, the issue disappeared. Under 190.42, the machine would randomly lock up and then reboot; it was really frustrating. I never really found a cause, since updating the drivers made the issue go away.

Good luck!

-Mike

-----Original Message-----
From: owner-scientific-linux-us...@listserv.fnal.gov [mailto:owner-scientific-linux-us...@listserv.fnal.gov] On Behalf Of Mark Stodola
Sent: Tuesday, April 27, 2010 4:55 PM
To: SCIENTIFIC-LINUX-USERS@listserv.fnal.gov
Subject: Nvidia woes...

Hey everyone,

I currently have deployed a number of SL 5.2 i386 machines. [...]
Re: Nvidia woes...
Hi,

SL5.4 with nVidia 185.18.36 on NV286 & FX370. See http://www.mail-archive.com/scientific-linux-users@listserv.fnal.gov/msg05399.html for the gory details...

Cheers,
Sergio

On 27 Apr 2010, at 23:17, Mark Stodola wrote:
> Sergio,
>
> I haven't noticed any memory leaks, but I also haven't been actively hunting
> them down. There don't seem to be any signs of dwindling performance before
> this happens. Most times, it is just idling overnight. At most, there is a
> small amount of network traffic on an isolated LAN of no more than 5 or so
> devices, mostly Win XP or SL5.2 systems (often running off a custom livecd
> based on Urs' scripts).
>
> What card/config/drivers are you running?
>
> Cheers,
> Mark
>
> Sergio Ballestrero wrote:
>> Hello Mark,
>> we are having problems with X11 slowly leaking memory, which then leads to a
>> system crash. Do you see anything similar?
>> My attempts at using valgrind have been inconclusive (if not confusing) up
>> to now...
>>
>> Cheers,
>> Sergio
>>
>> On 27 Apr 2010, at 22:54, Mark Stodola wrote:
>>> Hey everyone,
>>>
>>> I currently have deployed a number of SL 5.2 i386 machines. [...]

--
Sergio Ballestrero - http://physics.uj.ac.za/psiwiki/Ballestrero
University of Johannesburg, Physics Department
ATLAS TDAQ sysadmin group - Office:75240 OnCall:164851
Re: Nvidia woes...
Sergio,

I haven't noticed any memory leaks, but I also haven't been actively hunting them down. There don't seem to be any signs of dwindling performance before this happens. Most times, it is just idling overnight. At most, there is a small amount of network traffic on an isolated LAN of no more than 5 or so devices, mostly Win XP or SL5.2 systems (often running off a custom livecd based on Urs' scripts).

What card/config/drivers are you running?

Cheers,
Mark

Sergio Ballestrero wrote:
> Hello Mark,
> we are having problems with X11 slowly leaking memory, which then leads to a system crash. Do you see anything similar?
> My attempts at using valgrind have been inconclusive (if not confusing) up to now...
>
> Cheers,
> Sergio
>
> On 27 Apr 2010, at 22:54, Mark Stodola wrote:
>> Hey everyone,
>>
>> I currently have deployed a number of SL 5.2 i386 machines. [...]
Re: Nvidia woes...
Hello Mark,

we are having problems with X11 slowly leaking memory, which then leads to a system crash. Do you see anything similar? My attempts at using valgrind have been inconclusive (if not confusing) up to now...

Cheers,
Sergio

On 27 Apr 2010, at 22:54, Mark Stodola wrote:
> Hey everyone,
>
> I currently have deployed a number of SL 5.2 i386 machines. [...]
Nvidia woes...
Hey everyone,

I currently have deployed a number of SL 5.2 i386 machines. Due to the circumstances, I'm not in a position to upgrade them to the latest 5.x with ease. Lately I've been having trouble with systems locking up hard when they are running an nvidia card with the 190.42 or 195.36.15 proprietary drivers. Dual monitors are connected via DVI, using TwinView.

I've tried a GeForce 9600GT as well as a Quadro NVS 290, with varying success. The Quadro seems to have lasted about a month before locking up, while the 9600GT locks up much more often (daily or weekly). I'm running the stock 5.2 kernel (2.6.18-92.1.6.el5) and Xorg (xorg-x11-server 1.1.1-48.41.el5_2.1). The systems are generally idle when it happens. I'm having no luck capturing log data or kdump data.

The strange part is that, with identical hardware in several locations, only some machines experience the issue.

Hardware:
Intel DG43NB motherboards (BIOS revision doesn't seem to matter at this point; running 98, 99, 104, or 105)
  ^- hardware revision is the same for all of them: AAE34877-402
Areca ARC-1200 SATA RAID card (latest firmware, 1.48), running two 320 GB Seagates
Additional PCI-e NIC, Intel PRO/1000, running e1000e v0.4.1.12-NAPI
Single stick, 1GB DDR2 (800) memory
PS/2 keyboard/mouse

I'm curious whether anyone else has run into similar problems and, if so, whether they have found a solution. I'm looking at trying the 185.18.31 drivers, which seem to be "certified" for Linux by a few software vendors, according to NVIDIA's website.

What driver versions and/or card makes/models are people using successfully? Any help or pointers are greatly appreciated.

As I said, not all of them are misbehaving, and I have several with the same config minus the video card running fine on SL5.2 and Windows XP Pro (SP3).

Getting desperate,
Mark

--
Mr. Mark V. Stodola
Digital Systems Engineer

National Electrostatics Corp.
P.O. Box 620310
Middleton, WI 53562-0310 USA
Phone: (608) 831-7600
Fax: (608) 831-9591