I added the mailing list back to this thread, since you did not hit "reply to all" and I have been the only one getting your replies. I think that is not fair; you should be allowed to contact the manufacturer directly. I did exactly that with Corsair because of some faulty RAM, and I am RMAing the paired set I have back to them. In all honesty, I would contact the manufacturer and bypass the vendor altogether.
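On identifying which CPU lost its bank: each Opteron has its own memory controller, so Linux normally reports one NUMA node per socket. Below is a minimal sketch, assuming a kernel built with NUMA support that exposes /sys/devices/system/node/nodeN/meminfo and cpulist. It lists how much local memory and which logical CPUs each node has, so a node showing 0 kB, or a missing node, points at the socket whose bank is gone. It does not test the memory controller itself. Depending on the kernel, a memoryless node may not be listed at all and its cores may be folded into a neighbouring node, so compare the node count against your four sockets. `numactl --hardware` should show similar information if numactl is installed.

```python
import glob
import os
import re

# Assumed sysfs layout (Linux kernel built with NUMA support):
#   /sys/devices/system/node/nodeN/meminfo  -> "Node N MemTotal:  <kB> kB" ...
#   /sys/devices/system/node/nodeN/cpulist  -> e.g. "0-1"
SYSFS_NODE_ROOT = "/sys/devices/system/node"


def read_text(path):
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def parse_node_meminfo(text):
    """Extract MemTotal (in kB) from the text of a nodeN/meminfo file."""
    if text is None:
        return None
    match = re.search(r"MemTotal:\s+(\d+)\s+kB", text)
    return int(match.group(1)) if match else None


def node_report(root=SYSFS_NODE_ROOT):
    """Map each NUMA node id to its local memory (kB) and its logical CPUs."""
    report = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*")):
        node_id = int(os.path.basename(path)[len("node"):])
        report[node_id] = {
            "mem_kb": parse_node_meminfo(read_text(os.path.join(path, "meminfo"))),
            "cpus": read_text(os.path.join(path, "cpulist")),
        }
    return report


def nodes_without_memory(report):
    """Node ids whose MemTotal is 0 or could not be read."""
    return sorted(node for node, info in report.items() if not info["mem_kb"])


if __name__ == "__main__":
    report = node_report()
    for node in sorted(report):
        info = report[node]
        size = "unknown" if info["mem_kb"] is None else f"{info['mem_kb'] / 2**20:.1f} GB"
        print(f"node {node}: memory {size}, CPUs {info['cpus']}")
    print(f"{len(report)} node(s) listed; without memory: {nodes_without_memory(report)}")
```

On an H8QC8 with four dual-core Opteron 875s one would expect four nodes with two CPUs each. This assumes node interleaving is disabled in the BIOS; with it enabled the kernel sees a single node and the check tells you nothing.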
On Fri, Jan 16, 2009 at 10:11 PM, Francesco Pietra <[email protected]> wrote:
> To conclude, as it will be uninteresting to subscribers from here on: in Europe the customer can only contact the vendor of the Supermicro product. That gave no useful hint, and the vendor does not answer any more. I asked which kind of test he wants in order to accept the mainboard for repair, and he did not answer. Therefore, it could be a waste of time replacing the CPU (I have a spare one) unless it is just the CPU that is faulty, which (I believe) is unlikely. If I prove that it was not a faulty CPU, I could inform the Beowulf list and some friends around here about that discovery, or start an international legal action. Therefore, unless the CPU can be fully tested by software (and, if faulty, replaced), I will do nothing else than look for another mainboard and assemble a new machine, this time with 16 logical processors. The more I have, the faster the work goes. I understand that suggestions about the brand (obviously Supermicro is ruled out) can't be expected here.
> Thanks for all
> francesco
>
> On Fri, Jan 16, 2009 at 8:10 PM, Jon Aquilina <[email protected]> wrote:
> > In that case you need to contact them by phone and request an RMA.
> >
> > On Fri, Jan 16, 2009 at 3:48 PM, Francesco Pietra <[email protected]> wrote:
> >>
> >> That I already tried. The modules from the bad bank are OK on another motherboard. Vice versa, good modules from another mainboard do not work in the bad bank.
> >>
> >> I am no system expert, just a chemist, but I can only figure that the memory controller of the CPU is damaged. Otherwise the fault has arisen in the motherboard (voltage controller or something else).
> >>
> >> francesco
> >>
> >> On Fri, Jan 16, 2009 at 10:10 AM, Jon Aquilina <[email protected]> wrote:
> >> > Dunno about another type of motherboard, but do you have another stick of RAM you can try in those sockets instead? If so, it could be that you just have bad RAM.
> >> >
> >> > On Fri, Jan 16, 2009 at 9:46 AM, Francesco Pietra <[email protected]> wrote:
> >> >>
> >> >> Hi:
> >> >> Running memtest86+ v. 2.11 is the first test I carried out, repeatedly and until completion. It did not detect the modules in the faulty bank and did not show errors for the remaining RAM (18GB). Otherwise, the 6GB of modules from the faulty bank are OK. I would like to test via software the memory controller of the CPU at the faulty bank, which I believe is the last remaining chance that the mainboard itself is not damaged. All CPUs have correct HyperTransport, and I have replaced two 1GB modules with 2GB modules. Still, the 20GB falls short for some of my calculations.
> >> >>
> >> >> As the Supermicro mainboard is only 8 months old (during which period it managed all 24GB of RAM), I expected Supermicro Europe to take action in some way. They simply stopped answering after having suggested something totally uninteresting.
> >> >>
> >> >> Therefore, in assembling a new 4 quad-core UMA system, I am looking for another brand of mainboard. Suggestions?
> >> >>
> >> >> francesco
> >> >>
> >> >> On Thu, Jan 15, 2009 at 10:21 PM, Jon Aquilina <[email protected]> wrote:
> >> >> > Try running memtest86+. It's a CD that you boot from that tests the memory. Leave it running for a few hours to make sure whether it is the RAM or the sockets. I am not sure about how to test the CPU.
> >> >> >
> >> >> > On Tue, Jan 13, 2009 at 10:26 AM, Francesco Pietra <[email protected]> wrote:
> >> >> >>
> >> >> >> Hi:
> >> >> >>
> >> >> >> I am posting here on a suggestion from the Debian amd64 site. My original posting to the mainboard factory/vendor in Europe only resulted in uninteresting suggestions, and then they did not answer any more.
> >> >> >>
> >> >> >> My question is directed to users familiar with multisocket UMA-type mainboards based on the dual-core AMD Opteron 875 CPU. My own is a Supermicro H8QC8 with the nVidia CK804 and AMD 8132 chipset, driven by Debian Linux amd64 lenny.
> >> >> >>
> >> >> >> One of the CPUs has suddenly lost access to its 4-slot memory bank (I shut down the machine in an orderly way; the problem arose on the next boot into Linux). Still, the CPU cores are OK, the HyperTransport links are fully working, and parallelization for both Amber 10 and NWChem 5.1 is fully provided, but one of the CPUs must be slower, having to borrow memory from the other banks. The hardware status, after a period of complete darkness, is described in the attached lshw_deb64_7Jan2009.txt.
> >> >> >>
> >> >> >> As each bank of Kingston DDR1 is filled 2+2+1+1 GB, I identified the faulty bank, removed all modules from it, and replaced the 1+1 GB modules in another bank with the 2+2 GB from the faulty bank, so that the computer is now at 20GB. The situation is described in the attached lshw_deb64_lessCPU2_scrambling1G_2G_CPU4_7Jan2009.txt. Actually, identification of the CPU (CPU2) related to the faulty memory bank is uncertain: I just took the CPU nearest to the faulty bank. The manual is not helpful in this regard.
> >> >> >>
> >> >> >> I understand that, in order to rule out non-mainboard causes, I should be certain that a CPU has not lost memory control. Replacing the CPUs (I have one spare second-hand CPU) or swapping them around is quite troublesome, and risky, in my context (there is very little space around the mainboard in the rack that I engineered to accept it). Ventilation is excellent, however.
> >> >> >>
> >> >> >> Therefore, is there any software way to check whether the CPUs are fully in order, including the memory controller? lshw and other software provided only partial help in my hands.
> >> >> >>
> >> >> >> Also, any other suggestion would be greatly appreciated.
> >> >> >>
> >> >> >> Thanks for your kind attention
> >> >> >>
> >> >> >> francesco pietra
> >> >> >> _______________________________________________
> >> >> >> Beowulf mailing list, [email protected]
> >> >> >> To change your subscription (digest mode or unsubscribe) visit
> >> >> >> http://www.beowulf.org/mailman/listinfo/beowulf
> >> >> >
> >> >> > --
> >> >> > Jonathan Aquilina
> >> >
> >> > --
> >> > Jonathan Aquilina
> >
> > --
> > Jonathan Aquilina
>
--
Jonathan Aquilina
