Hi, memtest86+ from memtest.org will detect most common memory issues - though it may need to run for a couple of days. Since everything used to work fine, maybe it is a good idea to focus on the new hardware. It is not unusual for brand new equipment to be faulty.
Cheers,
Dimitris

On Thu, Oct 2, 2014 at 11:17 PM, Jörg Saßmannshausen <[email protected]> wrote:

> Hi Dimitris
>
> thanks for the feedback.
>
> I can rule out the front end as I was using that with a different disc array
> without any problems. So I am somewhat confident that the front end and the
> controller are ok.
>
> As for the disc array: I got a new controller here so one would assume that
> one is working ok. I am in touch with the manufacturer to see if there is a
> problem with that.
>
> I did some stress testing in terms of copying the files over from the old
> server to the new server and I did not see any problems here when I was using
> a test board, i.e. a different front end with a different controller.
>
> Having said that: I cannot really rule out that the controller I am currently
> using might have a problem, as it is a dual controller (two scsi connections)
> and one of the boxes which was connected there had a slower transfer rate.
> What I do not know is whether the controller is then stepping down and hence
> any problems will be masked due to the slower transfer rate.
>
> Unfortunately, like so often, the hardware is in use and needed so I cannot
> take it offline too often and then hamper people's work.
>
> Talking about memtest: which one do you suggest? memtest or memtester? I have
> heard different opinions about them.
>
> All the best from a mild London
>
> Jörg
>
> On Thursday 02 October 2014 you wrote:
> > Hello,
> >
> > RAM somewhere could also be faulty. Have a look at the logs for any ECC
> > errors (both system memory and RAID controller) and memtest the boxes
> > involved for a couple of days. I would suggest some stress testing of the
> > new server if not done already.
> >
> > Best regards,
> >
> > Dimitris
> >
> > On Sun, Sep 21, 2014 at 3:22 PM, Jörg Saßmannshausen <[email protected]> wrote:
> > > Dear all,
> > >
> > > I got a rather strange problem with one of my file servers which I
> > > recently have upgraded in order to accommodate more disc space.
> > >
> > > The problem: I have copied the files from the old file space to a
> > > temporary disc storage space using this rsync command:
> > >
> > > rsync -vrltH -pgo --stats -D --numeric-ids -x oldserver:foo tempspace:baa
> > >
> > > I have been doing this for some years now and never had any problems.
> > >
> > > As always, I am running md5sum afterwards to be sure there is not a
> > > problem later and the user is losing data. This time, for a rather
> > > large file (around 16 GB), the md5sum failed after I moved the files from
> > > the temp space back to the new destination using the same command as above.
> > >
> > > Still having access to the old file space, I decided to move this file
> > > from the old file space. Strangely enough, rsync does not sync the file
> > > again so I had to delete the file. Even after deleting the file and
> > > re-syncing it from the old source, the md5sum is wrong.
> > >
> > > Copying the file to a different file space did not cause this problem,
> > > i.e. the md5sum is correct.
> > > As it is a tar.gz file, I simply decided to decompress the original file
> > > on the different file server. That worked. The file where the md5sum is
> > > wrong did not decompress on the different file server but crashed with an
> > > error message when I executed gunzip. So the file is broken.
> > >
> > > The setup:
> > >
> > > Originally I was using an old Infortrend box which had old PATA discs in
> > > it. This box is connected via scsi to a frontend server which exports the
> > > file space via iscsi. The backend for that, i.e. the one the user is
> > > accessing, is on a different physical machine and it is a XEN guest. The
> > > reason behind that setup is that the frontend is acting as a backup server
> > > and I don't want people to have access to it.
> > > I then exchanged the Infortrend box with a more recent model which got
> > > SATA capabilities but still got scsi connection to the frontend. The
> > > frontend is the same. I got a new controller for that box as the old one
> > > was broken. There are no changes in the backend, that is still the same
> > > XEN guest on the same hardware.
> > >
> > > What I cannot work out is why the old Infortrend box does not have any
> > > problems with the new file, while the newer one has a problem here. Also,
> > > when I have copied over some files (again using the rsync command above) a
> > > few files did not copy correctly (again md5sum) in the first instance but
> > > did so later.
> > >
> > > I find that highly alarming as that means that at least for larger and/or
> > > some binary files there seems to be a problem. However, I am not sure
> > > where to look as I am out of ideas.
> > >
> > > Could it be there is a problem with the 'new' controller?
> > > In all cases I was using ext4 as a file system and I did not have any
> > > problems with that.
> > >
> > > Anybody got some sentiments here?
> > >
> > > All the best from a sunny London
> > >
> > > Jörg
> > >
> > > P.S. To make things worse I am off on a work related trip from Monday
> > > onwards and I have been working on that problem since Friday evening.
> > >
> > > --
> > > *************************************************************
> > > Dr. Jörg Saßmannshausen, MRSC
> > > University College London
> > > Department of Chemistry
> > > Gordon Street
> > > London
> > > WC1H 0AJ
> > >
> > > email: [email protected]
> > > web: http://sassy.formativ.net
> > >
> > > Please avoid sending me Word or PowerPoint attachments.
> > > See http://www.gnu.org/philosophy/no-word-attachments.html
> > >
> > > _______________________________________________
> > > Beowulf mailing list, [email protected] sponsored by Penguin Computing
> > > To change your subscription (digest mode or unsubscribe) visit
> > > http://www.beowulf.org/mailman/listinfo/beowulf
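For reference, the md5sum comparison described in this thread can be sketched in Python. This is only a minimal illustration, not the procedure anyone in the thread actually ran: the helper names `file_md5` and `files_match` are hypothetical, and it assumes both copies are reachable as local paths (for example via a mounted file system). Reading in fixed-size chunks keeps memory use flat even for files in the 16 GB range.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(source_path, copy_path):
    """Return True if both files have the same MD5 digest (hypothetical helper)."""
    return file_md5(source_path) == file_md5(copy_path)
```

Running the check more than once on the same copy may help tell a file that is stored wrongly (the digest is wrong but stable) from intermittent corruption on the read path (the digest may change between runs), although caching can hide the latter.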
