On Fri, Aug 12, 2016 at 11:34 AM, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
> On 08/12/2016 08:30 AM, Ioana Danes wrote:
>
>> On Fri, Aug 12, 2016 at 11:26 AM, Adrian Klaver
>> <adrian.kla...@aklaver.com> wrote:
>>
>>     On 08/12/2016 08:10 AM, Ioana Danes wrote:
>>
>>         On Fri, Aug 12, 2016 at 10:47 AM, Francisco Olarte
>>         <fola...@peoplecall.com> wrote:
>>
>>             CCing to the list...
>>
>>         Thanks
>>
>>             On Fri, Aug 12, 2016 at 4:10 PM, Ioana Danes
>>             <ioanada...@gmail.com> wrote:
>>             >> given 318220 and 318216 are just a bit away ( 4db08/4db0c ), and it
>>             >> repeats sporadically, have you ruled out ( by having page checksums or
>>             >> other mechanism ) a potential disk read/write error ?
>>             >>
>>             >> > Also the index is correct on db3 as the record in case (with drawid =
>>             >> > 318216) is retrieved if I filter by drawid = 318220
>>             >>
>>             >> Especially if this happens, you may have some slightly bad disks/ram/
>>             >> leading to this kind of problem.
>>             >
>>             > Could be. I also had some issues with an rsync between db3 and drdb a week
>>             > ago that did not complete for bigger files (> 200MB) and gave me some
>>             > corruption messages. Then the system was rebooted and everything seemed
>>             > fine but apparently it is not.
>>             > I am planning to drop & create the table from a good backup and if that does
>>             > not fix the issue then I will rebuild the server.
>>
>>             I would check whatever logs you can ( syslog or eventlog, smart log,
>>             etc.. ) hunting for disk errors ( sometimes they are reported ). This
>>             kind of problem, with programs as tested as postgres and rsync, tends
>>             to indicate controller/RAM/disk going bad ( in your case it could be
>>             caused by a single bit getting flipped in a sector for the data
>>             portion of the table, and not being propagated either because it
>>             happened after your sync of drdb or because it was synced from the WAL
>>             and not the table, or because it was read from the disk cache ).
>>
>>         I agree, unfortunately I did not find any clues about corruption or any
>>         anomalies in the logs.
>>         I will work tonight to rebuild that table and see where I go from there.
>>
>>     The db3 database is on a different machine from all the other
>>     databases you set up, correct?
>>
>> Yes, they are all different VMs; the first 3 dbs are on the same cluster but
>> drdb is a remote machine.
>>
>
> Aah, another player in the mix.
>
> What virtualization technology are you using?
>
kvm

>
>> Thank you
>>
>>         Thanks,
>>         ioana
>>
>>             Francisco Olarte.
>>
>>     --
>>     Adrian Klaver
>>     adrian.kla...@aklaver.com
>>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
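
For anyone following the "just a bit away" remark quoted above: a minimal sketch, not taken from the thread, that checks whether the two drawid values really differ by a single bit. The helper name differing_bits is hypothetical and only here for illustration.

```python
def differing_bits(a: int, b: int) -> list[int]:
    """Return the positions (0 = least significant) of the bits where a and b differ."""
    diff = a ^ b  # XOR leaves a 1 only where the two values disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# The two drawid values mentioned in the thread.
stored, expected = 318216, 318220

print(hex(stored), hex(expected))        # 0x4db08 0x4db0c
print(differing_bits(stored, expected))  # [2]
```

A single entry in the result means exactly one bit differs, which is the pattern a flipped bit in RAM or on disk would produce. It does not prove a hardware fault by itself; it only shows the two values are consistent with one.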