On Fri, Aug 12, 2016 at 11:34 AM, Adrian Klaver <adrian.kla...@aklaver.com>
wrote:

> On 08/12/2016 08:30 AM, Ioana Danes wrote:
>
>>
>>
>> On Fri, Aug 12, 2016 at 11:26 AM, Adrian Klaver
>> <adrian.kla...@aklaver.com> wrote:
>>
>>     On 08/12/2016 08:10 AM, Ioana Danes wrote:
>>
>>
>>
>>         On Fri, Aug 12, 2016 at 10:47 AM, Francisco Olarte
>>         <fola...@peoplecall.com> wrote:
>>
>>             CCing to the list...
>>
>>         Thanks
>>
>>
>>             On Fri, Aug 12, 2016 at 4:10 PM, Ioana Danes
>>             <ioanada...@gmail.com> wrote:
>>             >> Given that 318216 and 318220 are just one bit apart
>>             >> ( 4db08/4db0c ), and it repeats sporadically, have you ruled
>>             >> out ( by having page checksums or some other mechanism ) a
>>             >> potential disk read/write error?
>>             >>
>>             >> > Also the index is correct on db3, as the record in question
>>             >> > (with drawid = 318216) is retrieved if I filter by
>>             >> > drawid = 318220
>>             >>
>>             >> Especially if this happens, you may have some slightly bad
>>             >> disks/RAM leading to this kind of problem.
>>             >>
>>             >
>>             > Could be. I also had some issues with an rsync between db3 and
>>             > drdb a week ago that did not complete for bigger files
>>             > (> 200MB) and gave me some corruption messages. Then the system
>>             > was rebooted and everything seemed fine, but apparently it is
>>             > not.
>>             > I am planning to drop & create the table from a good backup,
>>             > and if that does not fix the issue then I will rebuild the
>>             > server.
>>
>>             I would check whatever logs you can ( syslog or eventlog, SMART
>>             log, etc. ), hunting for disk errors ( sometimes they are
>>             reported ). This kind of problem, with programs as well tested
>>             as postgres and rsync, tends to indicate a controller/RAM/disk
>>             going bad ( in your case it could be caused by a single bit
>>             getting flipped in a sector for the data portion of the table,
>>             and not being propagated either because it happened after your
>>             sync of drdb, or because it was synced from the WAL and not the
>>             table, or because it was read from the disk cache ).
>>
>>         I agree. Unfortunately, I did not find any clues about corruption
>>         or any anomalies in the logs.
>>         I will work tonight to rebuild that table and see where I go from
>>         there.
>>
>>
>>     The db3 database is on a different machine from all the other
>>     databases you set up, correct?
>>
>> Yes, they are all different VMs. The first 3 dbs are on the same cluster,
>> but drdb is a remote machine.
>>
>
> Aah, another player in the mix.
>
> What virtualization technology are you using?
>

kvm
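
For reference, the "one bit away" observation quoted above (318216 vs 318220, i.e. 4db08 vs 4db0c) can be checked with a few lines of Python. This is only an illustrative sketch: the helper name differing_bits is made up for this example and is not part of postgres or of any tool mentioned in the thread.

```python
def differing_bits(a: int, b: int) -> list[int]:
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b  # XOR leaves a 1 in every position where the two values disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# The two drawid values from this thread.
stored, filtered = 318216, 318220

print(hex(stored), hex(filtered))         # 0x4db08 0x4db0c
print(differing_bits(stored, filtered))   # [2]
```

A result with exactly one position means the two values differ by a single flipped bit, which fits the bad disk/RAM theory but does not prove it.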



>
>
>> Thank you
>>
>>
>>
>>         Thanks,
>>         ioana
>>
>>             Francisco Olarte.
>>
>>
>>
>>
>>     --
>>     Adrian Klaver
>>     adrian.kla...@aklaver.com
>>
>>
>>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
