Re: Datafile Corruption

2019-08-14 Thread Patrick McFadin
If you hadn't mentioned that you are using physical disks, I would have
guessed you were using virtual disks on a SAN. I've seen this sort of thing
happen a lot there. Are there any virtual layers between the cassandra
process and the hardware? Just a reminder: fsync can be a liar. The
virtual layer can mock the response back to user land while the actual bits
get dropped before ever hitting the disk.
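To make that concrete, here is a minimal Python sketch of a "durable" write as seen from user land (the path is illustrative). The point is that this sequence returns success even when a lower layer is lying, so a clean fsync return is necessary but not sufficient for durability:

```python
# Minimal sketch of a "durable" write from user land.
# os.fsync() asks the kernel to flush to stable storage, but a
# virtualization layer or volatile disk write cache underneath can
# still acknowledge the flush and drop the bits later.
import os

def durable_write(path, data):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # success here does not prove the bits hit the platter
    finally:
        os.close(fd)

durable_write("/tmp/fsync-demo", b"hello")
```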

If not, you should be looking hard at your disk options. fstab, schedulers,
etc. In that case, you need this:
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
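For the scheduler part of that checklist, something along these lines works on most Linux boxes (the sysfs path and the sample scheduler string are illustrative):

```shell
# The active I/O scheduler is shown in brackets, e.g. "noop [deadline] cfq".
# To list it for every SCSI-style block device:
#   for q in /sys/block/sd*/queue/scheduler; do echo "$q: $(cat "$q")"; done
# Helper that pulls the active scheduler out of that bracketed format:
active_sched() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}
active_sched "noop [deadline] cfq"   # prints: deadline
```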


Patrick

On Wed, Aug 14, 2019 at 2:03 PM Forkalsrud, Erik  wrote:

> The dmesg command will usually show information about hardware errors.
>
> An example from a spinning disk:
> sd 0:0:10:0: [sdi] Unhandled sense code
> sd 0:0:10:0: [sdi] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 0:0:10:0: [sdi] Sense Key : Medium Error [current]
> Info fld=0x6fc72
> sd 0:0:10:0: [sdi] Add. Sense: Unrecovered read error
> sd 0:0:10:0: [sdi] CDB: Read(10): 28 00 00 06 fc 70 00 00 08 00
>
>
> Also, you can read the whole file back, like:
> "cat /data/ssd2/data/KeyspaceMetadata/x-x/lb-26203-big-Data.db > /dev/null"
> If you get an error message, it's probably a hardware issue.
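That read check can be wrapped in a tiny helper and swept across a data directory; a sketch, with the directory layout purely illustrative:

```shell
# Returns 0 if the file can be read end-to-end, non-zero on any read
# error, which on bare metal usually points at the disk itself.
check_readable() {
  cat "$1" > /dev/null 2>&1
}

# Illustrative sweep over SSTable data files:
#   for f in /data/ssd2/data/*/*/*-Data.db; do
#     check_readable "$f" || echo "READ FAILED: $f"
#   done
```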
>
> - Erik -
>
> --
> *From:* Philip Ó Condúin 
> *Sent:* Thursday, August 8, 2019 09:58
> *To:* user@cassandra.apache.org 
> *Subject:* Re: Datafile Corruption
>
> Hi Jon,
>
> Good question. I'm not sure if we're using NVMe; I don't see /dev/nvme,
> but we could still be using it.
> We're using *Cisco UCS C220 M4 SFF*, so I'm just going to check the spec.
>
> Our kernel is the following; we're using Red Hat, so I'm told we can't
> upgrade the version until the next major release anyway.
> root@cass 0 17:32:28 ~ # uname -r
> 3.10.0-957.5.1.el7.x86_64
>
> Cheers,
> Phil
>
> On Thu, 8 Aug 2019 at 17:35, Jon Haddad  wrote:
>
> Any chance you're using NVMe with an older Linux kernel?  I've seen a
> *lot* of filesystem errors from using older CentOS versions.  You'll want
> to be using a kernel version > 4.15.
>
> On Thu, Aug 8, 2019 at 9:31 AM Philip Ó Condúin 
> wrote:
>
> *@Jeff *- If it were hardware, that would explain it all, but do you think
> it's possible for every server in the cluster to have a hardware issue?
> The data is sensitive and the customer would lose their mind if I sent it
> off-site, which is a pity because I could really do with the help.
> The corruption is occurring irregularly on every server, instance, and
> column family in the cluster.  Out of 72 instances, we are getting maybe
> 10 corrupt files per day.
> We are using vnodes (256) and it is happening in both DCs.
>
> *@Asad *- internode compression is set to ALL on every server.  I have
> checked the packets for the private interconnect and I can't see any
> dropped packets; there are dropped packets on other interfaces, but not on
> the private ones. I will get the network team to double-check this.
> The corruption is only on the application schema; we are not getting
> corruption on any system or Cassandra keyspaces.  Corruption is happening
> in both DCs.  We are getting corruption for the one application schema we
> have, across all tables in the keyspace; it's not limited to one table.
> I'm not sure why the app team decided not to use the default compression;
> I must ask them.
>
>
>
> I have been checking /var/log/messages today, going back a few weeks, and
> can see a serious number of broken pipe errors across all servers and
> instances.
> Here is a snippet from one server, but most pipe errors are similar:
>
> Jul  9 03:00:08  cassandra: INFO  02:00:08 Writing
> Memtable-sstable_activity@1126262628(43.631KiB serialized bytes, 18072
> ops, 0%/0% of on/off-heap limit)
> Jul  9 03:00:13  kernel: fnic_handle_fip_timer: 8 callbacks suppressed
> Jul  9 03:00:19  kernel: fnic_handle_fip_timer: 8 callbacks suppressed
> Jul  9 03:00:22  cassandra: ERROR 02:00:22 Got an IOException during write!
> Jul  9 03:00:22  cassandra: java.io.IOException: Broken pipe
> Jul  9 03:00:22  cassandra: at sun.nio.ch.FileDispatcherImpl.write0(Native
> Method) ~[na:1.8.0_172]
> Jul  9 03:00:22  cassandra: at
> sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[na:1.8.0_172]
> Jul  9 03:00:22  cassandra: at
> sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.8.0_172]
> Jul  9 03:00:22  cassandra: at sun.nio.ch.IOUtil.write(IOUtil.java:65)
> ~[na:1.8.0_172]
> Jul  9 03:00:22  cassandra: at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
> ~[na:1.8.0_172]
> Jul  9 03:00:22  cassandra: at
> org.apache.thrift.transport.TNonblockingSocket.write(TNonblockingSocket.java:165)
> ~[libthrift-0.9.2.jar:0.9.2]
> Jul  9 03:00:22  cassandra: at
> com.thinkaurelius.thrift.util.mem.Buffer.writeTo(Buffer.java:104)
> ~[thrift-server-0.3.7.jar:na]
> Jul  9 03:00:22  cassandra: at
> com.thinkaurelius.thrift.util.mem.FastMemoryOutputTransport.streamTo(FastMemoryOutputTransport.java:112)
> ~[thrift-server-0.3.7.jar:na]
> Jul  9 03:00:22  cassandra: at
> com.thinkaurelius.thrift.Message.write(Message.java:222)
> ~[thrift-server-0.3.7.jar:na]
> Jul  9 03:00:22  cassandra: at
> com.thinkaurelius.thrift.TDisruptorServer$SelectorThread.handleWrite(TDisruptorServer.java:598)
> [thrift-server-0.3.7.jar:na]
> Jul  9 03:00:22  cassandra: at
> com.thinkaurelius.thrift.TDisruptorServer$SelectorThread.processKey(TDisruptorServer.java:569)
> [thrift-server-0.3.7.jar:na]
> Jul  9 03:00:22  cassandra: at
> com.thinkaurelius.thrift.TDisruptorServer$AbstractSelectorThread.select(TDisruptorServer.java:423)
> [thrift-server-0.3.7.jar:na]
> Jul  9 03:00:22  cassandra: at
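A quick way to see whether those broken pipe errors line up with the corrupt-file reports is to count them per log file. A small sketch, assuming the log path and message format shown in the snippet above:

```shell
# Count "Broken pipe" occurrences in a log file, e.g.:
#   count_broken_pipes /var/log/messages
count_broken_pipes() {
  grep -c "java.io.IOException: Broken pipe" "$1"
}
```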
