If you hadn't mentioned that you are using physical disks, I would have
guessed you were using virtual disks on a SAN. I've seen this sort of thing
happen a lot there. Are there any virtual layers between the Cassandra
process and the hardware? Just a reminder: fsync can be a liar. A virtual
layer can fake the success response back to userland while the actual bits
are dropped before hitting the disk.
If not, you should be looking hard at your disk options: fstab mount
options, I/O schedulers, etc. In that case, you need this:
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
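
Before working through the guide, it helps to see what is actually in
effect on each node. Here is a rough sketch in Python (not taken from the
guide) that prints the active I/O scheduler for each block device and the
mount options of the data volume. It assumes a Linux host, and it treats
/data/ssd2 (from the path quoted below) as a mount point, which is a guess.
The function names are made up for illustration.

```python
import re
from pathlib import Path


def active_scheduler(text):
    """Return the scheduler shown in [brackets], or None if none is marked.

    /sys/block/<dev>/queue/scheduler lists the available schedulers and
    wraps the active one in brackets, e.g. "noop [deadline] cfq".
    """
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point from /proc/mounts-style text.

    Each line is: device mount_point fstype options dump pass.
    Returns None when the mount point is not listed.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            return fields[3].split(",")
    return None


def report(data_mount="/data/ssd2"):
    """Print schedulers and mount options; data_mount is an assumed path."""
    for sched in sorted(Path("/sys/block").glob("*/queue/scheduler")):
        device = sched.parent.parent.name
        print(device, active_scheduler(sched.read_text()))
    mounts = Path("/proc/mounts")
    if mounts.exists():
        print(data_mount, mount_options(mounts.read_text(), data_mount))
```

Run report() on each node and compare the output across the cluster. A
node whose scheduler or mount options differ from the rest is a good place
to start looking.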
Patrick
On Wed, Aug 14, 2019 at 2:03 PM Forkalsrud, Erik wrote:
> The dmesg command will usually show information about hardware errors.
>
> An example from a spinning disk:
> sd 0:0:10:0: [sdi] Unhandled sense code
> sd 0:0:10:0: [sdi] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 0:0:10:0: [sdi] Sense Key : Medium Error [current]
> Info fld=0x6fc72
> sd 0:0:10:0: [sdi] Add. Sense: Unrecovered read error
> sd 0:0:10:0: [sdi] CDB: Read(10): 28 00 00 06 fc 70 00 00 08 00
>
>
> Also, you can read the whole file with a command like
> "cat /data/ssd2/data/KeyspaceMetadata/x-x/lb-26203-big-Data.db >
> /dev/null"
> If you get an error message, it's probably a hardware issue.
>
> - Erik -
>
> --
> *From:* Philip Ó Condúin
> *Sent:* Thursday, August 8, 2019 09:58
> *To:* user@cassandra.apache.org
> *Subject:* Re: Datafile Corruption
>
> Hi Jon,
>
> Good question. I'm not sure if we're using NVMe; I don't see /dev/nvme, but
> we could still be using it.
> We are using *Cisco UCS C220 M4 SFF*, so I'm just going to check the spec.
>
> Our kernel is the following. We're using Red Hat, so I'm told we can't
> upgrade the version until the next major release anyway.
> root@cass 0 17:32:28 ~ # uname -r
> 3.10.0-957.5.1.el7.x86_64
>
> Cheers,
> Phil
>
> On Thu, 8 Aug 2019 at 17:35, Jon Haddad wrote:
>
> Any chance you're using NVMe with an older Linux kernel? I've seen a
> *lot* of filesystem errors from using older CentOS versions. You'll want
> to be using a kernel version > 4.15.
>
> On Thu, Aug 8, 2019 at 9:31 AM Philip Ó Condúin
> wrote:
>
> *@Jeff *- If it was hardware that would explain it all, but do you think
> it's possible to have every server in the cluster with a hardware issue?
> The data is sensitive and the customer would lose their mind if I sent it
> off-site, which is a pity because I could really do with the help.
> The corruption is occurring irregularly on every server and instance and
> column family in the cluster. Out of 72 instances, we are getting maybe 10
> corrupt files per day.
> We are using vnodes (256) and it is happening in both DCs.
>
> *@Asad *- Internode compression is set to ALL on every server. I have
> checked the packets for the private interconnect and I can't see any
> dropped packets. There are dropped packets for other interfaces, but not
> for the private ones. I will get the network team to double-check this.
> The corruption is only on the application schema; we are not getting
> corruption on any system or cass keyspaces. Corruption is happening in
> both DCs. We are getting corruption for the one application schema we have
> across all tables in the keyspace; it's not limited to one table.
> I'm not sure why the app team decided not to use default compression. I
> must ask them.
>
>
>
> I have been checking /var/log/messages today, going back a few weeks,
> and can see a serious number of broken pipe errors across all servers and
> instances.
> Here is a snippet from one server, but most pipe errors are similar:
>
> Jul 9 03:00:08 cassandra: INFO 02:00:08 Writing
> Memtable-sstable_activity@1126262628(43.631KiB serialized bytes, 18072
> ops, 0%/0% of on/off-heap limit)
> Jul 9 03:00:13 kernel: fnic_handle_fip_timer: 8 callbacks suppressed
> Jul 9 03:00:19 kernel: fnic_handle_fip_timer: 8 callbacks suppressed
> Jul 9 03:00:22 cassandra: ERROR 02:00:22 Got an IOException during write!
> Jul 9 03:00:22 cassandra: java.io.IOException: Broken pipe
> Jul 9 03:00:22 cassandra: at sun.nio.ch.FileDispatcherImpl.write0(Native
> Method) ~[na:1.8.0_172]
> Jul 9 03:00:22 cassandra: at
> sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[na:1.8.0_172]
> Jul 9 03:00:22 cassandra: at
> sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.8.0_172]
> Jul 9 03:00:22 cassandra: at sun.nio.ch.IOUtil.write(IOUtil.java:65)
> ~[na:1.8.0_172]
> Jul 9 03:00:22 cassandra: at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
> ~[na:1.8.0_172]
> Jul 9 03:00:22 cassandra: at
> org.apache.thrift.transport.TNonblockingSocket.write(TNonblockingSocket.java:165)
> ~[libthrift-0.9.2.jar:0.9.2]
> Jul 9 03:00:22 cassandra: at
> com.thinkaurelius.thrift.util.mem.Buffer.writeTo(Buffer.java:104)
> ~[thrift-server-0.3.7.jar:na]
> Jul 9 03:00:22 cassandra: at
>