Hi Mike,

 

You are right, we have the special NIO.2 filesystem that makes fsync a no-op in 
90% of all cases. This works fine with Lucene, but Solr does not use the 
virtual filesystem: it just copies the path name of the temp directory as a 
string and passes it to the default directory factory through its 
solrconfig.xml file. Because Solr therefore ends up on the plain default 
filesystem, there is no way to intercept the fsyncs.
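
For reference, the core of the trick is just an NIO.2 FileChannel wrapper 
whose force() swallows the sync. Here is a minimal sketch in Java (not 
Lucene's actual test-framework class, and the name NoFsyncFileChannel is made 
up here; the real implementation hooks in at the FileSystemProvider level so 
every channel opened through the virtual filesystem gets wrapped 
automatically):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

// Hypothetical sketch: wrap a real FileChannel and make force() a no-op.
final class NoFsyncFileChannel extends FileChannel {
  private final FileChannel delegate;

  NoFsyncFileChannel(FileChannel delegate) {
    this.delegate = delegate;
  }

  @Override
  public void force(boolean metaData) {
    // swallow the fsync: never call delegate.force()
  }

  // everything else just delegates to the real channel:
  @Override public int read(ByteBuffer dst) throws IOException { return delegate.read(dst); }
  @Override public long read(ByteBuffer[] dsts, int offset, int length) throws IOException { return delegate.read(dsts, offset, length); }
  @Override public int read(ByteBuffer dst, long position) throws IOException { return delegate.read(dst, position); }
  @Override public int write(ByteBuffer src) throws IOException { return delegate.write(src); }
  @Override public long write(ByteBuffer[] srcs, int offset, int length) throws IOException { return delegate.write(srcs, offset, length); }
  @Override public int write(ByteBuffer src, long position) throws IOException { return delegate.write(src, position); }
  @Override public long position() throws IOException { return delegate.position(); }
  @Override public FileChannel position(long newPosition) throws IOException { delegate.position(newPosition); return this; }
  @Override public long size() throws IOException { return delegate.size(); }
  @Override public FileChannel truncate(long size) throws IOException { delegate.truncate(size); return this; }
  @Override public long transferTo(long position, long count, WritableByteChannel target) throws IOException { return delegate.transferTo(position, count, target); }
  @Override public long transferFrom(ReadableByteChannel src, long position, long count) throws IOException { return delegate.transferFrom(src, position, count); }
  @Override public MappedByteBuffer map(MapMode mode, long position, long size) throws IOException { return delegate.map(mode, position, size); }
  @Override public FileLock lock(long position, long size, boolean shared) throws IOException { return delegate.lock(position, size, shared); }
  @Override public FileLock tryLock(long position, long size, boolean shared) throws IOException { return delegate.tryLock(position, size, shared); }
  @Override protected void implCloseChannel() throws IOException { delegate.close(); }
}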

 

We should work on a solution for this, as it may speed up tests dramatically.

 

In the meantime I did “apt install eatmydata” 
(http://manpages.ubuntu.com/manpages/bionic/man1/eatmydata.1.html). This 
makes it easy to hide all fsyncs. We can just add the following to the Jenkins 
config for new jobs via the job environment plugin, so none of the Jenkins 
jobs fsync:

 

LD_PRELOAD=libeatmydata.so
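
For a one-off local run, the package also installs an eatmydata wrapper 
script that sets this up for you. For example (assuming libeatmydata.so 
resolves on the loader's search path; “ant test” is just a placeholder for 
whatever command you run):

LD_PRELOAD=libeatmydata.so ant test

or equivalently:

eatmydata ant test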

 

This trick may be interesting for others, too. Steve Rowe?

 

To test the difference, I will now run the Jenkins server for a day, measure 
the number of reads/writes from the SMART output, and then enable this for 
the Linux jobs (it’s easy in the Groovy file that selects the random JVM).

 

The VMs for Windows, macOS, and Solaris already have their virtual disks 
configured to ignore any device syncs.

 

Uwe

 

-----

Uwe Schindler

Achterdiek 19, D-28357 Bremen

https://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Michael McCandless <luc...@mikemccandless.com> 
Sent: Saturday, August 31, 2019 1:32 PM
To: Lucene/Solr dev <dev@lucene.apache.org>
Subject: Re: NVMe - SSD shredding due to Lucene :-)

 

Nice to know :)  Thanks for upgrading, Uwe.

 

I thought we randomly disable fsync in tests just to protect our precious SSDs?




Mike McCandless

http://blog.mikemccandless.com

 

 

On Sat, Aug 31, 2019 at 6:20 AM Uwe Schindler <u...@thetaphi.de> wrote:

Hi all,

I just wanted to inform you that I asked the provider of the Policeman Jenkins 
Server to replace the first of two NVMe SSDs, because it failed with fatal 
warnings due to too many writes and no more spare sectors:

> root@serv1 ~ # nvme smart-log /dev/nvme0
> Smart Log for NVME device:nvme0 namespace-id:ffffffff
> critical_warning                    : 0x1
> temperature                         : 76 C
> available_spare                     : 2%
> available_spare_threshold           : 10%
> percentage_used                     : 67%
> data_units_read                     : 62,129,054
> data_units_written                  : 648,788,135
> host_read_commands                  : 6,426,997,226
> host_write_commands                 : 5,582,107,803
> controller_busy_time                : 86,754
> power_cycles                        : 21
> power_on_hours                      : 20,252
> unsafe_shutdowns                    : 16
> media_errors                        : 0
> num_err_log_entries                 : 0
> Warning Temperature Time            : 7855
> Critical Composite Temperature Time : 0
> Temperature Sensor 1                : 76 C
> Thermal Management T1 Trans Count   : 0
> Thermal Management T2 Trans Count   : 0
> Thermal Management T1 Total Time    : 0
> Thermal Management T2 Total Time    : 0

The second one looks a bit better, but will be changed later, too. I have no 
idea what a data unit is (512 bytes, 2048 bytes,... - I think one LBA).
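(If the NVMe spec’s definition applies here — one data unit being 1000 blocks 
of 512 bytes — the ~649 million data units written above would work out to 
648,788,135 × 512,000 bytes ≈ 330 TB.)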

So we are really shredding SSDs with Lucene tests 😊

Uwe

P.S.: The replacement is currently going on...
-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de


