Re: btrfs issue with mariadb incremental backup
Hi Chris,

Sorry for the misunderstanding, and for inappropriately attaching the file (I thought the text was too long for a mail message, so I used an attachment).

The backup script has had the btrfs sync command since Aug 3.

I ran the following rsync commands as you said (sorry for the misunderstanding about the rsync command, which I had never used before).

The snapshots have differed since snapshot mysql_201708090830, but mariadb could still recover and start until mysql_201708110830, which seems to have too many differences to start.

[root@backuplogC7 ~]# more /root/script/backup/backupsnap.sh
#Backup
#
user=$1
password=$2
basepath=$3
datet=$(date +%Y%m%d%H%M)
snappath=${basepath}_${datet}
echo "Locking databases ${datet}"
mysql -u$user -p$password << EOF
FLUSH TABLES WITH READ LOCK;
system btrfs sub snap -r $basepath $snappath
system btrfs sub sync $basepath
UNLOCK TABLES;
quit
EOF
echo "Databases unlocked ${datet}"

[root@backuplogC7 ~]# ls -l /root/script/backup/backupsnap.sh
-rwxr--r-- 1 root root 333 Aug 3 09:11 /root/script/backup/backupsnap.sh

[root@backuplogC7 ~]# rsync -avnc /var/lib/mariadb/mysql_201708070830/ root@192.168.45.166:/var/lib/mariadb/mysql_201708070830/
sending incremental file list
./

sent 3773 bytes received 19 bytes 1083.43 bytes/sec
total size is 718361496 speedup is 189441.32 (DRY RUN)

[root@backuplogC7 ~]# rsync -avnc /var/lib/mariadb/mysql_201708080830/ root@192.168.45.166:/var/lib/mariadb/mysql_201708080830/
sending incremental file list
./

sent 3769 bytes received 19 bytes 841.78 bytes/sec
total size is 718361496 speedup is 189641.37 (DRY RUN)

[root@backuplogC7 ~]# rsync -avnc /var/lib/mariadb/mysql_201708090830/ root@192.168.45.166:/var/lib/mariadb/mysql_201708090830/
sending incremental file list
./
ib_logfile1
ibdata1

sent 3779 bytes received 25 bytes 1086.86 bytes/sec
total size is 718361496 speedup is 188843.72 (DRY RUN)

[root@backuplogC7 ~]# rsync -avnc /var/lib/mariadb/mysql_201708100830/ root@192.168.45.166:/var/lib/mariadb/mysql_201708100830/
sending incremental file list
./
ib_logfile1
ibdata1

sent 3779 bytes received 25 bytes 1086.86 bytes/sec
total size is 718361496 speedup is 188843.72 (DRY RUN)

[root@backuplogC7 ~]# rsync -avnc /var/lib/mariadb/mysql_201708110830/ root@192.168.45.166:/var/lib/mariadb/mysql_20170811830/
sending incremental file list
created directory /var/lib/mariadb/mysql_20170811830
./
aria_log.0001
aria_log_control
ib_logfile0
ib_logfile1
ibdata1
mysql-bin.01
mysql-bin.index
backups_db/
backups_db/db.opt
backups_db/ints.frm
backups_db/tpc_backup_detail.frm
backups_db/tpc_backup_header.frm
backups_db/tpc_backup_history.frm
backups_db/tpc_backup_schedule_level_master.frm
backups_db/tpc_backup_schedule_master.frm
backups_db/tpc_backup_schedule_master_extra.frm
backups_db/tpc_backup_schedule_old.frm
backups_db/tpc_backup_server.frm
backups_db/tpc_backup_step.frm
backups_db/tpc_backup_user.frm
backups_db/v_backup_header_status.frm
backups_db/v_calendar.frm
backups_db/v_dup_backup.frm
mysql/
mysql/columns_priv.MYD
mysql/columns_priv.MYI
mysql/columns_priv.frm
mysql/db.MYD
mysql/db.MYI
mysql/db.frm
mysql/event.MYD
mysql/event.MYI
mysql/event.frm
mysql/func.MYD
mysql/func.MYI
mysql/func.frm
mysql/general_log.CSM
mysql/general_log.CSV
mysql/general_log.frm
mysql/help_category.MYD
mysql/help_category.MYI
mysql/help_category.frm
mysql/help_keyword.MYD
mysql/help_keyword.MYI
mysql/help_keyword.frm
mysql/help_relation.MYD
mysql/help_relation.MYI
mysql/help_relation.frm
mysql/help_topic.MYD
mysql/help_topic.MYI
mysql/help_topic.frm
mysql/host.MYD
mysql/host.MYI
mysql/host.frm
mysql/ndb_binlog_index.MYD
mysql/ndb_binlog_index.MYI
mysql/ndb_binlog_index.frm
mysql/plugin.MYD
mysql/plugin.MYI
mysql/plugin.frm
mysql/proc.MYD
mysql/proc.MYI
mysql/proc.frm
mysql/procs_priv.MYD
mysql/procs_priv.MYI
mysql/procs_priv.frm
mysql/proxies_priv.MYD
mysql/proxies_priv.MYI
mysql/proxies_priv.frm
mysql/servers.MYD
mysql/servers.MYI
mysql/servers.frm
mysql/slow_log.CSM
mysql/slow_log.CSV
mysql/slow_log.frm
mysql/tables_priv.MYD
mysql/tables_priv.MYI
mysql/tables_priv.frm
mysql/time_zone.MYD
mysql/time_zone.MYI
mysql/time_zone.frm
mysql/time_zone_leap_second.MYD
mysql/time_zone_leap_second.MYI
mysql/time_zone_leap_second.frm
mysql/time_zone_name.MYD
mysql/time_zone_name.MYI
mysql/time_zone_name.frm
mysql/time_zone_transition.MYD
mysql/time_zone_transition.MYI
mysql/time_zone_transition.frm
mysql/time_zone_transition_type.MYD
mysql/time_zone_transition_type.MYI
mysql/time_zone_transition_type.frm
mysql/user.MYD
mysql/user.MYI
mysql/user.frm
performance_schema/
performance_schema/cond_instances.frm
performance_schema/db.opt
performance_schema/events_waits_current.frm
performance_schema/events_waits_history.frm
performance_schema/events_waits_history_long.frm
performance_schema/events_waits_summary_by_instance.frm
performance_schema/events_waits_summary_by_thread_by_event_name.frm
performance_schema/events_waits_summary_global_by_event_name.frm
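Reading long "rsync -avnc" listings by eye is error-prone, so here is a minimal sketch of a filter that keeps only the file paths rsync would transfer. The function name changed_paths is made up for illustration, and the filter assumes the verbose output format shown in this thread (banner line, "sent ..." and "total size is ..." summary lines, directories ending in "/"); rsync's own --itemize-changes option is likely a more robust choice where available.

```shell
#!/usr/bin/env bash
# Sketch: list the paths that an "rsync -avnc" dry run reports as different.
# Reads rsync's verbose output on stdin and drops the banner, the summary
# lines, blank lines, and directory entries (which end in "/").
# Empty output means no file differed.
changed_paths() {
    grep -v \
        -e '^sending incremental file list$' \
        -e '^created directory ' \
        -e '^sent [0-9,]* bytes' \
        -e '^total size is ' \
        -e '/$' \
        -e '^[[:space:]]*$' \
        || true   # grep exits 1 when nothing is left; that is not an error here
}
```

Piping the dry-run output of one snapshot comparison through changed_paths would print nothing for identical snapshots, and for mysql_201708090830 above it would print ib_logfile1 and ibdata1.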
Re: btrfs issue with mariadb incremental backup
On Fri, Aug 11, 2017 at 8:38 PM, wrote:
> Hi Chris,
>
> I explain what I have done in the attached file.

That is way too long. And please don't attach files directed at me, that's not appropriate. Just message the list normally, so it can be searched by others down the road.

There is a problem with your rsync command, you really don't understand what I was asking. The whole point of rsync is to compare the origin and destination subvolumes to see if they are different from each other. You ran the command on a single subvolume on a single machine which is pointless.

On machine A:

$ rsync -avnc /mnt/snapshot6/ chris@192.168.1.116:/mnt/snapshot6/
chris@192.168.1.116's password:
sending incremental file list

sent 125 bytes received 12 bytes 39.14 bytes/sec
total size is 30,686 speedup is 223.99 (DRY RUN)
[chris@f26h ~]$

Because no files are listed, all are confirmed to be identical. If files are listed, there's a difference.

> Please suggest me what I should do next?

Check this. I wonder if you're running into this obscure bug where maybe your read only snapshot is not yet synced before it's being sent. Hence it *is* changing on origin machine, but not because of other snapshots being deleted. I have no idea if your scripting handles errors like the bogus stale NFS handle error, maybe it's possible it happens but you're not seeing it? Anyway, you might try just adding a single sync right after taking the snapshot, change nothing else and see if the problem reproduces itself.

https://btrfs.wiki.kernel.org/index.php/Incremental_Backup#Initial_Bootstrapping

I'm pretty sure this is fixed in newer kernels, I haven't run into it in a long time myself. But I don't know when it was fixed.

--
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
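The "single sync right after taking the snapshot" suggestion can be sketched roughly as follows. The helper names run and snapshot_and_sync and the DRY_RUN switch are assumptions made for illustration; they are not part of any script posted in this thread. Note that, as far as the btrfs-progs documentation describes it, "btrfs subvolume sync" waits for deleted subvolumes to be cleaned up, which is a different operation from the plain sync used here.

```shell
#!/usr/bin/env bash
# Sketch: take a read-only snapshot, then flush it to disk before any send.
# DRY_RUN=1 prints the commands instead of running them (illustrative only).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1: subvolume to snapshot, $2: path of the new read-only snapshot
snapshot_and_sync() {
    local basepath=$1 snappath=$2
    run btrfs subvolume snapshot -r "$basepath" "$snappath" || return 1
    # Commit the new snapshot to disk before a later "btrfs send" reads it.
    run sync
}
```

With DRY_RUN=1 set, calling snapshot_and_sync /var/lib/mariadb/mysql /var/lib/mariadb/mysql_201708110830 only prints the two commands, which makes it easy to check the ordering before wiring it into a real backup job.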
Re: btrfs issue with mariadb incremental backup
Hi Chris,

I have explained what I have done in the attached file. Could you please suggest what I should do next?

I plan to reproduce the daily inserted data manually and check each step in detail, to find out exactly which step makes the btrfs send/receive result differ on the second box (Machine Y). I will start with the first btrfs snapshot that I already have (mysql_201707210830).

Do you think I should do anything else?

Best Regards,
Siranee Jaraswachirakul.

> On Fri, Aug 11, 2017 at 12:00 AM, wrote:
>> Sorry Chris,
>>
>> I forgot to send the diff from rsync -anc result
>>
>> the source container A start data as data on snapshot mysql_201708040830
>>
>> [root@backuplogC7 tmp]# ls -l /var/lib/mariadb
>> total 0
>> drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql
>> drwxrwxr-x+ 1 mysql mysql 260 Jul 12 08:29 mysql_201708040830
>> drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708050830
>> drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708060830
>> drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708070830
>> drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708080830
>> drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708090830
>> drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708100830
>> drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708110830
>>
>> the destcontainer B start data as data on snapshot mysql_201708070830
>>
>> [root@joytest tmp]# ls -l /var/lib/mariadb
>> total 0
>> drwxrwxr-x+ 1 mysql mysql 260 Aug 11 10:24 mysql
>> drwxrwxr-x+ 1 mysql mysql 260 Aug 7 09:26 mysql_201708070830
>> drwxrwxr-x+ 1 mysql mysql 260 Aug 7 09:26 mysql_201708080830
>> drwxrwxr-x+ 1 mysql mysql 260 Aug 7 09:26 mysql_201708090830
>> drwxrwxr-x+ 1 mysql mysql 260 Aug 7 09:26 mysql_201708100830
>> drwxrwxr-x+ 1 mysql mysql 260 Aug 7 09:26 mysql_201708110830
>>
>>
>>
>> tpcorp@virtualtrust3:/tmp$ diff source_mysql_201708110830.txt
>> dest_mysql_201708110830.txt
>> 1c1
>> < drwxrwxr-x 260 2017/08/04 13:10:56 mysql_201708110830
>> ---
>>> drwxrwxr-x 260 2017/08/07 09:26:32 mysql_201708110830
>> 5c5
>> < -rw-rwx--- 5242880 2017/08/10 07:30:35 mysql_201708110830/ib_logfile1
>> ---
>>> -rw-rwx--- 5242880 2017/08/07 07:30:36 mysql_201708110830/ib_logfile1
>>
>
> I don't really understand this. I don't see the actual rsync -anc
> command, what I see is a diff of two text files whose contents I also
> don't understand. So I have to guess that ib_logfile1 is a file inside
> of a snapshot on machine A, that was btrfs send -p / receive to
> machine B and is now different for some reason?
>
> If the problem is that ib_logfile1 is wrong only on machine B after
> Btrfs send receive, that suggests it might be a network problem. The
> Btrfs send receive stream only checksums btrfs metadata (the internal
> commands in the stream). The data is not checksummed so it is possible
> an uncaught network error can inject silent data corruption which
> Btrfs will not catch - it's just the normal TCP/IP network
> checksumming happening.
>
> Anyway, I'm still confused whether the problem is a change only during
> send/receive, or if there's a change happening on a machine in
> isolation just when you delete other snapshots.
>
>
> --
> Chris Murphy
>

Hi Chris,

I would like to describe what I have done. To be exact, I have 3 boxes running Ubuntu 16.04 LTS in my company environment and another one at a hosting provider.

Overall steps

First -> I started with the first box in the company environment and sent the btrfs stream offline (in batch), copying it with scp to the remote box at the hosting provider for receive. After the incremental results turned out incorrect and I had to send a new base to start over on the remote box around 3 times, I moved on to the second step.

Second -> I set up a second box in the company environment, on a different network segment from the first box, and still sent the btrfs stream offline (in batch) with scp from the first box to the second box. The result was the same as in the first step: the date on which the incremental became unusable varied (it was not only the second incremental send).

Third -> I changed from testing with 2 boxes to a single box with 2 containers, using online btrfs send and receive; this is the current set of btrfs subvolumes that you see. The result is still incorrect on some days, and I have to send a new base btrfs snapshot whenever it starts to be incorrect.

I suspected the send/receive mechanism, so I have used snapshot mysql_201708040830 as the starting base on the source btrfs in container A (backuplogC7) until now (as I said, locally, without send/receive, it works). From mysql_201708040830 on, the incrementals worked until mysql_201708110830, where they started to be incorrect.

1. I started with the first box (Machine X called "virtualtrust3")

tpcorp@virtualtrust3:~$ uname -a
Linux virtualtrust3 4.4.0-79-generic #100-Ubuntu SMP Wed May 17 19:58:14 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
tpcorp@virtualtrust3:~$ cat /etc/*release*
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
Qu Wenruo wrote:
>Although Btrfs can disable data CoW, nodatacow also disables data
>checksum, which is another main feature for btrfs.

Then the two should probably be decoupled, and support for nodatacow+checksumming be implemented?!
I'm not an expert, but I don't see why this shouldn't be possible (especially since metadata is AFAIC anyway *always* CoWed + checksummed).

Nearly a year ago I had some off-list mails exchanged with CM and AFAIU he said it would technically be possible...

What's the worst thing that can happen?! IMO, that the noCoWed data would have been correctly written on a crash, but not the checksum, so that the (bad) checksum would invalidate data that is actually good.
How likely is that compared to the other way round? I'd guess not very.
And even if it happens, it's IMO still better to have false positives (which the higher application layers should take care of anyway) than to not notice silent data corruption at all.

Of course checksumming would possibly impact performance, but anyone could still use nodatacow+nochecksum (or any other fs) if they care more about performance than data integrity.
But all those who focus on integrity would get that, even in the nodatacow case.

IIRC, CM brought up as an argument that some people would rather get the bad data than nothing at all (respectively EIO)... but for those btrfs is probably a bad choice anyway (at least in the normal non-nodatacow case),... also any application should properly deal with EIO... and last but not least, one could still provide a special tool that, after a crash (with possibly non-matching data/csum), allows a user to find such cases and decide what to do,... so a user/admin who would rather take the bad data and try forensic recovery could be given a tool like btrfs csum --recompute-invalid-csums (or some better name), in which either all (or just some paths') csums are re-written in case they don't match.

Cheers,
Chris.
Re: [PATCH] Btrfs: fix out of bounds array access while reading extent buffer
Hi Liu,

[auto build test WARNING on v4.13-rc4]
[also build test WARNING on next-20170811]
[cannot apply to btrfs/next]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Liu-Bo/Btrfs-fix-out-of-bounds-array-access-while-reading-extent-buffer/20170810-235607
config: x86_64-randconfig-a0-08120433 (attached as .config)
compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

All warnings (new ones prefixed by >>):

   fs//btrfs/extent_io.c: In function 'read_extent_buffer':
>> fs//btrfs/extent_io.c:5419: warning: unused variable 'num_pages'

vim +/num_pages +5419 fs//btrfs/extent_io.c

  5407
  5408  void read_extent_buffer(struct extent_buffer *eb, void *dstv,
  5409                          unsigned long start,
  5410                          unsigned long len)
  5411  {
  5412          size_t cur;
  5413          size_t offset;
  5414          struct page *page;
  5415          char *kaddr;
  5416          char *dst = (char *)dstv;
  5417          size_t start_offset = eb->start & ((u64)PAGE_SIZE - 1);
  5418          unsigned long i = (start_offset + start) >> PAGE_SHIFT;
> 5419          unsigned long num_pages = num_extent_pages(eb->start, eb->len);
  5420
  5421          if (start + len > eb->len) {
  5422                  WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, wanted %lu %lu\n",
  5423                       eb->start, eb->len, start, len);
  5424                  memset(dst, 0, len);
  5425                  return;
  5426          }
  5427
  5428          offset = (start_offset + start) & (PAGE_SIZE - 1);
  5429
  5430          while (len > 0) {
  5431                  ASSERT(i < num_pages);
  5432                  page = eb->pages[i];
  5433
  5434                  cur = min(len, (PAGE_SIZE - offset));
  5435                  kaddr = page_address(page);
  5436                  memcpy(dst, kaddr + offset, cur);
  5437
  5438                  dst += cur;
  5439                  len -= cur;
  5440                  offset = 0;
  5441                  i++;
  5442          }
  5443  }
  5444

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Re: [PATCH v3 46/49] fs/btrfs: convert to bio_for_each_segment_all_sp()
Hi Ming,

[auto build test WARNING on linus/master]
[also build test WARNING on v4.13-rc4 next-20170810]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Ming-Lei/block-support-multipage-bvec/20170810-110521
config: x86_64-randconfig-b0-08112217 (attached as .config)
compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

All warnings (new ones prefixed by >>):

   fs/btrfs/raid56.c: In function 'find_logical_bio_stripe':
>> fs/btrfs/raid56.c:1368: warning: unused variable 'bia'

vim +/bia +1368 fs/btrfs/raid56.c

  1356
  1357  /*
  1358   * helper to find the stripe number for a given
  1359   * bio (before mapping). Used to figure out which stripe has
  1360   * failed. This looks up based on logical block numbers.
  1361   */
  1362  static int find_logical_bio_stripe(struct btrfs_raid_bio *rbio,
  1363                                     struct bio *bio)
  1364  {
  1365          u64 logical = bio->bi_iter.bi_sector;
  1366          u64 stripe_start;
  1367          int i;
> 1368          struct bvec_iter_all bia;
  1369
  1370          logical <<= 9;
  1371
  1372          for (i = 0; i < rbio->nr_data; i++) {
  1373                  stripe_start = rbio->bbio->raid_map[i];
  1374                  if (logical >= stripe_start &&
  1375                      logical < stripe_start + rbio->stripe_len) {
  1376                          return i;
  1377                  }
  1378          }
  1379          return -1;
  1380  }
  1381

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Re: btrfs issue with mariadb incremental backup
On Fri, Aug 11, 2017 at 12:00 AM, wrote:
> Sorry Chris,
>
> I forgot to send the diff from rsync -anc result
>
> the source container A start data as data on snapshot mysql_201708040830
>
> [root@backuplogC7 tmp]# ls -l /var/lib/mariadb
> total 0
> drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql
> drwxrwxr-x+ 1 mysql mysql 260 Jul 12 08:29 mysql_201708040830
> drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708050830
> drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708060830
> drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708070830
> drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708080830
> drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708090830
> drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708100830
> drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708110830
>
> the destcontainer B start data as data on snapshot mysql_201708070830
>
> [root@joytest tmp]# ls -l /var/lib/mariadb
> total 0
> drwxrwxr-x+ 1 mysql mysql 260 Aug 11 10:24 mysql
> drwxrwxr-x+ 1 mysql mysql 260 Aug 7 09:26 mysql_201708070830
> drwxrwxr-x+ 1 mysql mysql 260 Aug 7 09:26 mysql_201708080830
> drwxrwxr-x+ 1 mysql mysql 260 Aug 7 09:26 mysql_201708090830
> drwxrwxr-x+ 1 mysql mysql 260 Aug 7 09:26 mysql_201708100830
> drwxrwxr-x+ 1 mysql mysql 260 Aug 7 09:26 mysql_201708110830
>
>
>
> tpcorp@virtualtrust3:/tmp$ diff source_mysql_201708110830.txt
> dest_mysql_201708110830.txt
> 1c1
> < drwxrwxr-x 260 2017/08/04 13:10:56 mysql_201708110830
> ---
>> drwxrwxr-x 260 2017/08/07 09:26:32 mysql_201708110830
> 5c5
> < -rw-rwx--- 5242880 2017/08/10 07:30:35 mysql_201708110830/ib_logfile1
> ---
>> -rw-rwx--- 5242880 2017/08/07 07:30:36 mysql_201708110830/ib_logfile1
>

I don't really understand this. I don't see the actual rsync -anc command, what I see is a diff of two text files whose contents I also don't understand. So I have to guess that ib_logfile1 is a file inside of a snapshot on machine A, that was btrfs send -p / receive to machine B and is now different for some reason?

If the problem is that ib_logfile1 is wrong only on machine B after Btrfs send receive, that suggests it might be a network problem. The Btrfs send receive stream only checksums btrfs metadata (the internal commands in the stream). The data is not checksummed so it is possible an uncaught network error can inject silent data corruption which Btrfs will not catch - it's just the normal TCP/IP network checksumming happening.

Anyway, I'm still confused whether the problem is a change only during send/receive, or if there's a change happening on a machine in isolation just when you delete other snapshots.

--
Chris Murphy
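When the send stream is written to a file and copied between machines (the offline scp workflow described elsewhere in this thread), one way to rule transport corruption in or out, independently of what the stream format itself checksums, is to hash the stream file on both sides. The sketch below assumes coreutils sha256sum is available; the function names stream_digest and same_stream are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare two copies of a saved btrfs send stream by SHA-256 digest.

# Print only the hex digest of the file given as $1.
stream_digest() {
    sha256sum "$1" | cut -d' ' -f1
}

# Return 0 if both files exist and have the same digest, 1 if the digests
# differ, 2 if either file is missing or unreadable.
same_stream() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    [ "$(stream_digest "$1")" = "$(stream_digest "$2")" ]
}
```

In practice the two copies live on different machines, so one would run stream_digest on the file produced by something like "btrfs send -p OLD -f incr.stream NEW" on the source, run it again on the copied file on the destination, and compare the two printed strings before feeding the file to "btrfs receive -f incr.stream". The snapshot names OLD and NEW and the file name incr.stream are placeholders.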
Re: btrfs issue with mariadb incremental backup
On Thu, Aug 10, 2017 at 10:40 PM, wrote:
> Hi Chris,
> The kernel version that I test is "4.4.0-89-generic" as I tested on ubuntu
> lxd If I
> want to change the kernel version I have to upgrade the host box.

I can't parse what kernel that really is compared to upstream, which is at 4.4.80.

> As you suggest the rsync to compare the subvolumes. I found the point.
> the subvolumes are different only after I start to del old subvolumes on
> machine A
> the steps are

You're saying a read only snapshot's contents are changing after deleting other (child) snapshots?

If you have a subvolume "subvolume" and you make read only snapshots of it over time "snapshot1" "snapshot2" "snapshot3" "snapshot4" "snapshot5" "snapshot6" on both machine A and machine B. And rsync -anc machineA/snapshot6 machineB/snapshot6 they are identical. And then you delete "snapshot1" "snapshot2" "snapshot3" "snapshot4" on machine A. And rsync -anc machineA/snapshot6 machineB/snapshot6 they are no longer identical? I've never heard of this before.

>
> 30 08 * * * root /root/script/backup/backupsnap.sh root password /var/lib/mariadb/mysql >> /var/log/btrfs_snap.log
> 05 09 * * * root /root/script/backupbtrfs_inc.sh /var/lib/mariadb 192.168.45.166 /var/lib/mariadb >> /var/log/btrfs_send.log
> 30 19 * * * root /root/script/delete_btrfs_sub_snap_volume.sh /var/lib/mariadb 7 >> /var/log/btrfs_del.log

I have no idea what this means or why it's relevant.

>
>
> The following script maintain snapshot to currently only 7 snapshots.
> [root@backuplogC7 ~]# cat /root/script/delete_btrfs_sub_snap_volume.sh
> basepath=$1
> keepcount=$2
> havecount=`btrfs sub list -s ${basepath}|cut -d' ' -f14|wc -l`
> delcount=$[$keepcount-$havecount];
> datet=$(date +%Y%m%d%H%M)
> echo "Start Delete ${datet}"
> if [ $delcount -lt 0 ]; then
> # list only snapshot subvolume
> for i in `btrfs sub list -s ${basepath}|cut -d' ' -f14|head ${delcount}`
> do
> echo "btrfs sub delete ${basepath}/$i"
> btrfs sub delete ${basepath}/$i
> btrfs sub sync ${basepath}

Seems sane.

> Does it mean my delete script is not the properly way of the btrfs purge old
> snapshot on source?

I don't think so. But I still don't understand the problem, and what exactly has changed and when, and it might be my confusion. But if you have an ro snapshot that's correct at one moment, but then changes when some other (child) snapshots are deleted, that's pretty remarkable so I have to assume I'm confused.

--
Chris Murphy
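The retention logic in the quoted delete script (keep the newest N snapshots, delete the rest) can be separated from the btrfs calls so that it is easy to check on its own. The sketch below is one possible way to do that, not the poster's script: the function name snapshots_to_delete is made up, and it assumes snapshot names end in a YYYYmmddHHMM timestamp so that a plain sort is chronological.

```shell
#!/usr/bin/env bash
# Sketch: print the snapshot names that fall outside the retention window.
# Reads one snapshot name per line on stdin; $1 is the number to keep.
# Names are assumed to sort chronologically (timestamp suffix).
snapshots_to_delete() {
    local keep=$1
    local names total excess
    names=$(sort)
    [ -z "$names" ] && return 0
    total=$(printf '%s\n' "$names" | wc -l)
    excess=$((total - keep))
    if [ "$excess" -gt 0 ]; then
        # The oldest names come first after sorting.
        printf '%s\n' "$names" | head -n "$excess"
    fi
}
```

The names could come from the same listing the quoted script uses ("btrfs sub list -s" with the path field extracted), and each printed name would then be passed to "btrfs sub delete". Keeping the selection step free of side effects means it can be run against a list of names first to confirm that only the oldest snapshots are chosen.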
Re: [PATCH stable v4.12 backport] Btrfs: fix early ENOSPC due to delalloc
On 08/11/2017 10:44 AM, Chris Mason wrote:
> Hmpf, forgot to put the sha in Linus' tree:
>
> 17024ad0a0fdfcfe53043afb969b813d3e020c21

And Nikolay just reminded me this is already in Greg's queue. Whoops.

-chris
Re: [PATCH stable v4.12 backport] Btrfs: fix early ENOSPC due to delalloc
Hmpf, forgot to put the sha in Linus' tree:

17024ad0a0fdfcfe53043afb969b813d3e020c21

-chris

On 08/11/2017 10:41 AM, Chris Mason wrote:
> From: Omar Sandoval
>
> If a lot of metadata is reserved for outstanding delayed allocations, we
> rely on shrink_delalloc() to reclaim metadata space in order to fulfill
> reservation tickets. However, shrink_delalloc() has a shortcut where if
> it determines that space can be overcommitted, it will stop early. This
> made sense before the ticketed enospc system, but now it means that
> shrink_delalloc() will often not reclaim enough space to fulfill any
> tickets, leading to an early ENOSPC. (Reservation tickets don't care
> about being able to overcommit, they need every byte accounted for.)
>
> Fix it by getting rid of the shortcut so that shrink_delalloc() reclaims
> all of the metadata it is supposed to. This fixes early ENOSPCs we were
> seeing when doing a btrfs receive to populate a new filesystem, as well
> as early ENOSPCs Christoph saw when doing a big cp -r onto Btrfs.
>
> Fixes: 957780eb2788 ("Btrfs: introduce ticketed enospc infrastructure")
> Tested-by: Christoph Anton Mitterer
> Cc: sta...@vger.kernel.org
> Reviewed-by: Josef Bacik
> Signed-off-by: Omar Sandoval
> Signed-off-by: David Sterba
> Signed-off-by: Chris Mason
> ---
>  fs/btrfs/extent-tree.c | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 33d979e9..83eecd3 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4776,10 +4776,6 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig,
>  		else
>  			flush = BTRFS_RESERVE_NO_FLUSH;
>  		spin_lock(&space_info->lock);
> -		if (can_overcommit(root, space_info, orig, flush)) {
> -			spin_unlock(&space_info->lock);
> -			break;
> -		}
>  		if (list_empty(&space_info->tickets) &&
>  		    list_empty(&space_info->priority_tickets)) {
>  			spin_unlock(&space_info->lock);
Re: FAILED: patch "[PATCH] Btrfs: fix early ENOSPC due to delalloc" failed to apply to 4.12-stable tree
On 08/04/2017 03:29 PM, Christoph Anton Mitterer wrote:
> Hey.
>
> Could someone of the devs put some attention on this...?
>
> Thanks,
> Chris :-)

Done, you can also grab it here:

https://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git/commit/?h=for-stable-4.12&id=354850ad1948af13248031e5180d495044d05aa5

-chris

>
>
> On Mon, 2017-07-31 at 18:06 -0700, gre...@linuxfoundation.org wrote:
>> The patch below does not apply to the 4.12-stable tree.
>> If someone wants it applied there, or to any other stable or longterm
>> tree, then please email the backport, including the original git
>> commit
>> id to.
>>
>> thanks,
>>
>> greg k-h
>>
>> -- original commit in Linus's tree --
>>
>> From 17024ad0a0fdfcfe53043afb969b813d3e020c21 Mon Sep 17 00:00:00
>> 2001
>> From: Omar Sandoval
>> Date: Thu, 20 Jul 2017 15:10:35 -0700
>> Subject: [PATCH] Btrfs: fix early ENOSPC due to delalloc
>>
>> If a lot of metadata is reserved for outstanding delayed allocations,
>> we
>> rely on shrink_delalloc() to reclaim metadata space in order to
>> fulfill
>> reservation tickets. However, shrink_delalloc() has a shortcut where
>> if
>> it determines that space can be overcommitted, it will stop early.
>> This
>> made sense before the ticketed enospc system, but now it means that
>> shrink_delalloc() will often not reclaim enough space to fulfill any
>> tickets, leading to an early ENOSPC. (Reservation tickets don't care
>> about being able to overcommit, they need every byte accounted for.)
>>
>> Fix it by getting rid of the shortcut so that shrink_delalloc()
>> reclaims
>> all of the metadata it is supposed to. This fixes early ENOSPCs we
>> were
>> seeing when doing a btrfs receive to populate a new filesystem, as
>> well
>> as early ENOSPCs Christoph saw when doing a big cp -r onto Btrfs.
>>
>> Fixes: 957780eb2788 ("Btrfs: introduce ticketed enospc
>> infrastructure")
>> Tested-by: Christoph Anton Mitterer
>> Cc: sta...@vger.kernel.org
>> Reviewed-by: Josef Bacik
>> Signed-off-by: Omar Sandoval
>> Signed-off-by: David Sterba
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index a6635f07b8f1..e3b0b4196d3d 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -4825,10 +4825,6 @@ static void shrink_delalloc(struct
>> btrfs_fs_info *fs_info, u64 to_reclaim,
>>  		else
>>  			flush = BTRFS_RESERVE_NO_FLUSH;
>>  		spin_lock(&space_info->lock);
>> -		if (can_overcommit(fs_info, space_info, orig, flush,
>> false)) {
>> -			spin_unlock(&space_info->lock);
>> -			break;
>> -		}
>>  		if (list_empty(&space_info->tickets) &&
>>  		    list_empty(&space_info->priority_tickets)) {
>>  			spin_unlock(&space_info->lock);
[PATCH stable v4.12 backport] Btrfs: fix early ENOSPC due to delalloc
From: Omar Sandoval

If a lot of metadata is reserved for outstanding delayed allocations, we
rely on shrink_delalloc() to reclaim metadata space in order to fulfill
reservation tickets. However, shrink_delalloc() has a shortcut where if
it determines that space can be overcommitted, it will stop early. This
made sense before the ticketed enospc system, but now it means that
shrink_delalloc() will often not reclaim enough space to fulfill any
tickets, leading to an early ENOSPC. (Reservation tickets don't care
about being able to overcommit, they need every byte accounted for.)

Fix it by getting rid of the shortcut so that shrink_delalloc() reclaims
all of the metadata it is supposed to. This fixes early ENOSPCs we were
seeing when doing a btrfs receive to populate a new filesystem, as well
as early ENOSPCs Christoph saw when doing a big cp -r onto Btrfs.

Fixes: 957780eb2788 ("Btrfs: introduce ticketed enospc infrastructure")
Tested-by: Christoph Anton Mitterer
Cc: sta...@vger.kernel.org
Reviewed-by: Josef Bacik
Signed-off-by: Omar Sandoval
Signed-off-by: David Sterba
Signed-off-by: Chris Mason
---
 fs/btrfs/extent-tree.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 33d979e9..83eecd3 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4776,10 +4776,6 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig,
 		else
 			flush = BTRFS_RESERVE_NO_FLUSH;
 		spin_lock(&space_info->lock);
-		if (can_overcommit(root, space_info, orig, flush)) {
-			spin_unlock(&space_info->lock);
-			break;
-		}
 		if (list_empty(&space_info->tickets) &&
 		    list_empty(&space_info->priority_tickets)) {
 			spin_unlock(&space_info->lock);
--
2.9.3
Re: [PATCH v5 2/5] lib: Add zstd modules
On 08/10/2017 03:25 PM, Hugo Mills wrote: On Thu, Aug 10, 2017 at 01:41:21PM -0400, Chris Mason wrote: On 08/10/2017 04:30 AM, Eric Biggers wrote: Theses benchmarks are misleading because they compress the whole file as a single stream without resetting the dictionary, which isn't how data will typically be compressed in kernel mode. With filesystem compression the data has to be divided into small chunks that can each be decompressed independently. That eliminates one of the primary advantages of Zstandard (support for large dictionary sizes). I did btrfs benchmarks of kernel trees and other normal data sets as well. The numbers were in line with what Nick is posting here. zstd is a big win over both lzo and zlib from a btrfs point of view. It's true Nick's patches only support a single compression level in btrfs, but that's because btrfs doesn't have a way to pass in the compression ratio. It could easily be a mount option, it was just outside the scope of Nick's initial work. Could we please not add more mount options? I get that they're easy to implement, but it's a very blunt instrument. What we tend to see (with both nodatacow and compress) is people using the mount options, then asking for exceptions, discovering that they can't do that, and then falling back to doing it with attributes or btrfs properties. Could we just start with btrfs properties this time round, and cut out the mount option part of this cycle. In the long run, it'd be great to see most of the btrfs-specific mount options get deprecated and ultimately removed entirely, in favour of attributes/properties, where feasible. It's a good point, and as was commented later down I'd just do mount -o compress=zstd:3 or something. But I do prefer properties in general for this. My big point was just that next step is outside of Nick's scope. 
-chris
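For illustration, the per-path approach Hugo argues for could look roughly like the sketch below. It assumes a kernel and btrfs-progs in which `zstd` is accepted as a value of the `compression` property, which is what this patch series is still adding. The helper name `set_compression` and the example paths are made up for this note. By default the helper only prints the command; set RUN=1 to actually run it.

```shell
#!/bin/sh
# Sketch only: pick compression per file or directory with a btrfs property
# instead of a filesystem-wide mount option.
# set_compression is a hypothetical helper, not part of btrfs-progs.
set_compression() {
    path=$1
    algo=$2   # e.g. zlib or lzo; zstd only once the kernel supports it (assumption)
    if [ "${RUN:-0}" = "1" ]; then
        # Real invocation; needs a btrfs filesystem and sufficient privileges.
        btrfs property set "$path" compression "$algo"
    else
        # Dry run: show what would be executed.
        printf 'btrfs property set %s compression %s\n' "$path" "$algo"
    fi
}
```

Calling `set_compression /srv/logs zstd` and then `set_compression /srv/images lzo` would give two directories different settings, which is the kind of exception a single `compress=` mount option cannot express.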
Re: kernel BUG at /build/linux-H5UzH8/linux-4.10.0/fs/btrfs/extent_io.c:2318
On 2017-08-11 05:57, Piotr Pawłow wrote:
> Hello,
>
>> So 4.10 isn't /too/ far out of range yet, but I'd strongly consider
>> upgrading (or downgrading to 4.9 LTS) as soon as it's reasonably
>> convenient, before 4.13 in any case. Unless you prefer to go the
>> distro support route, of course.
>
> I used to stick to the latest kernels back when btrfs wasn't as stable
> and there were frequently important bug fixes. Nowadays I've had no
> problems for such a long time that I forgot to change the kernel after
> upgrading the distro. Besides, I thought RAID-1 was stable years ago
> (except the degraded mount issue).

So, just my thoughts on this in particular. Either you are staying
absolutely up to date (which I would still recommend if possible; there
are still new fixes going in regularly, and we've been seeing some
performance improvements recently too), or you're not. If you're not, you
should either:

1. Stick to upstream (kernel.org) LTS releases (and still ideally stay up
   to date within that release) and get reasonably reliable help from the
   ML.
2. Stick to your distribution's kernels, and get help from your
   distribution's usual support channels.

As for raid1 mode, it generally is reasonably stable, but there are still
edge cases with bugs that haven't been found yet (as you found out here).

> Anyway, after I took care of bad blocks by remapping them, scrub fixed
> all corruptions without any problems. Fsck comes out clean and
> everything seems fine.

Glad to hear everything is working now!
Re: [PATCH v5 3/5] btrfs: Add zstd support
On 2017-08-09 22:39, Nick Terrell wrote:
> Add zstd compression and decompression support to BtrFS. zstd at its
> fastest level compresses almost as well as zlib, while offering much
> faster compression and decompression, approaching lzo speeds.
>
> I benchmarked btrfs with zstd compression against no compression, lzo
> compression, and zlib compression. I benchmarked two scenarios:
>
> 1. Copying a set of files to btrfs, and then reading the files.
> 2. Copying a tarball to btrfs, extracting it to btrfs, and then reading
>    the extracted files.
>
> After every operation, I call `sync` and include the sync time. Between
> every pair of operations I unmount and remount the filesystem to avoid
> caching. The benchmark files can be found in the upstream zstd source
> repository under
> `contrib/linux-kernel/{btrfs-benchmark.sh,btrfs-extract-benchmark.sh}`
> [1] [2].
>
> I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
> The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7
> processor, 16 GB of RAM, and a SSD.
>
> The first compression benchmark is copying 10 copies of the unzipped
> Silesia corpus [3] into a BtrFS filesystem mounted with
> `-o compress-force=Method`. The decompression benchmark times how long
> it takes to `tar` all 10 copies into `/dev/null`. The compression ratio
> is measured by comparing the output of `df` and `du`. See the benchmark
> file [1] for details. I benchmarked multiple zstd compression levels,
> although the patch uses zstd level 1.
>
> | Method  | Ratio | Compression MB/s | Decompression MB/s |
> |---------|-------|------------------|--------------------|
> | None    |  0.99 |              504 |                686 |
> | lzo     |  1.66 |              398 |                442 |
> | zlib    |  2.58 |               65 |                241 |
> | zstd 1  |  2.57 |              260 |                383 |
> | zstd 3  |  2.71 |              174 |                408 |
> | zstd 6  |  2.87 |               70 |                398 |
> | zstd 9  |  2.92 |               43 |                406 |
> | zstd 12 |  2.93 |               21 |                408 |
> | zstd 15 |  3.01 |               11 |                354 |
>
> The next benchmark first copies `linux-4.11.6.tar` [4] to btrfs. Then it
> measures the compression ratio, extracts the tar, and deletes the tar.
> Then it measures the compression ratio again, and `tar`s the extracted
> files into `/dev/null`. See the benchmark file [2] for details.
>
> | Method | Tar Ratio | Extract Ratio | Copy (s) | Extract (s) | Read (s) |
> |--------|-----------|---------------|----------|-------------|----------|
> | None   |      0.97 |          0.78 |    0.981 |       5.501 |    8.807 |
> | lzo    |      2.06 |          1.38 |    1.631 |       8.458 |    8.585 |
> | zlib   |      3.40 |          1.86 |    7.750 |      21.544 |   11.744 |
> | zstd 1 |      3.57 |          1.85 |    2.579 |      11.479 |    9.389 |
>
> [1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-benchmark.sh
> [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-extract-benchmark.sh
> [3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
> [4] https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.11.6.tar.xz
>
> zstd source repository: https://github.com/facebook/zstd
>
> Signed-off-by: Nick Terrell

Considering how things went with the previous patch, I've been a bit more
aggressive testing this one, but after now almost 72 hours of combined
runtime for each of the architectures I have tests set up for with nothing
breaking (well, nothing breaking that wasn't already breaking before this
patch set, some of the raid56 tests are still failing semi-reliably as
expected), I'd say this is reasonably stable and looks good overall.

> ---
> v2 -> v3:
> - Port upstream BtrFS commits e1ddce71d6, 389a6cfc2a, and 6acafd1eff
> - Change default compression level for BtrFS to 3
>
> v3 -> v4:
> - Add missing includes, which fixes the aarch64 build
> - Fix minor linter warnings
>
>  fs/btrfs/Kconfig           |   2 +
>  fs/btrfs/Makefile          |   2 +-
>  fs/btrfs/compression.c     |   1 +
>  fs/btrfs/compression.h     |   6 +-
>  fs/btrfs/ctree.h           |   1 +
>  fs/btrfs/disk-io.c         |   2 +
>  fs/btrfs/ioctl.c           |   6 +-
>  fs/btrfs/props.c           |   6 +
>  fs/btrfs/super.c           |  12 +-
>  fs/btrfs/sysfs.c           |   2 +
>  fs/btrfs/zstd.c            | 432 +
>  include/uapi/linux/btrfs.h |   8 +-
>  12 files changed, 468 insertions(+), 12 deletions(-)
>  create mode 100644 fs/btrfs/zstd.c
>
> diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
> index 80e9c18..a26c63b 100644
> --- a/fs/btrfs/Kconfig
> +++ b/fs/btrfs/Kconfig
> @@ -6,6 +6,8 @@ config BTRFS_FS
>  	select ZLIB_DEFLATE
>  	select LZO_COMPRESS
>  	select LZO_DECOMPRESS
> +	select ZSTD_COMPRESS
> +	select ZSTD_DECOMPRESS
>  	select RAID6_PQ
>  	select XOR_BLOCKS
>  	select SRCU
> diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
> index 128ce17..962a95a 100644
> ---
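The patch description says the compression ratio is measured by comparing the output of `df` and `du`. The arithmetic behind such a ratio can be sketched as below. This is not the benchmark script from the zstd repository, only an assumed reading of that sentence: apparent size of the data divided by space used on the filesystem. The function name `compression_ratio` and the mount point in the comments are hypothetical.

```shell
#!/bin/sh
# Sketch only: ratio = apparent bytes (as du would report) divided by
# bytes used on disk (as df would report). Hypothetical helper.
compression_ratio() {
    # $1: apparent size in bytes, $2: used bytes on the filesystem
    awk -v a="$1" -v u="$2" 'BEGIN {
        if (u + 0 <= 0) exit 1        # refuse to divide by zero
        printf "%.2f\n", a / u
    }'
}

# Possible way to obtain the inputs, assuming GNU coreutils and a
# hypothetical mount point /mnt/btrfs:
#   apparent=$(du -sb /mnt/btrfs | cut -f1)
#   used=$(df -B1 --output=used /mnt/btrfs | tail -n 1)
#   compression_ratio "$apparent" "$used"
```

Read this way, a value above 1 means the data occupies less space on disk than its apparent size.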
Re: kernel BUG at /build/linux-H5UzH8/linux-4.10.0/fs/btrfs/extent_io.c:2318
Hello,

> So 4.10 isn't /too/ far out of range yet, but I'd strongly consider
> upgrading (or downgrading to 4.9 LTS) as soon as it's reasonably
> convenient, before 4.13 in any case. Unless you prefer to go the
> distro support route, of course.

I used to stick to the latest kernels back when btrfs wasn't as stable
and there were frequently important bug fixes. Nowadays I've had no
problems for such a long time that I forgot to change the kernel after
upgrading the distro. Besides, I thought RAID-1 was stable years ago
(except the degraded mount issue).

Anyway, after I took care of bad blocks by remapping them, scrub fixed
all corruptions without any problems. Fsck comes out clean and
everything seems fine.
Re: btrfs issue with mariadb incremental backup
Sorry Chris,

I forgot to send the diff of the rsync -anc results.

Source container A (its data starts from snapshot mysql_201708040830):

[root@backuplogC7 tmp]# ls -l /var/lib/mariadb
total 0
drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql
drwxrwxr-x+ 1 mysql mysql 260 Jul 12 08:29 mysql_201708040830
drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708050830
drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708060830
drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708070830
drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708080830
drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708090830
drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708100830
drwxrwxr-x+ 1 mysql mysql 260 Aug 4 13:10 mysql_201708110830

Destination container B (its data starts from snapshot mysql_201708070830):

[root@joytest tmp]# ls -l /var/lib/mariadb
total 0
drwxrwxr-x+ 1 mysql mysql 260 Aug 11 10:24 mysql
drwxrwxr-x+ 1 mysql mysql 260 Aug 7 09:26 mysql_201708070830
drwxrwxr-x+ 1 mysql mysql 260 Aug 7 09:26 mysql_201708080830
drwxrwxr-x+ 1 mysql mysql 260 Aug 7 09:26 mysql_201708090830
drwxrwxr-x+ 1 mysql mysql 260 Aug 7 09:26 mysql_201708100830
drwxrwxr-x+ 1 mysql mysql 260 Aug 7 09:26 mysql_201708110830

tpcorp@virtualtrust3:/tmp$ diff source_mysql_201708110830.txt dest_mysql_201708110830.txt
1c1
< drwxrwxr-x 260 2017/08/04 13:10:56 mysql_201708110830
---
> drwxrwxr-x 260 2017/08/07 09:26:32 mysql_201708110830
5c5
< -rw-rwx--- 5242880 2017/08/10 07:30:35 mysql_201708110830/ib_logfile1
---
> -rw-rwx--- 5242880 2017/08/07 07:30:36 mysql_201708110830/ib_logfile1

Best Regards,
Siranee Jaraswachirakul.

> Hi Chris,
> The kernel version that I tested is "4.4.0-89-generic", as I tested on
> Ubuntu LXD. If I want to change the kernel version I have to upgrade the
> host box.
>
> As you suggested, I used rsync to compare the subvolumes, and I found the
> point: the subvolumes are different only after I start to delete old
> subvolumes on machine A. The steps are:
>
> 30 08 * * * root /root/script/backup/backupsnap.sh root password /var/lib/mariadb/mysql >> /var/log/btrfs_snap.log
> 05 09 * * * root /root/script/backupbtrfs_inc.sh /var/lib/mariadb 192.168.45.166 /var/lib/mariadb >> /var/log/btrfs_send.log
> 30 19 * * * root /root/script/delete_btrfs_sub_snap_volume.sh /var/lib/mariadb 7 >> /var/log/btrfs_del.log
>
> The following script keeps the number of snapshots at 7 (currently).
>
> [root@backuplogC7 ~]# cat /root/script/delete_btrfs_sub_snap_volume.sh
> basepath=$1
> keepcount=$2
> havecount=`btrfs sub list -s ${basepath}|cut -d' ' -f14|wc -l`
> delcount=$[$keepcount-$havecount];
> datet=$(date +%Y%m%d%H%M)
> echo "Start Delete ${datet}"
> if [ $delcount -lt 0 ]; then
>     # list only snapshot subvolume
>     for i in `btrfs sub list -s ${basepath}|cut -d' ' -f14|head ${delcount}`
>     do
>         echo "btrfs sub delete ${basepath}/$i"
>         btrfs sub delete ${basepath}/$i
>         btrfs sub sync ${basepath}
>     done
> else
>     echo "$delcount -gt 0 nothing to delete"
> fi
> echo "Stop Delete ${datet}"
>
> Does this mean my delete script is not the proper way to purge old btrfs
> snapshots on the source?
>
> Best Regards,
>
> Siranee Jarwachirakul.
>
>> On Wed, Aug 9, 2017 at 12:36 AM, wrote:
>>
>>> 488 btrfs sub snap mysql_201707230830 mysql
>>> 489 systemctl start mariadb
>>> 490 btrfs sub list .
>>> 491 cat /var/log/mariadb/mariadb.log
>>
>> OK so mysql_201707230830 once on machine B is inconsistent somehow. So
>> the questions I have are:
>>
>> Is mysql_201707230830 on machine A really identical to
>> mysql_201707230830 on machine B? You can do an rsync -anc (double
>> check those options) which should independently check whether those
>> two subvolumes are in fact identical. The -n is a no op, which doesn't
>> really matter much because as read only subvolumes any attempt to sync
>> will just result in noisy messages. The -c causes rsync to do its own
>> checksum verification on both sides.
>>
>> If the subvolumes are different, we need to find out why.
>>
>> If the subvolumes are the same, then I wonder if you can reproduce the
>> mariadb complaint on machine A merely by making a rw snapshot of
>> mysql_201707230830 and trying to start it. If so, then it's not a send
>> receive problem, it sounds like the snapshot itself is inconsistent,
>> maybe mariadb hasn't actually completely closed out the database at
>> the time the read only snapshot was taken? I'm not sure.
>>
>> If the subvolumes are different, I'm going to recommend updating at
>> least the btrfs-progs because 4.4 is kinda old at this point. The
>> kernel code is what's mainly responsible for the send stream, and the
>> user space code is mainly responsible for receiving. And I don't off
>> hand know or want to look up all the send receive changes between 4.4
>> and 4.12 to speculate on whether this has already been fixed.
>>
>> What's the kernel version?
>>
>> --
>> Chris
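The retention step in the quoted delete script depends on field 14 of the `btrfs sub list -s` output and on passing a negative count to `head`. A minimal sketch of the same selection without those two dependencies is shown below. It assumes snapshot names end in a sortable YYYYmmddHHMM timestamp, as they do in this thread. `snapshots_to_delete` is a hypothetical helper, not part of btrfs-progs, and it only prints names; it deletes nothing and does not answer whether deleting snapshots on the source is what makes the two sides differ.

```shell
#!/bin/sh
# Sketch only: print the snapshots that exceed the retention count,
# oldest first. Reads one snapshot name per line on stdin.
# $1 is the number of most recent snapshots to keep.
snapshots_to_delete() {
    keep=$1
    # Names share a prefix and end in a timestamp, so a plain sort is
    # chronological (assumption based on the naming used in this thread).
    sort | awk -v keep="$keep" '
        { name[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print name[i] }
    '
}
```

With nine names on stdin and a keep count of 7, the two oldest names are printed. Whatever does the deleting should still leave in place the snapshot used as the parent of the next incremental send, because that parent has to exist on both the source and the destination.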