Re: [PATCH v2 3/3] Btrfs: heuristic add byte core set calculation
Hi Timofey,

[auto build test ERROR on next-20170724]
[cannot apply to btrfs/next v4.13-rc2 v4.13-rc1 v4.12 v4.13-rc2]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Timofey-Titovets/Btrfs-populate-heuristic-with-detection-logic/20170729-061208
config: i386-randconfig-n0-201730 (attached as .config)
compiler: gcc-4.8 (Debian 4.8.4-1) 4.8.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386

All errors (new ones prefixed by >>):

   fs/btrfs/compression.o: In function `btrfs_compress_heuristic':
>> compression.c:(.text+0x2208): undefined reference to `__udivdi3'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

.config.gz Description: application/gzip
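Editorial note on the error above: on 32-bit x86, gcc turns a plain 64-bit division into a call to the libgcc helper `__udivdi3`, and the kernel is not linked against libgcc, hence the undefined reference. Kernel code normally avoids this with helpers such as `div_u64()` from `<linux/math64.h>`. The report does not show which expression in `btrfs_compress_heuristic` is responsible, so the following is only a userspace sketch of the idea; `sketch_div_u64` is a hypothetical stand-in, not the kernel implementation.

```c
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's div_u64(): divides a
 * 64-bit value by a 32-bit one using only shifts, compares and subtracts,
 * so no 64-bit '/' operator appears and the compiler has no reason to
 * emit a call to a libgcc helper such as __udivdi3.
 */
static uint64_t sketch_div_u64(uint64_t dividend, uint32_t divisor)
{
	uint64_t quotient = 0;
	uint64_t remainder = 0;
	int bit;

	if (divisor == 0)
		return 0;	/* assumption: callers never pass 0 */

	/* Classic restoring long division, one dividend bit per step. */
	for (bit = 63; bit >= 0; bit--) {
		remainder = (remainder << 1) | ((dividend >> bit) & 1);
		if (remainder >= divisor) {
			remainder -= divisor;
			quotient |= (uint64_t)1 << bit;
		}
	}
	return quotient;
}
```

In kernel code the equivalent change would usually be to replace `a / b` with `div_u64(a, b)` where `a` is a `u64`, rather than open-coding the loop.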
Re: [PATCH] btrfs: preserve i_mode if __btrfs_set_acl() fails
On Sat, Jul 29, 2017 at 12:48:04AM +, Josef Bacik wrote:
> On Fri, Jul 28, 2017 at 09:26:29PM -0300, Ernesto A. Fernández wrote:
> > +	ret = __btrfs_set_acl(trans, inode, acl, type);
> > +	if (ret)
> > +		goto out;
> > +
> > +	inode->i_mode = mode;
> > +	inode_inc_iversion(inode);
> > +	inode->i_ctime = current_time(inode);
> > +	set_bit(BTRFS_INODE_COPY_EVERYTHING, &BTRFS_I(inode)->runtime_flags);
>
> This only needs to be set if we actually set the xattr. I'd fix setxattr to
> call it every time it's called.

I had not thought of that, thank you.

If I'm understanding this correctly the issue would be only when setting
a NULL default acl on an inode that is not a directory. In that case I
probably shouldn't be calling btrfs_update_inode either, but I can't move
that back to setxattr.

Perhaps __btrfs_set_acl could return an error in that case, like -ENOTDIR,
and then we can set ret back to 0 before returning from btrfs_set_acl.

> > +	ret = btrfs_update_inode(trans, root, inode);
> > +	BUG_ON(ret);
>
> No BUG_ON, return the error.

The call to BUG_ON was already there before my patch, only inside the
__btrfs_setxattr function. Since I didn't know the reason I thought it
was best not to change it. I'll do as you say in the next version.

Thank you for your review.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: write corruption due to bio cloning on raid5/6
The read-only scrub finished without errors/hangs (with kernel 4.12.3).
So, I guess the hangs were caused by:
1: other bug in 4.13-RC1
2: crazy-random SATA/disk-controller issue
3: interference between various btrfs tools [*]
4: something in the background did DIO write with 4.13-RC1 (but all
affected content was eventually overwritten/deleted between the scrub
attempts)

[*] I expected scrub to finish in ~5 rather than ~40 hours (and didn't
expect interference issues), so I didn't disable the scheduled
maintenance script which deletes old files, recursively defrags the
whole fs and runs a balance with usage=33 filters. I guess either of
those (especially balance) could potentially cause scrub to hang.

On Thu, Jul 27, 2017 at 10:44 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> Janos Toth F. posted on Thu, 27 Jul 2017 16:14:47 +0200 as excerpted:
>
>> * This is off-topic but raid5 scrub is painful. The disks run at
>> constant ~100% utilization while performing at ~1/5 of their sequential
>> read speeds. And despite explicitly asking for idle IO priority when
>> launching scrub, the filesystem becomes unbearably slow (while scrub
>> takes a day or so to finish ... or get to the point where it hung the
>> last time around, close to the end).
>
> That's because basically all the userspace scrub command does is make the
> appropriate kernel calls to have it do the real scrub. So
> priority-idling the userspace scrub doesn't do what it does on normal
> userspace jobs that do much of the work themselves.
>
> The problem is that idle-prioritizing the kernel threads actually doing
> the work could risk a deadlock due to lock inversion, since they're
> kernel threads and aren't designed with the idea of people messing with
> their priority in mind.
>
> Meanwhile, that's yet another reason btrfs raid56 mode isn't recommended
> at this time. Try btrfs raid1 or raid10 mode instead, or possibly btrfs
> raid1, single or raid0 mode on top of a pair of mdraid5s or similar. Tho
> parity-raid mode in general (that is, not btrfs-specific) is known for
> being slow in various cases, with raid10 normally being the best
> performing closest alternative. (Tho in the btrfs-specific case, btrfs
> raid1 on top of a pair of mdraid/dmraid/whatever raid0s, is the normally
> recommended higher performance reasonably low danger alternative.)

If this applies to all RAID flavors then I consider the built-in help
and the manual pages of scrub misleading (if it's RAID56-only, the
manual should still mention how RAID56 is an exception).

Also, a resumed scrub seems to skip a lot of data. It picks up where it
left off but then prematurely reports a job well done. I remember
noticing a similar behavior with balance cancel/resume on RAID5 a few
years ago (it went on for a few more chunks but left the rest alone and
reported completion --- I am not sure if that's fixed now or these have
a common root cause).
Re: [PATCH] btrfs: preserve i_mode if __btrfs_set_acl() fails
On Fri, Jul 28, 2017 at 09:26:29PM -0300, Ernesto A. Fernández wrote:
> When changing a file's acl mask, btrfs_set_acl() will first set the
> group bits of i_mode to the value of the mask, and only then set the
> actual extended attribute representing the new acl.
>
> If the second part fails (due to lack of space, for example) and the
> file had no acl attribute to begin with, the system will from now on
> assume that the mask permission bits are actual group permission bits,
> potentially granting access to the wrong users.
>
> Prevent this by starting the journal transaction before calling
> __btrfs_set_acl and only changing the inode mode after it returns
> successfully.
>
> Signed-off-by: Ernesto A. Fernández
> ---
> This issue is covered by generic/449 in xfstests. Several filesystems
> are affected; some of them have already applied patches:
>   - fe26569 ext2: preserve i_mode if ext2_set_acl() fails
>   - f070e5a jfs: preserve i_mode if __jfs_set_acl() fails
>   - fcea8ae reiserfs: preserve i_mode if __reiserfs_set_acl() fails
>
>  fs/btrfs/acl.c | 29 ++---
>  1 file changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
> index 8d8370d..d041526 100644
> --- a/fs/btrfs/acl.c
> +++ b/fs/btrfs/acl.c
> @@ -27,6 +27,7 @@
>  #include "ctree.h"
>  #include "btrfs_inode.h"
>  #include "xattr.h"
> +#include "transaction.h"
>
>  struct posix_acl *btrfs_get_acl(struct inode *inode, int type)
>  {
> @@ -113,14 +114,36 @@ static int __btrfs_set_acl(struct btrfs_trans_handle *trans,
>
>  int btrfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>  {
> +	struct btrfs_root *root = BTRFS_I(inode)->root;
> +	struct btrfs_trans_handle *trans;
>  	int ret;
> +	umode_t mode = inode->i_mode;
> +
> +	if (btrfs_root_readonly(root))
> +		return -EROFS;
> +
> +	trans = btrfs_start_transaction(root, 2);
> +	if (IS_ERR(trans))
> +		return PTR_ERR(trans);
>
>  	if (type == ACL_TYPE_ACCESS && acl) {
> -		ret = posix_acl_update_mode(inode, &inode->i_mode, &acl);
> +		ret = posix_acl_update_mode(inode, &mode, &acl);
>  		if (ret)
> -			return ret;
> +			goto out;
>  	}
> -	return __btrfs_set_acl(NULL, inode, acl, type);
> +	ret = __btrfs_set_acl(trans, inode, acl, type);
> +	if (ret)
> +		goto out;
> +
> +	inode->i_mode = mode;
> +	inode_inc_iversion(inode);
> +	inode->i_ctime = current_time(inode);
> +	set_bit(BTRFS_INODE_COPY_EVERYTHING, &BTRFS_I(inode)->runtime_flags);

This only needs to be set if we actually set the xattr. I'd fix setxattr to
call it every time it's called.

> +	ret = btrfs_update_inode(trans, root, inode);
> +	BUG_ON(ret);

No BUG_ON, return the error. Thanks,

Josef
[PATCH] btrfs: preserve i_mode if __btrfs_set_acl() fails
When changing a file's acl mask, btrfs_set_acl() will first set the
group bits of i_mode to the value of the mask, and only then set the
actual extended attribute representing the new acl.

If the second part fails (due to lack of space, for example) and the
file had no acl attribute to begin with, the system will from now on
assume that the mask permission bits are actual group permission bits,
potentially granting access to the wrong users.

Prevent this by starting the journal transaction before calling
__btrfs_set_acl and only changing the inode mode after it returns
successfully.

Signed-off-by: Ernesto A. Fernández
---
This issue is covered by generic/449 in xfstests. Several filesystems
are affected; some of them have already applied patches:
  - fe26569 ext2: preserve i_mode if ext2_set_acl() fails
  - f070e5a jfs: preserve i_mode if __jfs_set_acl() fails
  - fcea8ae reiserfs: preserve i_mode if __reiserfs_set_acl() fails

 fs/btrfs/acl.c | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
index 8d8370d..d041526 100644
--- a/fs/btrfs/acl.c
+++ b/fs/btrfs/acl.c
@@ -27,6 +27,7 @@
 #include "ctree.h"
 #include "btrfs_inode.h"
 #include "xattr.h"
+#include "transaction.h"

 struct posix_acl *btrfs_get_acl(struct inode *inode, int type)
 {
@@ -113,14 +114,36 @@ static int __btrfs_set_acl(struct btrfs_trans_handle *trans,

 int btrfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 {
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct btrfs_trans_handle *trans;
 	int ret;
+	umode_t mode = inode->i_mode;
+
+	if (btrfs_root_readonly(root))
+		return -EROFS;
+
+	trans = btrfs_start_transaction(root, 2);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);

 	if (type == ACL_TYPE_ACCESS && acl) {
-		ret = posix_acl_update_mode(inode, &inode->i_mode, &acl);
+		ret = posix_acl_update_mode(inode, &mode, &acl);
 		if (ret)
-			return ret;
+			goto out;
 	}
-	return __btrfs_set_acl(NULL, inode, acl, type);
+	ret = __btrfs_set_acl(trans, inode, acl, type);
+	if (ret)
+		goto out;
+
+	inode->i_mode = mode;
+	inode_inc_iversion(inode);
+	inode->i_ctime = current_time(inode);
+	set_bit(BTRFS_INODE_COPY_EVERYTHING, &BTRFS_I(inode)->runtime_flags);
+	ret = btrfs_update_inode(trans, root, inode);
+	BUG_ON(ret);
+out:
+	btrfs_end_transaction(trans);
+	return ret;
 }

 /*
--
2.1.4
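Editorial note: the ordering this patch introduces (compute the new mode in a local variable, attempt the xattr write, and only then publish the mode to the inode) can be modelled outside the kernel. The sketch below is a userspace illustration under that reading of the patch; `sketch_inode`, `sketch_set_xattr` and `sketch_set_acl` are hypothetical names and none of this is btrfs API.

```c
#include <errno.h>

/* Hypothetical stand-in for the inode; only the field of interest. */
struct sketch_inode {
	unsigned int i_mode;
};

/* Pretend xattr store that fails with -ENOSPC when told to. */
static int sketch_set_xattr(int simulate_no_space)
{
	return simulate_no_space ? -ENOSPC : 0;
}

/*
 * Stage the new mode in a local, attempt the xattr write, and publish
 * the mode to the inode only when the write succeeded.
 */
static int sketch_set_acl(struct sketch_inode *inode, unsigned int new_mode,
			  int simulate_no_space)
{
	unsigned int mode = new_mode;	/* staged copy, like 'mode' in the patch */
	int ret;

	ret = sketch_set_xattr(simulate_no_space);
	if (ret)
		return ret;		/* inode->i_mode is left untouched */

	inode->i_mode = mode;
	return 0;
}
```

With the old ordering the mode would already have been changed by the time the xattr write failed, which is the state generic/449 is described as catching.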
Re: Btrfs incremental send | receive fails with Error: File not found
On 7/28/2017 9:32 PM, Hermann Schwärzler wrote:
> Hi
>
> for me it looks like those snapshots are not read-only. But as far as I
> know for using send they have to be.

They are read-only.

# btrfs property get userData.20170727T1222/
ro=true

> At least
> https://btrfs.wiki.kernel.org/index.php/Incremental_Backup#Initial_Bootstrapping
> states "We will need to create a read-only snapshot ,,,"
>
> I am using send/receive (with read-only snapshots) on a regular basis and
> never had a problem like yours.

I have no good explanation. There are no problems reported on the
filesystems with Btrfs scrub or Btrfs check. Did you also replace files
with same name between snapshots?

> What are the commands you use to create your snapshots?

I used to do it in an hourly cron job like this.

# btrfs subvolume snapshot -r /mnt/storagePool/volume/userData/ /mnt/storagePool/snapshots/userData.`date +%Y.%m.%d-%H.%M.%S`

Now I use btrbk, but the command is the same and the problem is the same.

The problem I see seems similar to the issue fixed in
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f59627810e18d4435051d982b5d05cab18c6e653
but that commit should already be in kernel-4.13_rc2

> Greetings
> Hermann
>
> On 07/28/2017 07:26 PM, A L wrote:
>> I often hit the following error when doing incremental btrfs
>> send-receive:
>> Btrfs incremental send | receive fails with Error: File not found
>>
>> Sometimes I can do two-three incremental snapshots, but then the same
>> error (different file) happens again. It seems that the files were
>> changed or replaced between snapshots, which is causing the problems for
>> send-receive. I have tried to delete all snapshots and started over but
>> the problem comes back, so I think it must be a bug.
>>
>> The source volume is: /mnt/storagePool (with RAID1 profile)
>> with subvolume: volume/userData
>> Backup disk is: /media/usb-backup (external USB disk)
> [...]
Re: [PATCH v3 12/13] btrfs: allow backref search checks for shared extents
On Wed, Jul 12, 2017 at 04:20:10PM -0600, Edmund Nadolski wrote:
> When called with a struct share_check, find_parent_nodes()
> will detect a shared extent and immediately return with
> BACKREF_SHARED_FOUND.
>

Reviewed-by: Liu Bo

Thanks,

-liubo

> Signed-off-by: Edmund Nadolski
> Signed-off-by: Jeff Mahoney
> ---
>  fs/btrfs/backref.c | 164 +
>  1 file changed, 115 insertions(+), 49 deletions(-)
>
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> index c1882e5..35ac0bd 100644
> --- a/fs/btrfs/backref.c
> +++ b/fs/btrfs/backref.c
> @@ -135,6 +135,25 @@ struct preftrees {
>  	struct preftree indirect_missing_keys;
>  };
>
> +/*
> + * Checks for a shared extent during backref search.
> + *
> + * The share_count tracks prelim_refs (direct and indirect) having a
> + * ref->count >0:
> + *  - incremented when a ref->count transitions to >0
> + *  - decremented when a ref->count transitions to <1
> + */
> +struct share_check {
> +	u64 root_objectid;
> +	u64 inum;
> +	int share_count;
> +};
> +
> +static inline int extent_is_shared(struct share_check *sc)
> +{
> +	return (sc && sc->share_count > 1) ? BACKREF_FOUND_SHARED : 0;
> +}
> +
>  static struct kmem_cache *btrfs_prelim_ref_cache;
>
>  int __init btrfs_prelim_ref_init(void)
> @@ -195,14 +214,26 @@ static int prelim_ref_compare(struct prelim_ref *ref1,
>  	return 0;
>  }
>
> +void update_share_count(struct share_check *sc, int oldcount, int newcount)
> +{
> +	if ((!sc) || (oldcount == 0 && newcount < 1))
> +		return;
> +
> +	if (oldcount > 0 && newcount < 1)
> +		sc->share_count--;
> +	else if (oldcount < 1 && newcount > 0)
> +		sc->share_count++;
> +}
> +
>  /*
>   * Add @newref to the @root rbtree, merging identical refs.
>   *
> - * Callers should assumed that newref has been freed after calling.
> + * Callers should assume that newref has been freed after calling.
>   */
>  static void prelim_ref_insert(const struct btrfs_fs_info *fs_info,
>  			      struct preftree *preftree,
> -			      struct prelim_ref *newref)
> +			      struct prelim_ref *newref,
> +			      struct share_check *sc)
>  {
>  	struct rb_root *root;
>  	struct rb_node **p;
> @@ -234,12 +265,20 @@ static void prelim_ref_insert(const struct btrfs_fs_info *fs_info,
>  			eie->next = newref->inode_list;
>  			trace_btrfs_prelim_ref_merge(fs_info, ref, newref,
>  						     preftree->count);
> +			/*
> +			 * A delayed ref can have newref->count < 0.
> +			 * The ref->count is updated to follow any
> +			 * BTRFS_[ADD|DROP]_DELAYED_REF actions.
> +			 */
> +			update_share_count(sc, ref->count,
> +					   ref->count + newref->count);
>  			ref->count += newref->count;
>  			free_pref(newref);
>  			return;
>  		}
>  	}
>
> +	update_share_count(sc, 0, newref->count);
>  	preftree->count++;
>  	trace_btrfs_prelim_ref_insert(fs_info, newref, NULL, preftree->count);
>  	rb_link_node(&newref->rbnode, parent, p);
> @@ -303,7 +342,8 @@ static void prelim_release(struct preftree *preftree)
>  static int add_prelim_ref(const struct btrfs_fs_info *fs_info,
>  			  struct preftree *preftree, u64 root_id,
>  			  const struct btrfs_key *key, int level, u64 parent,
> -			  u64 wanted_disk_byte, int count, gfp_t gfp_mask)
> +			  u64 wanted_disk_byte, int count,
> +			  struct share_check *sc, gfp_t gfp_mask)
>  {
>  	struct prelim_ref *ref;
>
> @@ -348,31 +388,32 @@ static int add_prelim_ref(const struct btrfs_fs_info *fs_info,
>  	ref->count = count;
>  	ref->parent = parent;
>  	ref->wanted_disk_byte = wanted_disk_byte;
> -	prelim_ref_insert(fs_info, preftree, ref);
> -
> -	return 0;
> +	prelim_ref_insert(fs_info, preftree, ref, sc);
> +	return extent_is_shared(sc);
>  }
>
>  /* direct refs use root == 0, key == NULL */
>  static int add_direct_ref(const struct btrfs_fs_info *fs_info,
>  			  struct preftrees *preftrees, int level, u64 parent,
> -			  u64 wanted_disk_byte, int count, gfp_t gfp_mask)
> +			  u64 wanted_disk_byte, int count,
> +			  struct share_check *sc, gfp_t gfp_mask)
>  {
>  	return add_prelim_ref(fs_info, &preftrees->direct, 0, NULL, level,
> -			      parent, wanted_disk_byte, count, gfp_mask);
> +			      parent, wanted_disk_byte, count, sc, gfp_mask);
>  }
>
> /* indirect
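Editorial note: the share-count bookkeeping quoted above is small enough to exercise in userspace. The sketch below copies the transition logic of `update_share_count()` and `extent_is_shared()` from the quoted patch into standalone functions; the `sketch_` names and the reduced struct are assumptions made for illustration, and the shared test returns a plain boolean instead of BACKREF_FOUND_SHARED.

```c
/* Reduced stand-in for struct share_check; only the counter is kept. */
struct sketch_share_check {
	int share_count;
};

/*
 * Track how many refs currently have count > 0: bump the counter when a
 * ref's count crosses from <1 to >0, drop it on the opposite crossing,
 * and ignore updates that stay on the same side.
 */
static void sketch_update_share_count(struct sketch_share_check *sc,
				      int oldcount, int newcount)
{
	if (!sc || (oldcount == 0 && newcount < 1))
		return;

	if (oldcount > 0 && newcount < 1)
		sc->share_count--;
	else if (oldcount < 1 && newcount > 0)
		sc->share_count++;
}

/* An extent counts as shared once more than one live ref points at it. */
static int sketch_extent_is_shared(const struct sketch_share_check *sc)
{
	return sc && sc->share_count > 1;
}
```

A new ref is modelled as a transition from 0, matching the `update_share_count(sc, 0, newref->count)` call in the insert path; a merge passes the old and the summed count.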
Re: Btrfs incremental send | receive fails with Error: File not found
Hi

for me it looks like those snapshots are not read-only. But as far as I
know for using send they have to be.

At least
https://btrfs.wiki.kernel.org/index.php/Incremental_Backup#Initial_Bootstrapping
states "We will need to create a read-only snapshot ,,,"

I am using send/receive (with read-only snapshots) on a regular basis and
never had a problem like yours.

What are the commands you use to create your snapshots?

Greetings
Hermann

On 07/28/2017 07:26 PM, A L wrote:
> I often hit the following error when doing incremental btrfs
> send-receive:
> Btrfs incremental send | receive fails with Error: File not found
>
> Sometimes I can do two-three incremental snapshots, but then the same
> error (different file) happens again. It seems that the files were
> changed or replaced between snapshots, which is causing the problems for
> send-receive. I have tried to delete all snapshots and started over but
> the problem comes back, so I think it must be a bug.
>
> The source volume is: /mnt/storagePool (with RAID1 profile)
> with subvolume: volume/userData
> Backup disk is: /media/usb-backup (external USB disk)
[...]
Re: [PATCH v3 08/13] btrfs: convert prelimary reference tracking to use rbtrees
On Wed, Jul 12, 2017 at 04:20:06PM -0600, Edmund Nadolski wrote:
> It's been known for a while that the use of multiple lists
> that are periodically merged was an algorithmic problem within
> btrfs.  There are several workloads that don't complete in any
> reasonable amount of time (e.g. btrfs/130) and others that cause
> soft lockups.
>
> The solution is to use a set of rbtrees that do insertion merging
> for both indirect and direct refs, with the former converting
> refs into the latter.  The result is a btrfs/130 workload that
> used to take several hours now takes about half of that. This
> runtime still isn't acceptable and a future patch will address that
> by moving the rbtrees higher in the stack so the lookups can be
> shared across multiple calls to find_parent_nodes.
>

Reviewed-by: Liu Bo

Thanks,

-liubo

> Signed-off-by: Edmund Nadolski
> Signed-off-by: Jeff Mahoney
> ---
>  fs/btrfs/backref.c | 441 ++---
>  1 file changed, 284 insertions(+), 157 deletions(-)
>
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> index 6cac5ab..1edb107 100644
> --- a/fs/btrfs/backref.c
> +++ b/fs/btrfs/backref.c
> @@ -26,11 +26,6 @@
>  #include "delayed-ref.h"
>  #include "locking.h"
>
> -enum merge_mode {
> -	MERGE_IDENTICAL_KEYS = 1,
> -	MERGE_IDENTICAL_PARENTS,
> -};
> -
>  /* Just an arbitrary number so we can be sure this happened */
>  #define BACKREF_FOUND_SHARED 6
>
> @@ -129,7 +124,7 @@ static int find_extent_in_eb(const struct extent_buffer *eb,
>   * this structure records all encountered refs on the way up to the root
>   */
>  struct prelim_ref {
> -	struct list_head list;
> +	struct rb_node rbnode;
>  	u64 root_id;
>  	struct btrfs_key key_for_search;
>  	int level;
> @@ -139,6 +134,18 @@ struct prelim_ref {
>  	u64 wanted_disk_byte;
>  };
>
> +struct preftree {
> +	struct rb_root root;
> +};
> +
> +#define PREFTREE_INIT	{ .root = RB_ROOT }
> +
> +struct preftrees {
> +	struct preftree direct;    /* BTRFS_SHARED_[DATA|BLOCK]_REF_KEY */
> +	struct preftree indirect;  /* BTRFS_[TREE_BLOCK|EXTENT_DATA]_REF_KEY */
> +	struct preftree indirect_missing_keys;
> +};
> +
>  static struct kmem_cache *btrfs_prelim_ref_cache;
>
>  int __init btrfs_prelim_ref_init(void)
> @@ -158,6 +165,108 @@ void btrfs_prelim_ref_exit(void)
>  	kmem_cache_destroy(btrfs_prelim_ref_cache);
>  }
>
> +static void free_pref(struct prelim_ref *ref)
> +{
> +	kmem_cache_free(btrfs_prelim_ref_cache, ref);
> +}
> +
> +/*
> + * Return 0 when both refs are for the same block (and can be merged).
> + * A -1 return indicates ref1 is a 'lower' block than ref2, while 1
> + * indicates a 'higher' block.
> + */
> +static int prelim_ref_compare(struct prelim_ref *ref1,
> +			      struct prelim_ref *ref2)
> +{
> +	if (ref1->level < ref2->level)
> +		return -1;
> +	if (ref1->level > ref2->level)
> +		return 1;
> +	if (ref1->root_id < ref2->root_id)
> +		return -1;
> +	if (ref1->root_id > ref2->root_id)
> +		return 1;
> +	if (ref1->key_for_search.type < ref2->key_for_search.type)
> +		return -1;
> +	if (ref1->key_for_search.type > ref2->key_for_search.type)
> +		return 1;
> +	if (ref1->key_for_search.objectid < ref2->key_for_search.objectid)
> +		return -1;
> +	if (ref1->key_for_search.objectid > ref2->key_for_search.objectid)
> +		return 1;
> +	if (ref1->key_for_search.offset < ref2->key_for_search.offset)
> +		return -1;
> +	if (ref1->key_for_search.offset > ref2->key_for_search.offset)
> +		return 1;
> +	if (ref1->parent < ref2->parent)
> +		return -1;
> +	if (ref1->parent > ref2->parent)
> +		return 1;
> +
> +	return 0;
> +}
> +
> +/*
> + * Add @newref to the @root rbtree, merging identical refs.
> + *
> + * Callers should assumed that newref has been freed after calling.
> + */
> +static void prelim_ref_insert(struct preftree *preftree,
> +			      struct prelim_ref *newref)
> +{
> +	struct rb_root *root;
> +	struct rb_node **p;
> +	struct rb_node *parent = NULL;
> +	struct prelim_ref *ref;
> +	int result;
> +
> +	root = &preftree->root;
> +	p = &root->rb_node;
> +
> +	while (*p) {
> +		parent = *p;
> +		ref = rb_entry(parent, struct prelim_ref, rbnode);
> +		result = prelim_ref_compare(ref, newref);
> +		if (result < 0) {
> +			p = &(*p)->rb_left;
> +		} else if (result > 0) {
> +			p = &(*p)->rb_right;
> +		} else {
> +			/* Identical refs, merge them and free @newref */
> +			struct extent_inode_elem *eie = ref->inode_list;
> +
> +			while (eie && eie->next)
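Editorial note: the "insertion merging" the commit message describes can be tried in userspace with a much simpler tree. The sketch below is only a loose model under stated assumptions: it uses a plain unbalanced binary search tree instead of the kernel rbtree, compares on just two of the fields (`level`, `parent`), and all `sketch_` names are hypothetical. It keeps the same descent direction as the quoted loop and folds an identical ref into the existing node instead of linking a second one.

```c
#include <stdlib.h>

/* Reduced stand-in for struct prelim_ref with a two-field key. */
struct sketch_ref {
	int level;
	unsigned long long parent;
	int count;
	struct sketch_ref *left, *right;
};

static struct sketch_ref *sketch_ref_new(int level, unsigned long long parent,
					 int count)
{
	struct sketch_ref *ref = malloc(sizeof(*ref));

	if (!ref)
		return NULL;
	ref->level = level;
	ref->parent = parent;
	ref->count = count;
	ref->left = ref->right = NULL;
	return ref;
}

/* Returns 0 when both refs describe the same block and can be merged. */
static int sketch_ref_compare(const struct sketch_ref *a,
			      const struct sketch_ref *b)
{
	if (a->level != b->level)
		return a->level < b->level ? -1 : 1;
	if (a->parent != b->parent)
		return a->parent < b->parent ? -1 : 1;
	return 0;
}

/*
 * Insert newref, or add its count to an identical node and free it.
 * Returns 1 if a node was linked, 0 if it was merged, -1 on NULL input.
 * Either way the caller must not touch newref afterwards.
 */
static int sketch_ref_insert(struct sketch_ref **root,
			     struct sketch_ref *newref)
{
	struct sketch_ref **p = root;

	if (!newref)
		return -1;

	while (*p) {
		int result = sketch_ref_compare(*p, newref);

		if (result < 0) {
			p = &(*p)->left;
		} else if (result > 0) {
			p = &(*p)->right;
		} else {
			(*p)->count += newref->count;
			free(newref);
			return 0;
		}
	}
	*p = newref;
	return 1;
}
```

Because duplicates are merged at insertion time, no separate pass over a list is needed afterwards, which is the algorithmic point the commit message makes about replacing the periodically merged lists.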
Re: Btrfs + compression = slow performance and high cpu usage
In addition to my previous "it does not happen here" comment, if
someone is reading this thread, there are some other interesting
details:

> When the compression is turned off, I am able to get the
> maximum 500-600 mb/s write speed on this disk (raid array)
> with minimal cpu usage.

No details on whether it is a parity RAID or not.

> btrfs device usage /mnt/arh-backup1/
> /dev/sda, ID: 2
>    Device size:            21.83TiB
>    Device slack:              0.00B
>    Data,single:             9.29TiB
>    Metadata,single:        46.00GiB
>    System,single:          32.00MiB
>    Unallocated:            12.49TiB

That's exactly 24TB of "Device size", of which around 45% are
used, and the string "backup" may suggest that the content is
backups, which may indicate a very fragmented freespace.
Of course compression does not help with that, in my freshly
created Btrfs volume I get as expected:

  soft# umount /mnt/sde3
  soft# mount -t btrfs -o commit=10 /dev/sde3 /mnt/sde3

  soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/sde3/testfile bs=1M count=10000 conv=fsync
  10000+0 records in
  10000+0 records out
  10485760000 bytes (10 GB) copied, 103.747 s, 101 MB/s
  0.00user 11.56system 1:44.86elapsed 11%CPU (0avgtext+0avgdata 3072maxresident)k
  20480672inputs+20498272outputs (1major+349minor)pagefaults 0swaps

  soft# filefrag /mnt/sde3/testfile
  /mnt/sde3/testfile: 11 extents found

versus:

  soft# umount /mnt/sde3
  soft# mount -t btrfs -o commit=10,compress=lzo,compress-force /dev/sde3 /mnt/sde3

  soft# /usr/bin/time dd iflag=fullblock if=/dev/sda6 of=/mnt/sde3/testfile bs=1M count=10000 conv=fsync
  10000+0 records in
  10000+0 records out
  10485760000 bytes (10 GB) copied, 109.051 s, 96.2 MB/s
  0.02user 13.03system 1:49.49elapsed 11%CPU (0avgtext+0avgdata 3068maxresident)k
  20494784inputs+20492320outputs (1major+347minor)pagefaults 0swaps

  soft# filefrag /mnt/sde3/testfile
  /mnt/sde3/testfile: 49287 extents found

Most of the latter extents are mercifully rather contiguous, their
size is just limited by the compression code, here is an extract
from 'filefrag -v' from around the middle:

  24757:  1321888..  1321919:   11339579..  11339610:     32:   11339594:
  24758:  1321920..  1321951:   11339597..  11339628:     32:   11339611:
  24759:  1321952..  1321983:   11339615..  11339646:     32:   11339629:
  24760:  1321984..  1322015:   11339632..  11339663:     32:   11339647:
  24761:  1322016..  1322047:   11339649..  11339680:     32:   11339664:
  24762:  1322048..  1322079:   11339667..  11339698:     32:   11339681:
  24763:  1322080..  1322111:   11339686..  11339717:     32:   11339699:
  24764:  1322112..  1322143:   11339703..  11339734:     32:   11339718:
  24765:  1322144..  1322175:   11339720..  11339751:     32:   11339735:
  24766:  1322176..  1322207:   11339737..  11339768:     32:   11339752:
  24767:  1322208..  1322239:   11339754..  11339785:     32:   11339769:
  24768:  1322240..  1322271:   11339771..  11339802:     32:   11339786:
  24769:  1322272..  1322303:   11339789..  11339820:     32:   11339803:

But again this is on a fresh empty Btrfs volume.
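Editorial note: the 'filefrag -v' rows above each span 32 blocks. Assuming a 4 KiB block size (an assumption; the filefrag header that would state it is not part of the excerpt), that is 128 KiB per extent, which is the size limit btrfs applies to compressed extents. The arithmetic can be sketched as below; the function names are hypothetical, and the 10000 MiB file size used in the usage note is likewise an assumed figure consistent with the "10 GB" that dd reports.

```c
#include <stdint.h>

/* Bytes covered by one extent of 'blocks' filesystem blocks. */
static uint64_t sketch_extent_bytes(uint64_t blocks, uint64_t block_size)
{
	return blocks * block_size;
}

/* Number of extents needed to cover 'file_bytes', rounding up. */
static uint64_t sketch_extent_count(uint64_t file_bytes, uint64_t extent_bytes)
{
	return (file_bytes + extent_bytes - 1) / extent_bytes;
}
```

Under those assumptions `sketch_extent_bytes(32, 4096)` is 131072 bytes, and a 10000 MiB file split purely into such extents would need 80000 of them. The 49287 extents actually reported is lower, so not every extent in the file can be exactly 32 blocks; the excerpt only shows a stretch from the middle. The physical start offsets in the excerpt also advance by only 17 to 19 blocks per row, which suggests each 32-block logical extent occupies roughly that much space on disk after compression.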
Re: Btrfs + compression = slow performance and high cpu usage
On Fri, Jul 28, 2017 at 06:20:14PM +, William Muriithi wrote:
> Hi Roman,
>
> > autodefrag
>
> This sure sounded like a good thing to enable? on paper? right?...
>
> The moment you see anything remotely weird about btrfs, this is the first
> thing you have to disable and retest without. Oh wait, the first would be
> qgroups, this one is second.
>
> What's the problem with autodefrag?  I am also using it, so you caught my
> attention when you implied that it shouldn't be used.  According to docs, it
> seems like one of the very mature features of the filesystem.  See below for
> the doc I am referring to
>
> https://btrfs.wiki.kernel.org/index.php/Status
>
> I am using it as I assumed it could prevent the filesystem being too
> fragmented long term, but never thought there was a price to pay for using it

   It introduces additional I/O on writes, as it modifies a small area
surrounding any write or cluster of writes.

   I'm not aware of it causing massive slowdowns, in the way that
qgroups does in some situations.

   If your system is already marginal in terms of being able to
support the I/O required, then turning on autodefrag will make things
worse (but you may be heading for _much_ worse performance in the
future as the FS becomes more fragmented -- depending on your write
patterns and use case).

   Hugo.

-- 
Hugo Mills             | Great oxymorons of the world, no. 6:
hugo@... carfax.org.uk | Mature Student
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
Re: [PATCH 2/2] btrfs: increase ctx->pos for delayed dir index
On Mon, Jul 24, 2017 at 03:14:26PM -0400, jo...@toxicpanda.com wrote: > From: Josef Bacik> > Our dir_context->pos is supposed to hold the next position we're > supposed to look. If we successfully insert a delayed dir index we > could end up with a duplicate entry because we don't increase ctx->pos > after doing the dir_emit. > Looks good. Reviewed-by: Liu Bo Thanks, -liubo > Signed-off-by: Josef Bacik > --- > fs/btrfs/delayed-inode.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c > index 8ae409b..19e4ad2 100644 > --- a/fs/btrfs/delayed-inode.c > +++ b/fs/btrfs/delayed-inode.c > @@ -1727,6 +1727,7 @@ int btrfs_readdir_delayed_dir_index(struct dir_context > *ctx, > > if (over) > return 1; > + ctx->pos++; > } > return 0; > } > -- > 2.7.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
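The duplicate-entry problem described in the changelog above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: `toy_ctx` stands in for `struct dir_context`, `toy_readdir` and its `budget` and `advance` parameters are hypothetical, and the model assumes that a later call resumes from `ctx->pos` and skips entries whose offset is below it.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct dir_context; only the position matters here. */
struct toy_ctx {
    long pos;   /* next offset the reader should look at */
};

/*
 * Emit entries whose offset is >= ctx->pos into out[], at most
 * budget entries per call. advance selects the patched (1) or
 * unpatched (0) behaviour. Returns the number of entries emitted.
 */
int toy_readdir(const long *offsets, size_t n, struct toy_ctx *ctx,
                int budget, int advance, long *out)
{
    int emitted = 0;
    size_t i;

    assert(offsets && ctx && out);
    for (i = 0; i < n && emitted < budget; i++) {
        if (offsets[i] < ctx->pos)
            continue;       /* already returned by an earlier call */
        ctx->pos = offsets[i];
        out[emitted++] = offsets[i];
        if (advance)
            ctx->pos++;     /* the one-line fix: move past the entry */
    }
    return emitted;
}
```

In this model, with `advance` set to 0 the position is left pointing at the last emitted entry, so the next call returns that entry again. With `advance` set to 1 the next call finds nothing left to emit.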
Re: [PATCH 1/2][v2] btrfs: fix readdir deadlock with pagefault
On Mon, Jul 24, 2017 at 03:14:25PM -0400, jo...@toxicpanda.com wrote: > From: Josef Bacik> > Readdir does dir_emit while under the btree lock. dir_emit can trigger > the page fault which means we can deadlock. Fix this by allocating a > buffer on opening a directory and copying the readdir into this buffer > and doing dir_emit from outside of the tree lock. > > Signed-off-by: Josef Bacik > --- > v1->v2: > - use kzalloc instead of alloc_page(). > - make struct btrfs_file_private so you can still start a userspace trans on a > directory. > > fs/btrfs/ctree.h | 5 +++ > fs/btrfs/file.c | 9 - > fs/btrfs/inode.c | 107 > +-- > fs/btrfs/ioctl.c | 19 ++ > 4 files changed, 107 insertions(+), 33 deletions(-) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index 5ee9f10..33e942b 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -1264,6 +1264,11 @@ struct btrfs_root { > atomic64_t qgroup_meta_rsv; > }; > > +struct btrfs_file_private { > + struct btrfs_trans_handle *trans; > + void *filldir_buf; > +}; > + > static inline u32 btrfs_inode_sectorsize(const struct inode *inode) > { > return btrfs_sb(inode->i_sb)->sectorsize; > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > index 0f102a1..1897c3b 100644 > --- a/fs/btrfs/file.c > +++ b/fs/btrfs/file.c > @@ -1973,8 +1973,15 @@ static ssize_t btrfs_file_write_iter(struct kiocb > *iocb, > > int btrfs_release_file(struct inode *inode, struct file *filp) > { > - if (filp->private_data) > + struct btrfs_file_private *private = filp->private_data; > + > + if (private && private->trans) > btrfs_ioctl_trans_end(filp); > + if (private && private->filldir_buf) > + kfree(private->filldir_buf); > + kfree(private); > + filp->private_data = NULL; > + > /* >* ordered_data_close is set by settattr when we are about to truncate >* a file from a non-zero size to a zero size. 
This tries to > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 9a4413a..bbdbeea 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -5877,25 +5877,73 @@ unsigned char btrfs_filetype_table[] = { > DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK > }; > > +/* > + * All this infrastructure exists because dir_emit can fault, and we are > holding > + * the tree lock when doing readdir. For now just allocate a buffer and copy > + * our information into that, and then dir_emit from the buffer. This is > + * similar to what NFS does, only we don't keep the buffer around in > pagecache > + * because I'm afraid I'll fuck that up. Long term we need to make filldir > do > + * copy_to_user_inatomic so we don't have to worry about page faulting under > the > + * tree lock. > + */ > +static int btrfs_opendir(struct inode *inode, struct file *file) > +{ > + struct btrfs_file_private *private; > + > + private = kzalloc(sizeof(struct btrfs_file_private), GFP_KERNEL); > + if (!private) > + return -ENOMEM; > + private->filldir_buf = kzalloc(PAGE_SIZE, GFP_KERNEL); > + if (!private->filldir_buf) { > + kfree(private); > + return -ENOMEM; > + } > + file->private_data = private; > + return 0; > +} > + > +struct dir_entry { > + u64 ino; > + u64 offset; > + unsigned type; > + int name_len; > +}; > + > +static int btrfs_filldir(void *addr, int entries, struct dir_context *ctx) > +{ > + while (entries--) { > + struct dir_entry *entry = addr; > + char *name = (char *)(entry + 1); > + ctx->pos = entry->offset; > + if (!dir_emit(ctx, name, entry->name_len, entry->ino, > + entry->type)) > + return 1; > + addr += sizeof(struct dir_entry) + entry->name_len; > + ctx->pos++; > + } > + return 0; > +} > + > static int btrfs_real_readdir(struct file *file, struct dir_context *ctx) > { > struct inode *inode = file_inode(file); > struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); > struct btrfs_root *root = BTRFS_I(inode)->root; > + struct btrfs_file_private 
*private = file->private_data; > struct btrfs_dir_item *di; > struct btrfs_key key; > struct btrfs_key found_key; > struct btrfs_path *path; > + void *addr; > struct list_head ins_list; > struct list_head del_list; > int ret; > struct extent_buffer *leaf; > int slot; > - unsigned char d_type; > - int over = 0; > - char tmp_name[32]; > char *name_ptr; > int name_len; > + int entries = 0; > + int total_len = 0; > bool put = false; > struct btrfs_key location; > > @@ -5906,12 +5954,14 @@ static int btrfs_real_readdir(struct file *file, > struct dir_context *ctx) > if (!path)
RE: Btrfs + compression = slow performance and high cpu usage
Hi Roman, > autodefrag This sure sounded like a good thing to enable? on paper? right?... The moment you see anything remotely weird about btrfs, this is the first thing you have to disable and retest without. Oh wait, the first would be qgroups, this one is second. What's the problem with autodefrag? I am also using it, so you caught my attention when you implied that it shouldn't be used. According to the docs, it seems to be one of the very mature features of the filesystem. See below for the doc I am referring to: https://btrfs.wiki.kernel.org/index.php/Status I am using it as I assumed it could prevent the filesystem from becoming too fragmented long term, but I never thought there was a price to pay for using it. Regards, William
Re: Btrfs + compression = slow performance and high cpu usage
> I am stuck with a problem of btrfs slow performance when using > compression. [ ... ] That to me looks like an issue with speed, not performance, and in particular with PEBCAK issues. As to high CPU usage, when you find a way to do both compression and checksumming without using much CPU time, please send patches urgently :-). In your case the increase in CPU time is bizarre. I have the Ubuntu 4.4 "lts-xenial" kernel and what you report does not happen here (with a few little changes): soft# grep 'model name' /proc/cpuinfo | sort -u model name : AMD FX(tm)-6100 Six-Core Processor soft# cpufreq-info | grep 'current CPU frequency' current CPU frequency is 3.30 GHz (asserted by call to hardware). current CPU frequency is 3.30 GHz (asserted by call to hardware). current CPU frequency is 3.30 GHz (asserted by call to hardware). current CPU frequency is 3.30 GHz (asserted by call to hardware). current CPU frequency is 3.30 GHz (asserted by call to hardware). current CPU frequency is 3.30 GHz (asserted by call to hardware). soft# lsscsi | grep 'sd[ae]' [0:0:0:0]diskATA HFS256G32MNB-220 3L00 /dev/sda [5:0:0:0]diskATA ST2000DM001-1CH1 CC44 /dev/sde soft# mkfs.btrfs -f /dev/sde3 [ ... ] soft# mount -t btrfs -o discard,autodefrag,compress=lzo,compress-force,commit=10 /dev/sde3 /mnt/sde3 soft# df /dev/sda6 /mnt/sde3 Filesystem 1M-blocks Used Available Use% Mounted on /dev/sda6 90048 76046 14003 85% / /dev/sde3 23756819235501 1% /mnt/sde3 The above is useful context information that was "amazingly" omitted from your report.
In dmesg I see (note the "force zlib compression"): [327730.917285] BTRFS info (device sde3): turning on discard [327730.917294] BTRFS info (device sde3): enabling auto defrag [327730.917300] BTRFS info (device sde3): setting 8 feature flag [327730.917304] BTRFS info (device sde3): force zlib compression [327730.917313] BTRFS info (device sde3): disk space caching is enabled [327730.917315] BTRFS: has skinny extents [327730.917317] BTRFS: flagging fs with big metadata feature [327730.920740] BTRFS: creating UUID tree and the result is: soft# pv -tpreb /dev/sda6 | time dd iflag=fullblock of=/mnt/sde3/testfile bs=1M count=1 oflag=direct 1+0 records in17MB/s] [==>] 11% ETA 0:15:06 1+0 records out 1048576 bytes (10 GB) copied, 112.845 s, 92.9 MB/s 0.05user 9.93system 1:53.20elapsed 8%CPU (0avgtext+0avgdata 3016maxresident)k 120inputs+20496000outputs (1major+346minor)pagefaults 0swaps 9.77GB 0:01:53 [88.3MB/s] [==>] 11% soft# btrfs fi df /mnt/sde3/ Data, single: total=10.01GiB, used=9.77GiB System, DUP: total=8.00MiB, used=16.00KiB Metadata, DUP: total=1.00GiB, used=11.66MiB GlobalReserve, single: total=16.00MiB, used=0.00B While it was running, system CPU time was under 20% of one CPU: top - 18:57:29 up 3 days, 19:27, 4 users, load average: 5.44, 2.82, 1.45 Tasks: 325 total, 1 running, 324 sleeping, 0 stopped, 0 zombie %Cpu0 : 0.0 us, 2.3 sy, 0.0 ni, 91.3 id, 6.3 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu1 : 0.0 us, 1.3 sy, 0.0 ni, 78.5 id, 20.2 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu2 : 0.3 us, 5.8 sy, 0.0 ni, 81.0 id, 12.5 wa, 0.0 hi, 0.3 si, 0.0 st %Cpu3 : 0.3 us, 3.4 sy, 0.0 ni, 91.9 id, 4.4 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu4 : 0.3 us, 10.6 sy, 0.0 ni, 55.4 id, 33.7 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu5 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem: 8120660 total, 5162236 used, 2958424 free, 4440100 buffers KiB Swap:0 total,0 used,0 free. 
351848 cached Mem PID PPID USER PR NIVIRTRESDATA %CPU %MEM TIME+ TTY COMMAND 21047 21046 root 20 08872 26161364 12.9 0.0 0:02.31 pts/3dd iflag=fullblo+ 21045 3535 root 20 07928 1948 460 12.3 0.0 0:00.72 pts/3pv -tpreb /dev/s+ 21019 2 root 20 0 0 0 0 1.3 0.0 0:42.88 ? [kworker/u16:1] Of course "oflag=direct" is a rather "optimistic" option in this context, so I tried again with something more sensible: soft# pv -tpreb /dev/sda6 | time dd iflag=fullblock of=/mnt/sde3/testfile bs=1M count=1 conv=fsync 1+0 records in.4MB/s] [==>] 11% ETA 0:14:41 1+0 records out 1048576 bytes (10 GB) copied, 110.523 s, 94.9 MB/s 0.03user 8.94system 1:50.71elapsed 8%CPU (0avgtext+0avgdata 3024maxresident)k 136inputs+20499648outputs (1major+348minor)pagefaults 0swaps 9.77GB 0:01:50 [90.3MB/s] [==>] 11% soft# btrfs fi df /mnt/sde3/ Data, single: total=7.01GiB, used=6.35GiB System, DUP: total=8.00MiB, used=16.00KiB Metadata, DUP: total=1.00GiB, used=15.81MiB GlobalReserve,
Re: Btrfs + compression = slow performance and high cpu usage
On Fri, 28 Jul 2017 17:40:50 +0100 (BST) "Konstantin V. Gavrilenko" wrote: > Hello list, > > I am stuck with a problem of btrfs slow performance when using compression. > > when the compress-force=lzo mount flag is enabled, the performance drops to > 30-40 mb/s and one of the btrfs processes utilises 100% cpu time. > mount options: btrfs > relatime,discard,autodefrag,compress=lzo,compress-force,space_cache=v2,commit=10 It does not work like that, you need to set compress-force=lzo (and remove compress=). With your setup I believe you are currently using compress-force[=zlib] (the default), which overrides compress=lzo because it comes later in the options order. Secondly, > autodefrag This sure sounded like a good thing to enable? on paper? right?... The moment you see anything remotely weird about btrfs, this is the first thing you have to disable and retest without. Oh wait, the first would be qgroups, this one is second. Finally, what is the reasoning behind "commit=10", and did you check with the default value of 30? -- With respect, Roman
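Roman's point about option order can be illustrated with a toy model. The following is a minimal sketch, not the kernel's actual mount option parser, whose details may differ. It assumes that options are applied left to right, that a bare `compress` or `compress-force` falls back to zlib, and that the last compression option seen decides the result. The names `compress_state`, `apply_option` and `parse_options` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy model only; names and behaviour are assumptions, not kernel code. */
struct compress_state {
    char algo[8];   /* "none", "zlib" or "lzo" */
    bool force;     /* true if the last option seen was compress-force */
};

/* Apply one token such as "compress=lzo" or "compress-force". */
static void apply_option(struct compress_state *st, const char *opt)
{
    const char *value;
    bool force;

    /* Test the longer prefix first: "compress-force" starts with "compress". */
    if (strncmp(opt, "compress-force", 14) == 0) {
        force = true;
        value = opt + 14;
    } else if (strncmp(opt, "compress", 8) == 0) {
        force = false;
        value = opt + 8;
    } else {
        return;     /* unrelated option, ignored by this model */
    }

    if (*value == '=')
        value++;    /* an explicit algorithm follows */
    else if (*value != '\0')
        return;     /* some other option that merely shares the prefix */

    /* A missing algorithm means the assumed default, zlib. */
    snprintf(st->algo, sizeof(st->algo), "%s", *value ? value : "zlib");
    st->force = force;
}

/* Walk a comma-separated option string from left to right. */
void parse_options(const char *opts, struct compress_state *st)
{
    char buf[256];
    char *tok;

    assert(opts != NULL && st != NULL);
    snprintf(st->algo, sizeof(st->algo), "none");
    st->force = false;

    snprintf(buf, sizeof(buf), "%s", opts);
    for (tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ","))
        apply_option(st, tok);
}
```

In this model, an option string containing `compress=lzo,compress-force` ends up as forced zlib, while `compress-force=lzo` on its own gives forced lzo, which is the configuration Roman recommends.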
Btrfs incremental send | receive fails with Error: File not found
I often hit the following error when doing incremental btrfs send-receive: Btrfs incremental send | receive fails with Error: File not found Sometimes I can do two-three incremental snapshots, but then the same error (different file) happens again. It seems that the files were changed or replaced between snapshots, which is causing the problems for send-receive. I have tried to delete all snapshots and started over but the problem comes back, so I think it must be a bug. The source volume is: /mnt/storagePool (with RAID1 profile) with subvolume: volume/userData Backup disk is: /media/usb-backup (external USB disk) # cat /proc/version Linux version 4.13.0-rc2 (root@e350) (gcc version 6.3.0 (Gentoo 6.3.0 p1.0)) #2 SMP PREEMPT Fri Jul 28 14:25:15 CEST 2017 # btrfs version btrfs-progs v4.11.1 # btrfs fi show: Label: 'Backup' uuid: f021a039-87d6-4498-a0f5-6bbba3dfb1f1 Total devices 1 FS bytes used 362.85GiB devid 1 size 931.51GiB used 367.06GiB path /dev/sdf1 Label: 'pool' uuid: ea4f1d6d-c2c5-4247-a903-15b36ee276a7 Total devices 2 FS bytes used 362.33GiB devid 1 size 927.51GiB used 367.03GiB path /dev/sdc2 devid 2 size 927.51GiB used 367.03GiB path /dev/sdd2 (backup) /media/usb-backup/volumes/userData # btrfs sub list . ID 258 gen 30 top level 5 path scripts ID 1622 gen 3227 top level 5 path volumes/userData/userData.20170727T1222 ID 1999 gen 3251 top level 5 path volumes/userData/userData.20170727T2102 (source) /mnt/storagePool/snapshots # btrfs sub list . 
ID 262 gen 118703 top level 5 path volume/userData ID 1928 gen 118105 top level 5 path snapshots/userData.20170727T1222 ID 1930 gen 118151 top level 5 path snapshots/userData.20170727T2102 ID 1932 gen 118167 top level 5 path snapshots/userData.20170727T2300 ID 1936 gen 118390 top level 5 path snapshots/userData.20170728T0100 ID 1939 gen 118502 top level 5 path snapshots/userData.20170728T0200 ID 1955 gen 118667 top level 5 path snapshots/userData.20170728T1300 ID 1960 gen 118695 top level 5 path snapshots/userData.20170728T1700 ID 1962 gen 118699 top level 5 path snapshots/userData.20170728T1800 # btrfs subvolume list -p -a -c -g -u -q -R -t /mnt/storagePool/snapshots ID gen cgen parent top level parent_uuid received_uuid uuid path -- --- -- - --- - 260 118702 24 5 5 - 6e20167e-8d72-cc42-b486-10c6a5516ca7 dd86162c-4df2-d646-a65f-77768adc132d volume/mail 262 118703 39 5 5 - 8464242d-0e81-e84e-ba93-78b1c8f00fc9 94c256cb-970e-e349-a660-ff4d9291c829 volume/userData 506 118691 333 5 5 - d0c6ff24-1766-b049-abe9-80396795448f c759b1cc-106e-134a-8cef-f1da1bc5e169 volume/storageTemp 1469 78671 78671 5 5 - - 8a94524e-a956-c14b-bb8d-d453627f27d5 volume/mysql 1928 118105 118105 5 5 94c256cb-970e-e349-a660-ff4d9291c829 8464242d-0e81-e84e-ba93-78b1c8f00fc9 7aed8444-34a7-c54d-ae06-e0e80ead3c18 snapshots/userData.20170727T1222 1930 118151 118151 5 5 94c256cb-970e-e349-a660-ff4d9291c829 8464242d-0e81-e84e-ba93-78b1c8f00fc9 20b4fab3-f75c-4445-914a-23465e09626c snapshots/userData.20170727T2102 1932 118167 118167 5 5 94c256cb-970e-e349-a660-ff4d9291c829 8464242d-0e81-e84e-ba93-78b1c8f00fc9 2b0069dc-5d71-df49-9c32-d5e0f17c09e9 snapshots/userData.20170727T2300 1936 118390 118390 5 5 94c256cb-970e-e349-a660-ff4d9291c829 8464242d-0e81-e84e-ba93-78b1c8f00fc9 8aa3ea70-b703-b740-8012-373be0616720 snapshots/userData.20170728T0100 1939 118502 118502 5 5 94c256cb-970e-e349-a660-ff4d9291c829 8464242d-0e81-e84e-ba93-78b1c8f00fc9 ad84276f-a481-d04a-ad26-301dd79b158f snapshots/userData.20170728T0200 
1955 118667 118667 5 5 94c256cb-970e-e349-a660-ff4d9291c829 8464242d-0e81-e84e-ba93-78b1c8f00fc9 605cf43c-5e01-9d4e-ad22-77488f0d3e90 snapshots/userData.20170728T1300 1960 118695 118695 5 5 94c256cb-970e-e349-a660-ff4d9291c829 8464242d-0e81-e84e-ba93-78b1c8f00fc9 31c72ce0-5765-b042-a073-8c4296e111ec snapshots/userData.20170728T1700 1962 118699 118699 5 5 94c256cb-970e-e349-a660-ff4d9291c829 8464242d-0e81-e84e-ba93-78b1c8f00fc9 feadb1df-867b-7245-86d0-5472cd3c899b snapshots/userData.20170728T1800 # btrfs subvolume list -p -a -c -g -u -q -R -t /media/usb-backup/volumes/userData ID gen cgen parent top level parent_uuid received_uuid uuid path -- --- -- - --- - 258 30 9 5 5 - - 95dafde0-677c-7542-9d18-9bbfdbf7c9b3 scripts 1622 3227 2532 5 5 - 8464242d-0e81-e84e-ba93-78b1c8f00fc9 cfe52e52-b7dd-7e48-8616-43286f5a11e0 volumes/userData/userData.20170727T1222 1999 3251 3224 5 5
Btrfs + compression = slow performance and high cpu usage
Hello list, I am stuck with a problem of btrfs slow performance when using compression. When the compress-force=lzo mount flag is enabled, the performance drops to 30-40 mb/s and one of the btrfs processes utilises 100% cpu time. mount options: btrfs relatime,discard,autodefrag,compress=lzo,compress-force,space_cache=v2,commit=10 The command I use to test the write throughput is # pv -tpreb /dev/sdb | dd of=./testfile bs=1M oflag=direct # top -d 1 top - 15:49:13 up 1:52, 2 users, load average: 5.28, 2.32, 1.39 Tasks: 320 total, 2 running, 318 sleeping, 0 stopped, 0 zombie %Cpu0 : 0.0 us, 2.0 sy, 0.0 ni, 77.0 id, 21.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu1 : 0.0 us, 1.0 sy, 0.0 ni, 90.0 id, 9.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu2 : 0.0 us, 1.0 sy, 0.0 ni, 72.0 id, 27.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu3 : 0.0 us,100.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu4 : 0.0 us, 1.0 sy, 0.0 ni, 57.0 id, 42.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu5 : 0.0 us, 0.0 sy, 0.0 ni, 96.0 id, 4.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu6 : 0.0 us, 0.0 sy, 0.0 ni, 94.0 id, 6.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu7 : 0.0 us, 1.0 sy, 0.0 ni, 95.1 id, 3.9 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu8 : 1.0 us, 2.0 sy, 0.0 ni, 24.0 id, 73.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu9 : 0.0 us, 0.0 sy, 0.0 ni, 81.8 id, 18.2 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu10 : 1.0 us, 0.0 sy, 0.0 ni, 98.0 id, 1.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu11 : 0.0 us, 2.0 sy, 0.0 ni, 83.3 id, 14.7 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem : 32934136 total, 10137496 free, 602244 used, 22194396 buff/cache KiB Swap:0 total,0 free,0 used. 30525664 avail Mem PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 37017 root 20 0 0 0 0 R 100.0 0.0 0:32.42 kworker/u49:8 36732 root 20 0 0 0 0 D 4.0 0.0 0:02.40 btrfs-transacti 40105 root 20 08388 3040 2000 D 4.0 0.0 0:02.88 dd The kworker process that causes the high cpu usage is most likely searching for free space. 
# echo l > /proc/sysrq-trigger # dmesg -T [Fri Jul 28 15:57:51 2017] CPU: 1 PID: 36430 Comm: kworker/u49:2 Not tainted 4.10.0-28-generic #32~16.04.2-Ubuntu [Fri Jul 28 15:57:51 2017] Hardware name: Supermicro X8DTL/X8DTL, BIOS 2.1b 11/16/2012 [Fri Jul 28 15:57:51 2017] Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs] [Fri Jul 28 15:57:51 2017] task: 9ddce6206a40 task.stack: aa9121f6c000 [Fri Jul 28 15:57:51 2017] RIP: 0010:rb_next+0x1e/0x40 [Fri Jul 28 15:57:51 2017] RSP: 0018:aa9121f6fb40 EFLAGS: 0282 [Fri Jul 28 15:57:51 2017] RAX: 9dddc34df1b0 RBX: 0001 RCX: 1000 [Fri Jul 28 15:57:51 2017] RDX: 9dddc34df708 RSI: 9ddccaf470a4 RDI: 9dddc34df2d0 [Fri Jul 28 15:57:51 2017] RBP: aa9121f6fb40 R08: 0001 R09: 3000 [Fri Jul 28 15:57:51 2017] R10: R11: 0002 R12: 9ddccaf47080 [Fri Jul 28 15:57:51 2017] R13: 1000 R14: aa9121f6fc50 R15: 9dddc34df2d0 [Fri Jul 28 15:57:51 2017] FS: () GS:9ddcefa4() knlGS: [Fri Jul 28 15:57:51 2017] CS: 0010 DS: ES: CR0: 80050033 [Fri Jul 28 15:57:51 2017] Call Trace:_space_for_alloc+0xde/0x270 [btrfs] [Fri Jul 28 15:57:51 2017] btrfs_find_space_for_alloc+0xde/0x270 [btrfs] [Fri Jul 28 15:57:51 2017] find_free_extent.isra.68+0x3c6/0x1040 [btrfs]s] [Fri Jul 28 15:57:51 2017] btrfs_reserve_extent+0xab/0x210 [btrfs]btrfs] [Fri Jul 28 15:57:51 2017] submit_compressed_extents+0x154/0x580 [btrfs]s] [Fri Jul 28 15:57:51 2017] ? submit_compressed_extents+0x580/0x580 [btrfs] [Fri Jul 28 15:57:51 2017] async_cow_submit+0x82/0x90 [btrfs]00 [btrfs] [Fri Jul 28 15:57:51 2017] btrfs_scrubparity_helper+0x1fe/0x300 [btrfs] [Fri Jul 28 15:57:51 2017] btrfs_delalloc_helper+0xe/0x10 [btrfs] [Fri Jul 28 15:57:51 2017] process_one_work+0x16b/0x4a0a0 [Fri Jul 28 15:57:51 2017] worker_thread+0x4b/0x500+0x60/0x60 [Fri Jul 28 15:57:51 2017] kthread+0x109/0x1400x4a0/0x4a0 When the compression is turned off, I am able to get the maximum 500-600 mb/s write speed on this disk (raid array) with minimal cpu usage. 
mount options: relatime,discard,autodefrag,space_cache=v2,commit=10 # iostat -m 1 avg-cpu: %user %nice %system %iowait %steal %idle 0.080.007.74 10.770.00 81.40 Device:tpsMB_read/sMB_wrtn/sMB_readMB_wrtn sda2376.00 0.00
[GIT PULL] Btrfs fixes for 4.13-rc3
Hi, please pull the following btrfs fixes. They're addressing problems reported by users, and there's one more regression fix. Thanks. The next pull request will be sent by Chris, I'm heading off to vacation. The following changes since commit c3cfb656307583ddfea45375c10183737593c195: Btrfs: fix unexpected return value of bio_readpage_error (2017-07-14 20:42:37 +0200) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-4.13-part3 for you to fetch changes up to 0e4324a4c36b3eb5cd1f71cbbc38d888f919ebfc: btrfs: round down size diff when shrinking/growing device (2017-07-24 16:05:00 +0200) Filipe Manana (1): Btrfs: fix dir item validation when replaying xattr deletes Jeff Mahoney (1): btrfs: fix lockup in find_free_extent with read-only block groups Nikolay Borisov (1): btrfs: round down size diff when shrinking/growing device Omar Sandoval (1): Btrfs: fix early ENOSPC due to delalloc fs/btrfs/extent-tree.c | 11 +-- fs/btrfs/tree-log.c| 3 +-- fs/btrfs/volumes.c | 4 ++-- 3 files changed, 8 insertions(+), 10 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2][v2] btrfs: fix readdir deadlock with pagefault
On Mon, Jul 24, 2017 at 03:14:25PM -0400, jo...@toxicpanda.com wrote: > From: Josef Bacik> > Readdir does dir_emit while under the btree lock. dir_emit can trigger > the page fault which means we can deadlock. Fix this by allocating a > buffer on opening a directory and copying the readdir into this buffer > and doing dir_emit from outside of the tree lock. > > Signed-off-by: Josef Bacik Reviewed-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
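The buffering approach in the changelog (copy entries into a private buffer, then emit them once the lock is no longer held) can be sketched in userspace C. This is an illustrative model only, assuming a layout of one fixed header followed directly by the name bytes, similar to the `struct dir_entry` in the patch. The names `toy_entry`, `toy_pack`, `toy_drain`, `toy_sink` and `toy_collect` are hypothetical, there is no locking here, and the header is read with `memcpy` rather than through a cast pointer to sidestep alignment concerns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed header; the name bytes follow it directly in the buffer. */
struct toy_entry {
    uint64_t ino;
    uint64_t offset;
    unsigned type;
    int name_len;
};

/* Phase 1 (would run under the lock): append one entry. Returns -1 if full. */
int toy_pack(unsigned char *buf, size_t cap, size_t *used, uint64_t ino,
             uint64_t offset, unsigned type, const char *name)
{
    struct toy_entry e = { ino, offset, type, (int)strlen(name) };
    size_t need = sizeof(e) + (size_t)e.name_len;

    assert(buf && used && name);
    if (*used + need > cap)
        return -1;
    memcpy(buf + *used, &e, sizeof(e));
    memcpy(buf + *used + sizeof(e), name, (size_t)e.name_len);
    *used += need;
    return 0;
}

typedef int (*toy_emit_fn)(void *arg, const char *name, int name_len,
                           uint64_t ino, unsigned type);

/*
 * Phase 2 (would run after the lock is dropped): hand each buffered
 * entry to emit(). Returns the position to resume from.
 */
uint64_t toy_drain(const unsigned char *buf, int entries, uint64_t pos,
                   toy_emit_fn emit, void *arg)
{
    while (entries--) {
        struct toy_entry e;

        memcpy(&e, buf, sizeof(e));
        pos = e.offset;
        if (!emit(arg, (const char *)buf + sizeof(e), e.name_len,
                  e.ino, e.type))
            return pos;     /* consumer is full; retry this entry later */
        buf += sizeof(e) + (size_t)e.name_len;
        pos++;              /* move past the entry just emitted */
    }
    return pos;
}

/* Example consumer: collects names separated by spaces. */
struct toy_sink {
    char names[64];
    size_t len;
};

int toy_collect(void *arg, const char *name, int name_len,
                uint64_t ino, unsigned type)
{
    struct toy_sink *s = arg;

    (void)ino;
    (void)type;
    if (s->len + (size_t)name_len + 2 > sizeof(s->names))
        return 0;           /* no room left; stop the walk */
    memcpy(s->names + s->len, name, (size_t)name_len);
    s->len += (size_t)name_len;
    s->names[s->len++] = ' ';
    s->names[s->len] = '\0';
    return 1;
}
```

The point of the split is that the consumer callback, which in the kernel may fault while copying to userspace, only ever runs in the second phase.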
Re: [PATCH v3] Btrfs: add skeleton code for compression heuristic
On 28/07/2017 00:36, David Sterba wrote: On Mon, Jul 24, 2017 at 11:40:17PM +0800, Anand Jain wrote: Eg. files that are already compressed would increase the cpu consumption with compress-force, while they'd be hopefully detected as incompressible with 'compress' and clever heuristics. So the NOCOMPRESS bit would better reflect the status of the file. I thought 'compress' in the above is the compress option. Ah, you mean to say the compression algo .. got it. Right, compress-force for incompressible data is very expensive. And it's also true that the compress option for incompressible data is not at all expensive and it's only one time. The current NOCOMPRESS is based on a trial and error method and is more accurate than a heuristic; also the loss of cpu power is only one time? Currently, force-compress beats everything, so even a file with NOCOMPRESS will be compressed, all new writes will be passed to the compression and stored uncompressed eventually. It makes sense to me when you replace NOCOMPRESS with incompressible-data in the above statement. As in my understanding.. You will never have a file with the NOCOMPRESS flag if the compress-force option is used. Each time the compression code will run and fail, so it's not one time. Although you can say it's more 'accurate', it's also more expensive. Yes. Expensive only in compress-force. Maybe the only opportunity that the heuristic can facilitate is in the logic to monitor and reset NOCOMPRESS; as of now there is no such logic. The heuristic can be made adaptive, and examine data even for NOCOMPRESS files, but that's a few steps ahead of where we are now. Nice. Thanks, Anand
Re: [PATCH v3 00/13] use rbtrees for preliminary backrefs
On Wed, Jul 12, 2017 at 04:20:05PM -0600, Edmund Nadolski wrote: > This patch series attempts to improve the performance of backref > searches by changing the prelim_refs implementation to use > rbtrees instead of lists. This also aims to reduce the soft > lockup occurences that can result when a backref search consumes > too much cpu time. > > Test runs of btrfs/130 show an improvement in the overall > run time of the test (shown below in seconds) as a function of > the number of extents: > > nr_extents:2565126401024 2048 > +---+-+---+---+-- > unpatched: 20186375220440419 >patched: 12 93203106022007 > > (Note, the current default value for nr_extents in btrfs/130 is > 4096, which takes a very long time to complete.) > > Changes for v3: > > Patch 08/13: > - Update changelog and comments for third rbtree. > - Fixed issue in resolve_indirect_refs() which prevented >module load when sanity checking was enabled. > > Patch 10/13: > - Fix TP_printk_btrfs format string per coding standards. > > Changes for v2: > > Patch 06/13: > - Added changelog description. > > Patch 07/13: > - Updated changelog description. > - Removed 'TODO' comment. > > Patch 08/13: > - Added code for proper iteration of missing keys. This adds >a third rbtree (.indirect_missing_keys in struct preftrees) >plus the requisite code in add_prelim_ref(), add_missing_keys(), >resolve_indirect_refs(), and find_parent_nodes(). > - Rename release_pref() to free_pref(). > - Replace WARN() with BUG_ON(). > - Remove 'TODO' comments and the unused 'merge_mode' enum. > > The other patches have no functional changes. Some have diff > context changes due to the above modifications. 
> > Edmund Nadolski (6): > btrfs: btrfs_check_shared should manage its own transaction > btrfs: remove ref_tree implementation from backref.c > btrfs: convert prelimary reference tracking to use rbtrees > btrfs: add cond_resched() calls when resolving backrefs > btrfs: allow backref search checks for shared extents > btrfs: clean up extraneous computations in add_delayed_refs > > Jeff Mahoney (7): > btrfs: struct-funcs, constify readers > btrfs: constify tracepoint arguments > btrfs: backref, constify some arguments > btrfs: backref, add unode_aux_to_inode_list helper > btrfs: backref, cleanup __ namespace abuse > btrfs: add a node counter to each of the rbtrees > btrfs: backref, add tracepoints for prelim_ref insertion and merging FYI, the whole patchset is now queued for 4.14. It's been in for-next for a long time and I haven't seen any problems related to it. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: fix assertion failure during fsync in no-holes mode
From: Filipe MananaWhen logging an inode in full mode that has an inline compressed extent that represents a range with a size matching the sector size (currently the same as the page size), has a trailing hole and the no-holes feature is enabled, we end up failing an assertion leading to a trace like the following: [141812.031528] assertion failed: len == i_size, file: fs/btrfs/tree-log.c, line: 4453 [141812.033069] [ cut here ] [141812.034330] kernel BUG at fs/btrfs/ctree.h:3452! [141812.035137] invalid opcode: [#1] PREEMPT SMP [141812.035932] Modules linked in: btrfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio dm_flakey dm_mod dax ppdev evdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 tpm_tis psmouse crypto_simd parport_pc sg pcspkr tpm_tis_core cryptd parport serio_raw glue_helper tpm i2c_piix4 i2c_core button sunrpc loop autofs4 ext4 crc16 jbd2 mbcache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod ata_generic virtio_scsi ata_piix floppy crc32c_intel libata scsi_mod virtio_pci virtio_ring e1000 virtio [last unloaded: btrfs] [141812.036790] CPU: 3 PID: 845 Comm: fdm-stress Tainted: GB W 4.12.3-btrfs-next-52+ #1 [141812.036790] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 [141812.036790] task: 8801e6694180 task.stack: c90009004000 [141812.036790] RIP: 0010:assfail.constprop.18+0x1c/0x1e [btrfs] [141812.036790] RSP: 0018:c90009007bc0 EFLAGS: 00010282 [141812.036790] RAX: 0046 RBX: 88017512c008 RCX: 0001 [141812.036790] RDX: 88023fd95201 RSI: 8182264c RDI: [141812.036790] RBP: c90009007bc0 R08: 0001 R09: 0001 [141812.036790] R10: 1000 R11: 82f5a0c9 R12: 88014e5947e8 [141812.036790] R13: 000b4000 R14: 8801b234d008 R15: [141812.036790] FS: 7fdba6ffd700() GS:88023fd8() knlGS: [141812.036790] CS: 0010 DS: ES: CR0: 80050033 [141812.036790] CR2: 7fdb9c10 CR3: 00016efa2000 
CR4: 001406e0 [141812.036790] Call Trace: [141812.036790] btrfs_log_inode+0x9f0/0xd3d [btrfs] [141812.036790] ? __mutex_lock+0x120/0x3ce [141812.036790] btrfs_log_inode_parent+0x224/0x685 [btrfs] [141812.036790] ? lock_acquire+0x16b/0x1af [141812.036790] btrfs_log_dentry_safe+0x60/0x7b [btrfs] [141812.036790] btrfs_sync_file+0x32e/0x3f8 [btrfs] [141812.036790] vfs_fsync_range+0x8a/0x9d [141812.036790] vfs_fsync+0x1c/0x1e [141812.036790] do_fsync+0x31/0x4a [141812.036790] SyS_fdatasync+0x13/0x17 [141812.036790] entry_SYSCALL_64_fastpath+0x18/0xad [141812.036790] RIP: 0033:0x7fdbac41a47d [141812.036790] RSP: 002b:7fdba6ffce30 EFLAGS: 0293 ORIG_RAX: 004b [141812.036790] RAX: ffda RBX: 81092c9f RCX: 7fdbac41a47d [141812.036790] RDX: 004cf0160a40 RSI: RDI: 0006 [141812.036790] RBP: c90009007f98 R08: R09: 0010 [141812.036790] R10: 02e8 R11: 0293 R12: 8110cd90 [141812.036790] R13: c90009007f78 R14: R15: [141812.036790] ? time_hardirqs_off+0x9/0x14 [141812.036790] ? trace_hardirqs_off_caller+0x1f/0xa3 [141812.036790] Code: c7 d6 61 6b a0 48 89 e5 e8 ba ef a8 e0 0f 0b 55 89 f1 48 c7 c2 6d 65 6b a0 48 89 fe 48 c7 c7 81 65 6b a0 48 89 e5 e8 9c ef a8 e0 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 [141812.036790] RIP: assfail.constprop.18+0x1c/0x1e [btrfs] RSP: c90009007bc0 [141812.084448] ---[ end trace 44e472684c7a32cc ]--- Which happens because the code that logs a trailing hole when the no-holes feature is enabled, did not consider that a compressed inline extent can represent a range with a size matching the sector size, in which case expanding the inode's i_size, through a truncate operation, won't lead to padding with zeroes the page that represents the inline extent, and therefore the inline extent remains after the truncation. Fix this by adapting the assertion to accept inline extents representing data with a sector size length if, and only if, the inline extents are compressed. 
A sample and trivial reproducer (for systems with a 4K page size) for this issue: mkfs.btrfs -O no-holes -f /dev/sdc mount -o compress /dev/sdc /mnt xfs_io -f -c "pwrite -S 0xab 0 4K" /mnt/foobar sync xfs_io -c "truncate 32K" /mnt/foobar xfs_io -c "fsync" /mnt/foobar Signed-off-by: Filipe Manana --- fs/btrfs/tree-log.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 3a11ae63676e..c02654cf4c8b 100644 --- a/fs/btrfs/tree-log.c +++
Btrfs progs release 4.12
Hi, btrfs-progs version 4.12 has been released. Although it's a major number update, there are no major changes besides the usual bugfixes and enhancements. Per user request, the tarball now contains the generated manual pages, as the build dependencies for documentation are not lightweight. If you configure with --disable-documentation, the generated *.gz are not touched and need to be manually copied to the destination path ($prefix/share/man/man[58]). Changes: * subvol show: new options --rootid, --uuid to show subvol by the given spec * convert: progress report fixes, found by tsan * image: progress report fixes, found by tsan * fix infinite looping in find-root, or when looking for free extents * other: * code refactoring * docs updates * build: ThreadSanitizer support * tests: stricter checks for mounted filesystem Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/ Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git Shortlog: Adam Buchbinder (7): btrfs-progs: convert: Fix data race when reporting progress btrfs-progs: image: Fix data races when reporting progress btrfs-progs: image: fix typos in messages btrfs-progs: tests: Fix missing internal deps in check and misc tests btrfs-progs: Tighten integer types in print-tree btrfs-progs: build: Enable ThreadSanitizer, using D=tsan btrfs-progs: tests: Use '-t btrfs' mount option in tests Anand Jain (2): btrfs-progs: subvol show: fix the path use full_path as provided by the root info btrfs-progs: subvol show: add support to search subvolume by rootid or uuid David Sterba (8): btrfs-progs: docs: document conventions btrfs-progs: docs: move deprecated mount option to own section btrfs-progs: docs: enhance documentation of 'btrfs device ready' btrfs-progs: docs: adjust wording for subvol delete btrfs-progs: tests: enhance API to request type of the converted filesystem btrfs-progs: tests: use separate helper for mounting convert filesystems btrfs-progs: docs: update 
wording for compression mount options btrfs-progs: update CHANGES for v4.12 Justin Maggard (1): btrfs-progs: Fix an infinite loop in btrfs_next_bg Liu Bo (1): Btrfs-progs: fix infinite loop in find_free_extent Philipp Hahn (1): btrfs-progs: Fix slot >= nritems Qu Wenruo (61): btrfs-progs: Cleanup open-coded btrfs_chunk_item_size btrfs-progs: Remove deprecated leafsize usage btrfs-progs: Introduce sectorsize nodesize and stripesize members for btrfs_fs_info btrfs-progs: Refactor block sizes users in disk-io.c btrfs-progs: Refactor block sizes users in btrfs-corrupt-block.c btrfs-progs: Refactor block sizes users in ctree.c and ctree.h btrfs-progs: Refactor block sizes users in btrfs-map-logical.c btrfs-progs: Refactor block sizes users in chunk-recover.c btrfs-progs: Refactor block sizes users in backref.c btrfs-progs: Refactor block sizes users in cmds-restore.c btrfs-progs: Refactor nodesize user in extent_io.c btrfs-progs: Refactor nodesize users in image/main.c btrfs-progs: Refactor block sizes users in cmds-check.c btrfs-progs: Refactor nodesize user in btrfstune.c btrfs-progs: Refactor nodesize users in utils.c btrfs-progs: Refactor block sizes users in extent-tree.c btrfs-progs: Refactor nodesize user in print-tree.c btrfs-progs: Refactor nodesize users in qgroup-verify.c btrfs-progs: Refactor nodesize users in cmds-inspect-tree-stats.c btrfs-progs: Refactor sectorsize users in mkfs/main.c btrfs-progs: Refactor sectorsizes users in file-item.c btrfs-progs: Refactor sectorsize users in free-space-cache.c btrfs-progs: Refactor sectorsize users in file.c btrfs-progs: Refactor sectorsize users in volumes.c btrfs-progs: Refactor sectorsize users in free-space-tree.c btrfs-progs: Refactor sectorsize in convert/source-fs.c btrfs-progs: Refactor sectorsize users in convert/main.c btrfs-progs: Refactor sectorsize users in convert/source-ext2.c btrfs-progs: Refactor sectorsize users in cmds-inspect-dump-tree.c btrfs-progs: Remove block size members in btrfs_root 
btrfs-progs: Refactor btrfs_root paramters in btrfs-corrupt-block.c btrfs-progs: Refactor read_tree_block to get rid of btrfs_root btrfs-progs: Refactor read_node_slot function to get rid of btrfs_root parameter btrfs-progs: raid56: Introduce raid56 header for later recovery usage btrfs-progs: raid56: Introduce tables for RAID6 recovery btrfs-progs: raid56: Allow raid6 to recover 2 data stripes btrfs-progs: raid56: Allow raid6 to recover data and P btrfs-progs: Introduce wrapper to recover raid56 data btrfs-progs: Enhance chunk item validation check btrfs-progs: check: Reuse btrfs_check_chunk_valid in
Re: [PATCH] btrfs-progs: eliminate bogus IOC_DEV_INFO call
On Thu, Jul 27, 2017 at 9:24 PM, Hans van Kranenburg wrote: > Device ID numbers always start at 1, not at 0. The first IOC_DEV_INFO > call does not make sense, since it will always return ENODEV. When a btrfs replace is ongoing, there is a device with ID 0, so this call does not always return ENODEV. > ioctl(3, BTRFS_IOC_DEV_INFO, {devid=0}) = -1 ENODEV (No such device) > > Signed-off-by: Hans van Kranenburg > --- > cmds-fi-usage.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c > index 101a0c4..52c4c62 100644 > --- a/cmds-fi-usage.c > +++ b/cmds-fi-usage.c > @@ -535,7 +535,7 @@ static int load_device_info(int fd, struct device_info > **device_info_ptr, > return 1; > } > > - for (i = 0, ndevs = 0 ; i <= fi_args.max_id ; i++) { > + for (i = 1, ndevs = 0 ; i <= fi_args.max_id ; i++) { > if (ndevs >= fi_args.num_devices) { > error("unexpected number of devices: %d >= %llu", > ndevs, > (unsigned long long)fi_args.num_devices); > -- > 2.11.0
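The point made in the reply above (device ID 0 exists while a replace is ongoing) suggests keeping the scan at 0 and treating ENODEV as "no such device, keep going". A minimal sketch of that loop follows. It is not the btrfs-progs code: `dev_info_fn`, `count_devices` and `mock_query` are hypothetical names, and the callback merely stands in for the real BTRFS_IOC_DEV_INFO ioctl so that the example is self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical callback standing in for the BTRFS_IOC_DEV_INFO ioctl:
 * returns 0 if the device id exists, a negative errno otherwise.
 */
typedef int (*dev_info_fn)(uint64_t devid, void *ctx);

/*
 * Count the devices with ids in [0, max_id]. Ids that report -ENODEV
 * are skipped; any other error aborts the scan and is returned.
 */
static int count_devices(uint64_t max_id, dev_info_fn query, void *ctx)
{
    int ndevs = 0;

    assert(query != NULL);
    for (uint64_t id = 0; ; id++) {
        int ret = query(id, ctx);

        if (ret == 0)
            ndevs++;
        else if (ret != -ENODEV)
            return ret;
        if (id == max_id)   /* checked here to avoid wrapping past max_id */
            break;
    }
    return ndevs;
}

/* Mock query: ctx points to a bitmask, bit N set means devid N exists. */
static int mock_query(uint64_t devid, void *ctx)
{
    const uint64_t *present = ctx;

    if (devid < 64 && ((*present >> devid) & 1))
        return 0;
    return -ENODEV;
}
```

With a mask of 0x6 (devids 1 and 2) the scan finds two devices; with 0x7, modelling a replace target at devid 0, it finds three. Starting the loop at 1 would miss that third device.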
[PATCH] btrfs: Remove redundant setting of uuid in btrfs_block_header.
btrfs_alloc_dev_extent currently unconditionally sets the uuid in the leaf block header the function is working with. This is unnecessary since this operation is performed by the core btree handling code (splitting a node, allocating a new btree block etc). So let's remove it. Signed-off-by: Nikolay Borisov --- fs/btrfs/volumes.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 5a1913956f20..84501e9d486c 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1611,8 +1611,6 @@ static int btrfs_alloc_dev_extent(struct btrfs_trans_handle *trans, BTRFS_FIRST_CHUNK_TREE_OBJECTID); btrfs_set_dev_extent_chunk_offset(leaf, extent, chunk_offset); - write_extent_buffer_chunk_tree_uuid(leaf, fs_info->chunk_tree_uuid); - btrfs_set_dev_extent_length(leaf, extent, num_bytes); btrfs_mark_buffer_dirty(leaf); out: -- 2.7.4
Re: [PATCH 2/2] btrfs: Simplify btrfs_alloc_dev_extent
On Fri, Jul 28, 2017 at 6:59 AM, Nikolay Borisov wrote: > > > On 27.07.2017 20:57, Filipe Manana wrote: >> On Thu, Jul 27, 2017 at 6:36 PM, Nikolay Borisov wrote: >>> Currently btrfs_alloc_dev_extent essentially open codes btrfs_insert_item. So >>> let's remove the superfluous code, leaving only the important bits, namely >>> initialising the device extent and just calling btrfs_insert_item. So first add >>> definition for the stack-based set/get function. And then use them. >>> Additionally, remove the code which sets the uuid of the block header, since >>> this is something which is already handled by the core btree code. >> >> Quite honestly, I don't see the value of this change at all. >> It doesn't make things simpler nor more readable nor nothing. >> We have many, really many places using btrfs_insert_empty_item() >> instead of calling btrfs_insert_item(), are you planning on sending a >> patch to do the replacement for each of them? What's the point? > > I beg you to differ. Some of the code in btrfs is a mess, it's working > but it's messy. There is constant violation of abstractions (as is the > case in this function, heck the uuid setting of the block header > function doesn't even belong here). The uuid setting is a different thing (and that's fine to go away), unrelated to using insert_empty_item() vs insert_item(), which is what I was referring to in my previous reply. > All of this hampers reading > comprehension of the code for newcomers. You are experienced in the code > and likely this doesn't apply to you but since I'm someone relatively > new to the code this has been my experience. And I believe any effort to > actually simplify the code, make it more coherent and succinct is a win > long-term. Well, this hasn't prevented me, or others that have started contributing to btrfs after I did, from being able to understand the code and do useful changes (otherwise such kind of patches would have landed long time ago). 
This kind of change won't save anyone's time understanding the code. Plus, if I want to go a bit more nitpick, this change of using btrfs_insert_item() is from a performance/efficiency point of view, worse as it requires an additional memory allocation/free (the device extent). > > I will wait for other feedback, if people feel patches like that are > just bikeshedding then I will refrain from such cleanups in the future. > >> >> Plus you are introducing now a memory leak. See below. > > Will fix it. > >> >>> >>> Signed-off-by: Nikolay Borisov >>> --- >>> fs/btrfs/ctree.h | 8 >>> fs/btrfs/volumes.c | 34 -- >>> 2 files changed, 20 insertions(+), 22 deletions(-) >>> >>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h >>> index cd9497bcdb1e..567fbf186257 100644 >>> --- a/fs/btrfs/ctree.h >>> +++ b/fs/btrfs/ctree.h >>> @@ -1740,6 +1740,14 @@ BTRFS_SETGET_FUNCS(dev_extent_chunk_objectid, struct >>> btrfs_dev_extent, >>> BTRFS_SETGET_FUNCS(dev_extent_chunk_offset, struct btrfs_dev_extent, >>>chunk_offset, 64); >>> BTRFS_SETGET_FUNCS(dev_extent_length, struct btrfs_dev_extent, length, 64); >>> +BTRFS_SETGET_STACK_FUNCS(stack_dev_extent_chunk_tree, struct >>> btrfs_dev_extent, >>> +chunk_tree, 64); >>> +BTRFS_SETGET_STACK_FUNCS(stack_dev_extent_chunk_objectid, >>> +struct btrfs_dev_extent, chunk_objectid, 64); >>> +BTRFS_SETGET_STACK_FUNCS(stack_dev_extent_chunk_offset, struct >>> btrfs_dev_extent, >>> +chunk_offset, 64); >>> +BTRFS_SETGET_STACK_FUNCS(stack_dev_extent_length, struct btrfs_dev_extent, >>> +length, 64); >>> >>> static inline unsigned long btrfs_dev_extent_chunk_tree_uuid(struct >>> btrfs_dev_extent *dev) >>> { >>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c >>> index 5a1913956f20..94e98261dbd0 100644 >>> --- a/fs/btrfs/volumes.c >>> +++ b/fs/btrfs/volumes.c >>> @@ -1581,42 +1581,32 @@ static int btrfs_alloc_dev_extent(struct >>> btrfs_trans_handle *trans, >>> u64 chunk_offset, u64 start, u64 >>> num_bytes) >>> { >>> int ret; >>> - struct btrfs_path 
*path; >>> - struct btrfs_fs_info *fs_info = device->fs_info; >>> - struct btrfs_root *root = fs_info->dev_root; >>> + struct btrfs_root *root = device->fs_info->dev_root; >>> struct btrfs_dev_extent *extent; >>> - struct extent_buffer *leaf; >>> struct btrfs_key key; >>> >>> WARN_ON(!device->in_fs_metadata); >>> WARN_ON(device->is_tgtdev_for_dev_replace); >>> - path = btrfs_alloc_path(); >>> - if (!path) >>> + >>> + extent = kzalloc(sizeof(*extent), GFP_NOFS); >>> + if (!extent) >>> return -ENOMEM; >>> >>> key.objectid = device->devid; >>> key.offset = start; >>> key.type = BTRFS_DEV_EXTENT_KEY; >>> -
[PATCH v3] btrfs: Do not use data_alloc_cluster in ssd mode
This patch provides a band aid to improve the 'out of the box' behaviour of btrfs for disks that are detected as being an ssd. In a general purpose mixed workload scenario, the current ssd mode causes overallocation of available raw disk space for data, while leaving behind increasing amounts of unused fragmented free space. This situation leads to early ENOSPC problems which are harming user experience and adoption of btrfs as a general purpose filesystem. This patch modifies the data extent allocation behaviour of the ssd mode to make it behave identically to nossd mode. The metadata behaviour and additional ssd_spread option stay untouched so far. Recommendations for future development are to reconsider the current oversimplified nossd / ssd distinction and the broken detection mechanism based on the rotational attribute in sysfs and provide experienced users with a more flexible way to choose allocator behaviour for data and metadata, optimized for certain use cases, while keeping sane 'out of the box' default settings. The internals of the current btrfs code have more potential than what currently gets exposed to the user to choose from. The SSD story... In the first year of btrfs development, around early 2008, btrfs gained a mount option which enables specific functionality for filesystems on solid state devices. The first occurrence of this functionality is in commit e18e4809, labeled "Add mount -o ssd, which includes optimizations for seek free storage". The effect on allocating free space for doing (data) writes is to 'cluster' writes together, writing them out in contiguous space, as opposed to a 'tetris' way of putting all separate writes into any free space fragment that fits (which is what the -o nossd behaviour does). A somewhat simplified explanation of what happens is that when, for example, the 'cluster' size is set to 2MiB and we do some writes, the data allocator will search for a free space block that is 2MiB big, and put the writes in there. 
The ssd mode itself might allow a 2MiB cluster to be composed of multiple free space extents with some existing data in between, while the additional ssd_spread mount option kills off this option and requires fully free space. The idea behind this is (commit 536ac8ae): "The [...] clusters make it more likely a given IO will completely overwrite the ssd block, so it doesn't have to do an internal rwm cycle."; ssd block meaning nand erase block. So, effectively this means applying a "locality based algorithm" and trying to outsmart the actual ssd. Since then, various changes have been made to the involved code, but the basic idea is still present, and gets activated whenever the ssd mount option is active. This also happens by default, when the rotational flag as seen at /sys/block//queue/rotational is set to 0. However, there's a number of problems with this approach. First, what the optimization is trying to do is outsmart the ssd by assuming there is a relation between the physical address space of the block device as seen by btrfs and the actual physical storage of the ssd, and then adjusting data placement. However, since the introduction of the Flash Translation Layer (FTL) which is a part of the internal controller of an ssd, these attempts are futile. The use of good quality FTL in consumer ssd products might have been limited in 2008, but this situation has changed drastically soon after that time. Today, even the flash memory in your automatic cat feeding machine or your grandma's wheelchair has a full featured one. Second, the behaviour as described above results in the filesystem being filled up with badly fragmented free space extents because of relatively small pieces of space that are freed up by deletes, but not selected again as part of a 'cluster'. 
Since the algorithm prefers allocating a new chunk over going back to tetris mode, the end result is a filesystem in which all raw space is allocated, but which is composed of underutilized chunks with a 'shotgun blast' pattern of fragmented free space. Usually, the next problematic thing that happens is the filesystem wanting to allocate new space for metadata, which causes the filesystem to fail in spectacular ways. Third, the default mount options you get for an ssd ('ssd' mode enabled, 'discard' not enabled), in combination with spreading out writes over the full address space and ignoring freed up space leads to worst case behaviour in providing information to the ssd itself, since it will never learn that all the free space left behind is actually free. There are two ways to let an ssd know previously written data does not have to be preserved, which are sending explicit signals using discard or fstrim, or by simply overwriting the space with new data. The worst case behaviour is the btrfs ssd_spread mount option in combination with not having discard enabled. It has a side effect of minimizing the reuse of free space previously written in.
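The difference between the 'tetris' and 'cluster' placement strategies described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the btrfs allocator: `struct free_extent`, `alloc_first_fit`, `alloc_clustered` and `NO_SPACE` are hypothetical names, and the clustered variant models the strict case where a cluster must be one fully free extent (the plain ssd mode may compose a cluster from several extents).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one free space extent inside already allocated chunks. */
struct free_extent {
    uint64_t start;
    uint64_t len;
};

#define NO_SPACE UINT64_MAX

/* 'tetris' placement: take the first free extent the write fits into. */
static uint64_t alloc_first_fit(const struct free_extent *fe, size_t n,
                                uint64_t bytes)
{
    assert(bytes > 0);
    for (size_t i = 0; i < n; i++)
        if (fe[i].len >= bytes)
            return fe[i].start;
    return NO_SPACE;
}

/*
 * 'cluster' placement: only consider free extents large enough to hold
 * a whole cluster, however small the write itself is. NO_SPACE here
 * corresponds to the point where a new chunk would be allocated.
 */
static uint64_t alloc_clustered(const struct free_extent *fe, size_t n,
                                uint64_t bytes, uint64_t cluster_bytes)
{
    assert(bytes > 0 && bytes <= cluster_bytes);
    for (size_t i = 0; i < n; i++)
        if (fe[i].len >= cluster_bytes)
            return fe[i].start;
    return NO_SPACE;
}
```

Given free extents of 64KiB and 512KiB only, a 16KiB write fits into the first one with first-fit placement, while the clustered variant with a 2MiB cluster reports no space. In this model the small fragments are never reused, which is the underutilized-chunk pattern the text describes.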