[Gluster-devel] Patches being posted by Facebook and plans thereof
Hi,

You may have noticed that Facebook has started posting their patches to the newly created release-3.8-fb branch. The patches posted can be seen here [1]. The total possible set of patches that Facebook may post is ~300; this number may decrease based on fixes for issues that Facebook engineers find already addressed in the 3.8 main branch, or things that they have posted earlier and are already part of the 3.8 code base.

This effort is a voluntary and welcome contribution by Facebook to the rest of the Gluster community. This is really exciting and something we've wanted to encourage for a long time, so we're going to need everyone's help to get this underway from here.

The plan to help make the above a reality (as exchanged with Facebook engineers) is as follows:

1) Facebook will port all their patches to the special branch release-3.8-fb, where they have exclusive merge rights.

2) Facebook will also be porting the fixes and features into the Gluster master branch. This is to ensure that their work is incorporated into master after the required due diligence around reviews, testing, etc.

3) We request other members of the Gluster development community to keep a watch on the release-3.8-fb branch, pick out patches of interest, and help port these patches to master (a minimal porting workflow is sketched after this mail). This will ensure quicker movement of patches to master and hence deliver the overall value of these patches to the broader community. Maintainers, this is your explicit invitation to come participate.

4) At some future point, master will have caught up with the release-3.8-fb branch, and the next LTM/STM release will carry the same changes for general availability.
   - The current desired target is the 3.11 STM for this activity to complete.

5) There are some useful/big/interesting features in these patches that may need more attention. Please participate in understanding these in more detail as soon as you can! Helping move these to master, or extending them, will greatly help this process of merging features.
   - Some such features are: Halo, GFProxy, multi-threaded rebalance improvements, IPv6 changes, io-stats changes, throttling.

We will attempt to build some form of a tracker that lists patches present in release-3.8-fb and missing in master (or vice versa), to enable quicker participation by the community when choosing which patch to take a stab at.

Finally, thanks to Facebook for taking this initiative to strengthen our community, and for hosting some of us in their Cambridge, MA office to kick this off. Gluster devs, our turn now to make this happen!

Regards,
Amye, Jeff, Shyam, Vijay

[1] Facebook patches against release-3.8-fb branch:
http://review.gluster.org/#/q/project:glusterfs+branch:release-3.8-fb
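A rough illustration of the porting flow in point 3, assuming a glusterfs checkout with review.gluster.org as the Gerrit remote (remote, branch and commit names below are placeholders, not a prescribed process):

    # Fetch both branches and list patches present in release-3.8-fb but not in master
    git fetch origin master release-3.8-fb
    git log --oneline --cherry origin/master...origin/release-3.8-fb

    # Port a patch of interest onto master
    git checkout -b port-from-fb origin/master
    git cherry-pick <commit-sha-from-release-3.8-fb>    # resolve conflicts, adjust for master

    # Post the ported patch to Gerrit for review against master
    git push origin HEAD:refs/for/master

Until the tracker mentioned above exists, a git log --cherry comparison like the one above is a workable way to pick the next patch to port.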
Re: [Gluster-devel] Release 3.10 feature proposal : Gluster Block Storage CLI Integration
On 12/21/2016 09:15 PM, Prasanna Kalever wrote:

[Top posting]

I agree with Niels and Shyam here.

We are now trying to decouple the gluster-block CLI from the gluster CLI. Since it doesn't depend on core gluster changes anyway, I think it's better to move it out. Also, I do not see a decent tool/util that does these jobs, hence it's better we make it a separate project (maybe gluster-block). The design-side changes are still in discussion; I shall give an update once we conclude on it.

Since gluster-block plans to maintain it as a separate project, I don't think we still need to make it a 3.10 feature. With gluster-block we will aim to support all possible versions of gluster.

Should this be bundled with gluster? If so, it may be a good thing to track that part against gluster releases (3.10 or otherwise). Just a thought.

Thanks,
--
Prasanna

On Mon, Dec 19, 2016 at 5:10 PM, Shyam wrote:

On 12/14/2016 01:38 PM, Niels de Vos wrote:

On Wed, Dec 14, 2016 at 12:40:53PM +0530, Prasanna Kumar Kalever wrote:

On 16-12-14 07:43:05, Niels de Vos wrote:

On Fri, Dec 09, 2016 at 11:28:52AM +0530, Prasanna Kalever wrote:

Hi all,

As we know, gluster block storage creation and maintenance is not simple today, as it involves all the manual steps mentioned at [1]. To make these basic operations simple we would like to integrate the block story with the gluster CLI.

As part of it, we would like to introduce the following commands:

# gluster block create
# gluster block modify
# gluster block list
# gluster block delete

I am not sure why this needs to be done through the Gluster CLI. Creating a file on a (how to select?) volume, and then exporting that as a block device through tcmu-runner (iSCSI), seems more like a task similar to what libvirt does with VM images.

Maybe not exactly, but similar.

Would it not be more suitable to make this part of whatever tcmu admin tools are available? I assume tcmu needs to address this, with similar configuration options for LVM and other backends too. Building on top of that may give users of tcmu a better experience.

s/tcmu/tcmu-runner/

I don't think there are separate tools/utils for tcmu-runner as of now. Also, currently we are using tcmu-runner to export the file in the gluster volume as an iSCSI block device; in the future we may move to qemu-tcmu (which does the same job as tcmu-runner, except it uses the qemu gluster driver) for benefits like snapshots?

One of the main objections I have is that the CLI is currently very 'dumb'. Integrating with it to have it generate the tcmu configuration, as well as letting the (currently management only!) CLI create the disk images on a volume, seems to break the current separation of tasks. Integrations are good to have, but they should be done at the appropriate level.

Teaching the CLI all it needs to know about tcmu-runner, including setting suitable permissions on the disk image on a volume, access permissions for the iSCSI protocol and possibly more, seems quite a lot of effort to me. I prefer to keep the CLI as simple as possible, and any integration should use the low-level tools (CLI, gfapi, ...) that are available.

+1, I agree. This seems more like a task for a tool using gfapi in parts for the file creation, and other CLI/deploy options for managing tcmu-runner. The latter is more a tcmu project, or gluster-block as the abstraction if we want to gain eyeballs into the support.

When we integrate tcmu-runner now, people will hopefully use it. That means it cannot easily be replaced by another project. qemu-tcmu would be an addition to the tcmu integration, leaving a huge maintenance burden.

I have a strong preference to see any integrations done at a higher level. If there are no tcmu-runner tools (like targetcli?) to configure iSCSI backends and other options, it may make sense to start a new project dedicated to iSCSI access for Gluster. If no suitable projects exist, a gluster-block-utils project can be created. Management utilities also benefit from being written in languages other than C; a new project offers you many options there ;-)

Also, configuring and running tcmu-runner on each node in the cluster for multipathing is not easy (take the case where we have more than a dozen nodes). If we can do these via the gluster CLI with one simple command from any node, we can configure and run tcmu-runner on all the nodes.

Right, sharing configurations between different servers is tricky. But you can also not assume that everyone can or wants to run the iSCSI target on the Gluster storage servers themselves. For all other integrations that are similar, users like to have the flexibility to run the additional services (QEMU, Samba, NFS-Ganesha, ...) on separate systems.

If you can add such a consideration in the feature page, I'd appreciate it. Maybe other approaches have been discussed earlier as well? In that case, those approaches should probably be added too.

Sure! We may be missing something, so be
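To make the "tool using gfapi ... for the file creation" idea concrete, here is a minimal sketch in C of creating and sizing a backing image on a volume through libgfapi, outside the management CLI. Volume, host and file names are placeholders, and the calls (in particular the two-argument glfs_ftruncate()) should be checked against the libgfapi headers of the release you build against:

    /* create-block.c: create a fixed-size backing file on a Gluster volume.
     * Build roughly as: gcc create-block.c -o create-block -lgfapi */
    #include <glusterfs/api/glfs.h>
    #include <fcntl.h>
    #include <stdio.h>

    int main(void)
    {
        glfs_t *fs = glfs_new("blockvol");                /* placeholder volume name */
        if (!fs)
            return 1;
        glfs_set_volfile_server(fs, "tcp", "gluster-node-1", 24007);
        glfs_set_logging(fs, "/tmp/create-block.log", 7);
        if (glfs_init(fs) != 0) {
            perror("glfs_init");
            return 1;
        }

        /* Create the image and size it to 10 GiB; tcmu-runner (or qemu-tcmu)
         * would later export this file as an iSCSI LUN. */
        glfs_fd_t *fd = glfs_creat(fs, "lun0.img", O_RDWR, 0600);
        if (!fd) {
            perror("glfs_creat");
            glfs_fini(fs);
            return 1;
        }
        glfs_ftruncate(fd, 10LL * 1024 * 1024 * 1024);
        glfs_close(fd);
        glfs_fini(fs);
        return 0;
    }

Everything around iSCSI export, permissions and multipath would still live with tcmu-runner/targetcli (or a gluster-block style wrapper), which is exactly the separation of tasks being argued for above.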
Re: [Gluster-devel] Release 3.10 feature proposal : Gluster Block Storage CLI Integration
[Top posting]

I agree with Niels and Shyam here.

We are now trying to decouple the gluster-block CLI from the gluster CLI. Since it doesn't depend on core gluster changes anyway, I think it's better to move it out. Also, I do not see a decent tool/util that does these jobs, hence it's better we make it a separate project (maybe gluster-block). The design-side changes are still in discussion; I shall give an update once we conclude on it.

Since gluster-block plans to maintain it as a separate project, I don't think we still need to make it a 3.10 feature. With gluster-block we will aim to support all possible versions of gluster.

Thanks,
--
Prasanna

On Mon, Dec 19, 2016 at 5:10 PM, Shyam wrote:
> On 12/14/2016 01:38 PM, Niels de Vos wrote:
>>
>> On Wed, Dec 14, 2016 at 12:40:53PM +0530, Prasanna Kumar Kalever wrote:
>>>
>>> On 16-12-14 07:43:05, Niels de Vos wrote:
>>>>
>>>> On Fri, Dec 09, 2016 at 11:28:52AM +0530, Prasanna Kalever wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> As we know, gluster block storage creation and maintenance is not
>>>>> simple today, as it involves all the manual steps mentioned at [1].
>>>>> To make these basic operations simple we would like to integrate the
>>>>> block story with the gluster CLI.
>>>>>
>>>>> As part of it, we would like to introduce the following commands:
>>>>>
>>>>> # gluster block create
>>>>> # gluster block modify
>>>>> # gluster block list
>>>>> # gluster block delete
>>>>
>>>> I am not sure why this needs to be done through the Gluster CLI.
>>>> Creating a file on a (how to select?) volume, and then exporting that
>>>> as a block device through tcmu-runner (iSCSI), seems more like a task
>>>> similar to what libvirt does with VM images.
>>>
>>> Maybe not exactly, but similar.
>>>
>>>> Would it not be more suitable to make this part of whatever tcmu admin
>>>> tools are available? I assume tcmu needs to address this, with similar
>>>> configuration options for LVM and other backends too. Building on top
>>>> of that may give users of tcmu a better experience.
>>>
>>> s/tcmu/tcmu-runner/
>>>
>>> I don't think there are separate tools/utils for tcmu-runner as of now.
>>> Also, currently we are using tcmu-runner to export the file in the
>>> gluster volume as an iSCSI block device; in the future we may move to
>>> qemu-tcmu (which does the same job as tcmu-runner, except it uses the
>>> qemu gluster driver) for benefits like snapshots?
>>
>> One of the main objections I have is that the CLI is currently very
>> 'dumb'. Integrating with it to have it generate the tcmu configuration,
>> as well as letting the (currently management only!) CLI create the
>> disk images on a volume, seems to break the current separation of
>> tasks. Integrations are good to have, but they should be done at the
>> appropriate level.
>>
>> Teaching the CLI all it needs to know about tcmu-runner, including
>> setting suitable permissions on the disk image on a volume, access
>> permissions for the iSCSI protocol and possibly more, seems quite a lot
>> of effort to me. I prefer to keep the CLI as simple as possible, and
>> any integration should use the low-level tools (CLI, gfapi, ...) that
>> are available.
>
> +1, I agree. This seems more like a task for a tool using gfapi in parts
> for the file creation, and other CLI/deploy options for managing
> tcmu-runner. The latter is more a tcmu project, or gluster-block as the
> abstraction if we want to gain eyeballs into the support.
>
>> When we integrate tcmu-runner now, people will hopefully use it. That
>> means it cannot easily be replaced by another project. qemu-tcmu would
>> be an addition to the tcmu integration, leaving a huge maintenance
>> burden.
>>
>> I have a strong preference to see any integrations done at a higher
>> level. If there are no tcmu-runner tools (like targetcli?) to configure
>> iSCSI backends and other options, it may make sense to start a new
>> project dedicated to iSCSI access for Gluster. If no suitable projects
>> exist, a gluster-block-utils project can be created. Management
>> utilities also benefit from being written in languages other than C; a
>> new project offers you many options there ;-)
>>
>>> Also, configuring and running tcmu-runner on each node in the cluster
>>> for multipathing is not easy (take the case where we have more than a
>>> dozen nodes). If we can do these via the gluster CLI with one simple
>>> command from any node, we can configure and run tcmu-runner on all the
>>> nodes.
>>
>> Right, sharing configurations between different servers is tricky. But
>> you can also not assume that everyone can or wants to run the iSCSI
>> target on the Gluster storage servers themselves. For all other
>> integrations that are similar, users like to have the flexibility to
>> run the additional services (QEMU, Samba, NFS-Ganesha, ...) on separate
>> systems.
>>
>> If you can add such a consideration in the feature page, I'd appreciate
>> it. Maybe other approaches have been discusse
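For context on the "manual steps" and the targetcli question above, the per-node export side looks roughly like the following when using tcmu-runner's glfs handler (names are placeholders, and the user:glfs cfgstring syntax should be verified against the tcmu-runner and targetcli documentation for your versions):

    # Register a file on the Gluster volume as a user-space backstore
    # (cfgstring format here is assumed to be volume@server/path-in-volume)
    targetcli /backstores/user:glfs create lun0 10G blockvol@gluster-node-1/lun0.img

    # Create an iSCSI target and export the backstore as a LUN
    targetcli /iscsi create iqn.2016-12.org.gluster:lun0
    targetcli /iscsi/iqn.2016-12.org.gluster:lun0/tpg1/luns create /backstores/user:glfs/lun0

    # Allow an initiator to log in (authentication settings omitted)
    targetcli /iscsi/iqn.2016-12.org.gluster:lun0/tpg1/acls create iqn.2016-12.com.example:client1
    targetcli saveconfig

Repeating this on every node that should act as an iSCSI portal (plus multipath configuration on the initiators) is exactly the per-node effort that the proposal, and later gluster-block, aims to automate.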
Re: [Gluster-devel] Assertion failed: lru_inode_ctx->block_num > 0
Just one more piece of information I need from you. Assuming you have the coredump, could you attach it to gdb, print local->fop and tell me what fop it was when the crash happened? You'll need to switch to frame 3 in gdb to get the value of this variable.

-Krutika

On Wed, Dec 21, 2016 at 5:35 PM, Krutika Dhananjay wrote:
> Thanks for this. The information seems sufficient at the moment.
> Will get back to you on this if/when I find something.
>
> -Krutika
>
> On Mon, Dec 19, 2016 at 1:44 PM, qingwei wei wrote:
>
>> Hi Krutika,
>>
>> Sorry for the delay as I am busy with other work. Attached is the
>> tar.gz file with the client and server logs, the gfid information on
>> the shard folder (please look at the test.0.0 file, as the log is
>> captured when I run fio on this file) and also the print statement I
>> put inside the code. FYI, I did 2 runs this time and only the second
>> run gave me a problem. Hope this information helps.
>>
>> Regards,
>>
>> Cw
>>
>> On Thu, Dec 15, 2016 at 8:02 PM, Krutika Dhananjay wrote:
>> > Good that you asked. I'll try, but be warned this will involve me
>> > coming back to you with a lot more questions. :)
>> >
>> > I've been trying this for the past two days (not to mention the fio
>> > run takes really long) and so far there has been no crash/assert
>> > failure.
>> >
>> > If you already have the core:
>> > in frame 1,
>> > 0. print block_num
>> > 1. get lru_inode_ctx->stat.ia_gfid
>> > 2. convert it to hex
>> > 3. find the gfid in your backend that corresponds to this gfid and
>> >    share its path in your response
>> > 4. print priv->inode_count
>> > 5. and of course lru_inode_ctx->block_num :)
>> > 6. Also attach the complete brick and client logs.
>> >
>> > -Krutika
>> >
>> > On Thu, Dec 15, 2016 at 3:18 PM, qingwei wei wrote:
>> >>
>> >> Hi Krutika,
>> >>
>> >> Do you need any more information? Do let me know, as I can try on my
>> >> test system. Thanks.
>> >>
>> >> Cw
>> >>
>> >> On Tue, Dec 13, 2016 at 12:17 AM, qingwei wei wrote:
>> >> > Hi Krutika,
>> >> >
>> >> > You mean the FIO command?
>> >> >
>> >> > Below is how I do the sequential write. In this example I am using
>> >> > a 400GB file; for SHARD_MAX_INODES=16, I use a 300MB file.
>> >> >
>> >> > fio -group_reporting -ioengine libaio -directory /mnt/testSF-HDD1
>> >> > -fallocate none -direct 1 -filesize 400g -nrfiles 1 -openfiles 1
>> >> > -bs 256k -numjobs 1 -iodepth 2 -name test -rw write
>> >> >
>> >> > And after FIO completes the above workload, I do the random write:
>> >> >
>> >> > fio -group_reporting -ioengine libaio -directory /mnt/testSF-HDD1
>> >> > -fallocate none -direct 1 -filesize 400g -nrfiles 1 -openfiles 1
>> >> > -bs 8k -numjobs 1 -iodepth 2 -name test -rw randwrite
>> >> >
>> >> > The error (sometimes a segmentation fault) only happens during the
>> >> > random write.
>> >> >
>> >> > The gluster volume is a 3-replica volume with shard enabled and a
>> >> > 16MB shard block size.
>> >> >
>> >> > Thanks.
>> >> >
>> >> > Cw
>> >> >
>> >> > On Tue, Dec 13, 2016 at 12:00 AM, Krutika Dhananjay wrote:
>> >> >> I tried but couldn't recreate this issue (even with
>> >> >> SHARD_MAX_INODES being 16).
>> >> >> Could you share the exact command you used?
>> >> >>
>> >> >> -Krutika
>> >> >>
>> >> >> On Mon, Dec 12, 2016 at 12:15 PM, qingwei wei wrote:
>> >> >>>
>> >> >>> Hi Krutika,
>> >> >>>
>> >> >>> Thanks. Looking forward to your reply.
>> >> >>>
>> >> >>> Cw
>> >> >>>
>> >> >>> On Mon, Dec 12, 2016 at 2:27 PM, Krutika Dhananjay wrote:
>> >> >>> > Hi,
>> >> >>> >
>> >> >>> > First of all, apologies for the late reply. Couldn't find time
>> >> >>> > to look into this until now.
>> >> >>> >
>> >> >>> > Changing the SHARD_MAX_INODES value from 16384 to 16 is a cool
>> >> >>> > trick! Let me try that as well and get back to you in some time.
>> >> >>> >
>> >> >>> > -Krutika
>> >> >>> >
>> >> >>> > On Thu, Dec 8, 2016 at 11:07 AM, qingwei wei
>> >> >>> > <tcheng...@gmail.com> wrote:
>> >> >>> >>
>> >> >>> >> Hi,
>> >> >>> >>
>> >> >>> >> With the help from my colleague, we did some changes to the
>> >> >>> >> code to reduce the number of SHARD_MAX_INODES (from 16384 to
>> >> >>> >> 16) and also include the printing of blk_num inside
>> >> >>> >> __shard_update_shards_inode_list. We then execute fio to first
>> >> >>> >> do a sequential write of a 300MB file. After this run
>> >> >>> >> completed, we then use fio to generate random writes (8k). And
>> >> >>> >> during this random write run, we found that there is a
>> >> >>> >> situation where the blk_num is a negative number and this
>> >> >>> >> triggers the following assertion:
>> >> >>> >>
>> >> >>> >> GF_ASSERT (lru_inode_ctx->block_num > 0);
>> >> >>> >>
>> >> >>> >> [2016-12-08 03:16:34.217582] E
>> >> >>> >> [shard.c:468:__shard_update_shards_inode_list]
>> >> >>> >> (-->/usr/local/lib/glusterfs/3.7.
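For anyone reproducing this, the gdb steps being requested in this thread map to roughly the following session (core path, frame numbers and variable visibility depend on your build and backtrace, so treat this as a sketch):

    gdb /usr/local/sbin/glusterfs /path/to/core

    (gdb) bt                                   # locate the shard xlator frames
    (gdb) frame 3                              # per the mail above; 'local' is visible here
    (gdb) print local->fop                     # which fop was in flight when the assert fired
    (gdb) frame 1                              # __shard_update_shards_inode_list
    (gdb) print lru_inode_ctx->block_num       # the negative block number that tripped GF_ASSERT
    (gdb) print priv->inode_count
    (gdb) print /x lru_inode_ctx->stat.ia_gfid # GFID in hex, to map back to a file on the brick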
Re: [Gluster-devel] Assertion failed: lru_inode_ctx->block_num > 0
Thanks for this. The information seems sufficient at the moment.
Will get back to you on this if/when I find something.

-Krutika

On Mon, Dec 19, 2016 at 1:44 PM, qingwei wei wrote:
> Hi Krutika,
>
> Sorry for the delay as I am busy with other work. Attached is the
> tar.gz file with the client and server logs, the gfid information on
> the shard folder (please look at the test.0.0 file, as the log is
> captured when I run fio on this file) and also the print statement I
> put inside the code. FYI, I did 2 runs this time and only the second
> run gave me a problem. Hope this information helps.
>
> Regards,
>
> Cw
>
> On Thu, Dec 15, 2016 at 8:02 PM, Krutika Dhananjay wrote:
> > Good that you asked. I'll try, but be warned this will involve me
> > coming back to you with a lot more questions. :)
> >
> > I've been trying this for the past two days (not to mention the fio
> > run takes really long) and so far there has been no crash/assert
> > failure.
> >
> > If you already have the core:
> > in frame 1,
> > 0. print block_num
> > 1. get lru_inode_ctx->stat.ia_gfid
> > 2. convert it to hex
> > 3. find the gfid in your backend that corresponds to this gfid and
> >    share its path in your response
> > 4. print priv->inode_count
> > 5. and of course lru_inode_ctx->block_num :)
> > 6. Also attach the complete brick and client logs.
> >
> > -Krutika
> >
> > On Thu, Dec 15, 2016 at 3:18 PM, qingwei wei wrote:
> >>
> >> Hi Krutika,
> >>
> >> Do you need any more information? Do let me know, as I can try on my
> >> test system. Thanks.
> >>
> >> Cw
> >>
> >> On Tue, Dec 13, 2016 at 12:17 AM, qingwei wei wrote:
> >> > Hi Krutika,
> >> >
> >> > You mean the FIO command?
> >> >
> >> > Below is how I do the sequential write. In this example I am using
> >> > a 400GB file; for SHARD_MAX_INODES=16, I use a 300MB file.
> >> >
> >> > fio -group_reporting -ioengine libaio -directory /mnt/testSF-HDD1
> >> > -fallocate none -direct 1 -filesize 400g -nrfiles 1 -openfiles 1
> >> > -bs 256k -numjobs 1 -iodepth 2 -name test -rw write
> >> >
> >> > And after FIO completes the above workload, I do the random write:
> >> >
> >> > fio -group_reporting -ioengine libaio -directory /mnt/testSF-HDD1
> >> > -fallocate none -direct 1 -filesize 400g -nrfiles 1 -openfiles 1
> >> > -bs 8k -numjobs 1 -iodepth 2 -name test -rw randwrite
> >> >
> >> > The error (sometimes a segmentation fault) only happens during the
> >> > random write.
> >> >
> >> > The gluster volume is a 3-replica volume with shard enabled and a
> >> > 16MB shard block size.
> >> >
> >> > Thanks.
> >> >
> >> > Cw
> >> >
> >> > On Tue, Dec 13, 2016 at 12:00 AM, Krutika Dhananjay wrote:
> >> >> I tried but couldn't recreate this issue (even with
> >> >> SHARD_MAX_INODES being 16).
> >> >> Could you share the exact command you used?
> >> >>
> >> >> -Krutika
> >> >>
> >> >> On Mon, Dec 12, 2016 at 12:15 PM, qingwei wei wrote:
> >> >>>
> >> >>> Hi Krutika,
> >> >>>
> >> >>> Thanks. Looking forward to your reply.
> >> >>>
> >> >>> Cw
> >> >>>
> >> >>> On Mon, Dec 12, 2016 at 2:27 PM, Krutika Dhananjay wrote:
> >> >>> > Hi,
> >> >>> >
> >> >>> > First of all, apologies for the late reply. Couldn't find time
> >> >>> > to look into this until now.
> >> >>> >
> >> >>> > Changing the SHARD_MAX_INODES value from 16384 to 16 is a cool
> >> >>> > trick! Let me try that as well and get back to you in some time.
> >> >>> >
> >> >>> > -Krutika
> >> >>> >
> >> >>> > On Thu, Dec 8, 2016 at 11:07 AM, qingwei wei
> >> >>> > <tcheng...@gmail.com> wrote:
> >> >>> >>
> >> >>> >> Hi,
> >> >>> >>
> >> >>> >> With the help from my colleague, we did some changes to the
> >> >>> >> code to reduce the number of SHARD_MAX_INODES (from 16384 to
> >> >>> >> 16) and also include the printing of blk_num inside
> >> >>> >> __shard_update_shards_inode_list. We then execute fio to first
> >> >>> >> do a sequential write of a 300MB file. After this run
> >> >>> >> completed, we then use fio to generate random writes (8k). And
> >> >>> >> during this random write run, we found that there is a
> >> >>> >> situation where the blk_num is a negative number and this
> >> >>> >> triggers the following assertion:
> >> >>> >>
> >> >>> >> GF_ASSERT (lru_inode_ctx->block_num > 0);
> >> >>> >>
> >> >>> >> [2016-12-08 03:16:34.217582] E
> >> >>> >> [shard.c:468:__shard_update_shards_inode_list]
> >> >>> >> (-->/usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so(shard_common_lookup_shards_cbk+0x2d)
> >> >>> >> [0x7f7300930b6d]
> >> >>> >> -->/usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so(shard_link_block_inode+0xce)
> >> >>> >> [0x7f7300930b1e]
> >> >>> >> -->/usr/local/lib/glusterfs/3.7.17/xlator/features/shard.so(__shard_update_shards_inode_list+0x36b)
> >> >>> >> [0x7f730092bf5b] ) 0-: Assertion failed: lru_inode_ctx->block_num
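On step 3 above ("find the gfid in your backend"): every file on a brick has a hard link under the brick's .glusterfs directory keyed by its GFID, so once the GFID is known in hex it can be mapped back to the shard file with something like the following (brick path and GFID below are placeholders):

    GFID=ab12cd34-5678-90ef-ab12-cd34567890ef      # from gdb, formatted as a UUID
    BRICK=/bricks/brick1

    # The GFID hard link lives at <brick>/.glusterfs/<first 2 chars>/<next 2 chars>/<gfid>
    ls -l $BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID

    # Find the named path (e.g. the shard under .shard) that shares the same inode
    find $BRICK -samefile $BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID -not -path "*/.glusterfs/*"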
Re: [Gluster-devel] Invitation: Re: Question on merging zfs snapshot supp... @ Tue Dec 20, 2016 2:30pm - 3:30pm (IST) (sri...@marirs.net.in)
On Wed, Dec 21, 2016 at 10:00:17AM +0530, sri...@marirs.net.in wrote:
> In continuation to the discussion we had yesterday, I'd be working on the
> change we initiated some time back for a pluggable, FS-specific snapshot
> implementation.

Let me know how I can contribute the FFS implementation for NetBSD. In case it helps for designing the API, here is the relevant man page:
http://netbsd.gw.com/cgi-bin/man-cgi?fss+.NONE+NetBSD-7.0.2

Basically, you iterate over /dev/fss[0-9], open each one and call ioctl FSSIOCGET to check if it is already in use. Once you have an unused one, ioctl FSSIOCSET casts the snapshot. It requires a backing store file, which may be created by mktemp() and unlinked immediately.

--
Emmanuel Dreyfus
m...@netbsd.org
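A rough sketch of the probe-and-snapshot flow described above, for whoever picks up the NetBSD backend. The ioctl names come from the mail, but the struct fields (fss_mount, fss_bstore, fss_csize) are my reading of fss(4)/<dev/fssvar.h> and should be double-checked against the man page linked above:

    /* fss-snap.c: find a free fss(4) unit and snapshot a mounted file system. */
    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/mount.h>
    #include <dev/fssvar.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        char dev[32];
        int fd = -1;

        /* Iterate over /dev/fss0../dev/fss9; FSSIOCGET failing means the unit is unused. */
        for (int i = 0; i < 10; i++) {
            snprintf(dev, sizeof(dev), "/dev/fss%d", i);
            fd = open(dev, O_RDWR);
            if (fd < 0)
                continue;
            struct fss_get fsg;
            if (ioctl(fd, FSSIOCGET, &fsg) < 0)
                break;                          /* not configured: use this one */
            close(fd);
            fd = -1;
        }
        if (fd < 0)
            return 1;

        /* Backing store: a temporary file, unlinked once the driver has opened it. */
        char bstore[] = "/var/tmp/fss-backing.XXXXXX";
        int bfd = mkstemp(bstore);
        if (bfd < 0)
            return 1;

        struct fss_set fss;
        memset(&fss, 0, sizeof(fss));
        fss.fss_mount  = "/export/brick1";      /* mount point to snapshot (placeholder) */
        fss.fss_bstore = bstore;
        fss.fss_csize  = 0;                     /* let the driver pick a cluster size */
        if (ioctl(fd, FSSIOCSET, &fss) < 0) {
            perror("FSSIOCSET");
            return 1;
        }
        unlink(bstore);

        /* ... read the snapshot via the corresponding raw fss device, then tear down ... */
        close(bfd);
        return 0;
    }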