Re: GFS, what's remaining
On Thu, Sep 01, 2005 at 01:35:23PM +0200, Arjan van de Ven wrote:
> +static inline void glock_put(struct gfs2_glock *gl)
> +{
> +	if (atomic_read(&gl->gl_count) == 1)
> +		gfs2_glock_schedule_for_reclaim(gl);
> +	gfs2_assert(gl->gl_sbd, atomic_read(&gl->gl_count) > 0,);
> +	atomic_dec(&gl->gl_count);
> +}
>
> this code has a race

The first two lines of the function with the race are non-essential and could be removed. In the common case, where there is no race, they just add efficiency by moving the glock to the reclaim list immediately. Otherwise, the scand thread would do it later when actively trying to reclaim glocks.

> +static inline int queue_empty(struct gfs2_glock *gl, struct list_head *head)
> +{
> +	int empty;
> +	spin_lock(&gl->gl_spin);
> +	empty = list_empty(head);
> +	spin_unlock(&gl->gl_spin);
> +	return empty;
> +}
>
> that looks like a racey interface to me... if so.. why bother locking at
> all?

The spinlock protects the list but is not the primary method of synchronizing processes that are working with a glock.

When the list is in fact empty, there is no race and the locking isn't strictly necessary. In that case, the "glmutex" in the code fragment below prevents any change in the list, so we can safely release the spinlock immediately.

When the list is not empty, a process could be adding another entry to the list without "glmutex" locked [1], making the spinlock necessary. In that case we quit after queue_empty() returns and don't do anything else, so releasing the spinlock immediately is still safe.

[1] A process that already holds a glock (i.e. has a "holder" struct on the gl_holders list) is allowed to hold it again by adding another holder struct to the same list. It adds the second hold without locking glmutex.
	if (gfs2_glmutex_trylock(gl)) {
		if (gl->gl_ops == &gfs2_inode_glops) {
			struct gfs2_inode *ip = get_gl2ip(gl);
			if (ip && !atomic_read(&ip->i_count))
				gfs2_inode_destroy(ip);
		}
		if (queue_empty(gl, &gl->gl_holders) &&
		    gl->gl_state != LM_ST_UNLOCKED)
			handle_callback(gl, LM_ST_UNLOCKED);
		gfs2_glmutex_unlock(gl);
	}

There is a second way that queue_empty() is used: within assertions that the list is empty. If the assertion is correct, locking isn't necessary; locking would only matter if another bug had already caused the list to be non-empty and the assertion to fail.

> static int gi_skeleton(struct gfs2_inode *ip, struct gfs2_ioctl *gi,
> +		       gi_filler_t filler)
> +{
> +	unsigned int size = gfs2_tune_get(ip->i_sbd, gt_lockdump_size);
> +	char *buf;
> +	unsigned int count = 0;
> +	int error;
> +
> +	if (size > gi->gi_size)
> +		size = gi->gi_size;
> +
> +	buf = kmalloc(size, GFP_KERNEL);
> +	if (!buf)
> +		return -ENOMEM;
> +
> +	error = filler(ip, gi, buf, size, &count);
> +	if (error)
> +		goto out;
> +
> +	if (copy_to_user(gi->gi_data, buf, count + 1))
> +		error = -EFAULT;
>
> where does count get a sensible value?

From filler().

We'll add comments in the code to document the things above.

Thanks,
Dave
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: GFS, what's remaining
On Fri, Sep 02, 2005 at 11:17:08PM +0200, Andi Kleen wrote:
> Andrew Morton <[EMAIL PROTECTED]> writes:
> > > > - Why GFS is better than OCFS2, or has functionality which OCFS2 cannot
> > > > possibly gain (or vice versa)
> > > >
> > > > - Relative merits of the two offerings
> > >
> > > You missed the important one - people actively use it and have been for
> > > some years. Same reason we have NTFS, HPFS, and all the others. On
> > > that alone it makes sense to include.
> >
> > Again, that's not a technical reason. It's _a_ reason, sure. But what are
> > the technical reasons for merging gfs[2], ocfs2, both or neither?
>
> There seems to be clearly a need for a shared-storage fs of some sort
> for HA clusters and virtualized usage (multiple guests sharing a
> partition). Shared storage can be more efficient than network file
> systems like NFS because the storage access is often more efficient
> than network access, and it is more reliable because it doesn't have a
> single point of failure in the form of the NFS server.
>
> It's also a logical extension of the "failover on failure" clusters
> many people run now - instead of only failing over the shared fs at
> failure and keeping one machine idle, the load can be balanced between
> multiple machines at any time.
>
> One argument to merge both might be that nobody really knows yet which
> shared-storage file system (GFS or OCFS2) is better. The only way to
> find out would be to let the user base try out both, and that's most
> practical when they're merged.
>
> Personally I think ocfs2 has nicer & cleaner code than GFS.
> It seems to be more or less a 64bit ext3 with cluster support, while

The "more or less" is what bothers me here - the first time I heard this it sounded a little misleading, as I expected to find some kind of a patch to ext3 that makes it 64 bit with extents and cluster support.
Now I understand it a little better (thanks to Joel and Mark).

And herein lies the issue where I tend to agree with Andrew -- it's really nice to have multiple filesystems innovating freely in their niches and eventually proving themselves in practice, without being bogged down by legacy etc. But at the same time, is there enough thought and discussion about where the fragmentation/diversification is really warranted, vs improving what is already there, or say incorporating the best of one into another, maybe over a period of time?

The number of filesystems seems to just keep growing, and supporting all of them isn't easy. For users it isn't really easy to switch from one to another, and the justifications for choosing between them are sometimes confusing and burdensome from an administrator standpoint - one filesystem is good in certain conditions, another in others, stability levels may vary, etc., and it's not always possible to predict which aspect to prioritize.

Now, with filesystems that have been around in production for a long time, the on-disk format becomes a major constraining factor, and the reason for having various legacy support around. Likewise, for some special-purpose filesystems there really is a niche usage. But for new and sufficiently general-purpose filesystems, with new on-disk structure, isn't it worth thinking this through and trying to get it right? Yes, it is a lot of work upfront ... but with double the people working on something, it just might get much better than what they individually could. Sometimes.

BTW, I don't know if it is worth it in this particular case; it is just something that worries me in general.

> GFS seems to reinvent a lot more things and has somewhat uglier code.
> On the other hand GFS' cluster support seems to be more aimed
> at being a universal cluster service open for other usages too,
> which might be a good thing. OCFS2's cluster support seems to be more
> aimed at only serving the file system.
> > But which one works better in practice is really an open question.

True, but what usually ends up happening is that this question can never quite be answered in black and white. So both just continue to exist and apps need to support both ... convergence becomes impossible and long-term duplication inevitable.

So at least having a clear demarcation/guideline upfront of what situations each is suitable for would be a good thing. That might also get some cross ocfs-gfs and ocfs-ext3 reviews in the process :)

Regards
Suparna

--
Suparna Bhattacharya ([EMAIL PROTECTED])
Linux Technology Center
IBM Software Lab, India
Re: GFS, what's remaining
On Monday 05 September 2005 19:37, Joel Becker wrote:
> OCFS2, the new filesystem, is fully general purpose. It
> supports all the usual stuff, is quite fast...

So I have heard, but isn't it time to quantify that? How do you think you would stack up here:

http://www.caspur.it/Files/2005/01/10/1105354214692.pdf

Regards,
Daniel
Re: GFS, what's remaining
On Monday 05 September 2005 22:03, Dmitry Torokhov wrote:
> On Monday 05 September 2005 19:57, Daniel Phillips wrote:
> > On Monday 05 September 2005 12:18, Dmitry Torokhov wrote:
> > > On Monday 05 September 2005 10:49, Daniel Phillips wrote:
> > > > On Monday 05 September 2005 10:14, Lars Marowsky-Bree wrote:
> > > > > On 2005-09-03T01:57:31, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> > > > > > The only current users of dlms are cluster filesystems. There
> > > > > > are zero users of the userspace dlm api.
> > > > >
> > > > > That is incorrect...
> > > >
> > > > Application users Lars, sorry if I did not make that clear. The
> > > > issue is whether we need to export an all-singing-all-dancing dlm api
> > > > from kernel to userspace today, or whether we can afford to take the
> > > > necessary time to get it right while application writers take their
> > > > time to have a good think about whether they even need it.
> > >
> > > If Linux fully supported OpenVMS DLM semantics we could start thinking
> > > about moving our application onto a Linux box because our alpha server
> > > is aging.
> > >
> > > That's just my user application writer $0.02.
> >
> > What stops you from trying it with the patch? That kind of feedback
> > would be worth way more than $0.02.
>
> We do not have such plans at the moment and I prefer spending my free
> time on tinkering with the kernel, not rewriting some in-house application.
> Besides, DLM is not the only thing that does not have a drop-in
> replacement in Linux.
>
> You just said you did not know if there are any potential users for the
> full DLM and I said there are some.

I did not say "potential", I said there are zero dlm applications at the moment. Nobody has picked up the prototype (g)dlm api, used it in an application and said "gee this works great, look what it does".

I also claim that most developers who think that using a dlm for application synchronization would be really cool are probably wrong. Use sockets for synchronization exactly as you would for a single-node, multi-tasking application and you will end up with less code, more obviously correct code, probably more efficient code and... you get an optimal, single-node version for free.

And I also claim that there is precious little reason to have a full-featured dlm in-kernel. Being in-kernel has no benefit for a userspace application. But being in-kernel does add kernel bloat, because there will be extra features lathered on that are not needed by the only in-kernel user, the cluster filesystem.

In the case of your port, you'd be better off hacking up a userspace library to provide OpenVMS dlm semantics exactly, not almost.

By the way, you said "alpha server", not "alpha servers" - was that just a slip? Because if you don't have a cluster then why are you using a dlm?

Regards,
Daniel
Re: GFS, what's remaining
On Monday 05 September 2005 19:57, Daniel Phillips wrote:
> On Monday 05 September 2005 12:18, Dmitry Torokhov wrote:
> > On Monday 05 September 2005 10:49, Daniel Phillips wrote:
> > > On Monday 05 September 2005 10:14, Lars Marowsky-Bree wrote:
> > > > On 2005-09-03T01:57:31, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> > > > > The only current users of dlms are cluster filesystems. There are
> > > > > zero users of the userspace dlm api.
> > > >
> > > > That is incorrect...
> > >
> > > Application users Lars, sorry if I did not make that clear. The issue is
> > > whether we need to export an all-singing-all-dancing dlm api from kernel
> > > to userspace today, or whether we can afford to take the necessary time
> > > to get it right while application writers take their time to have a good
> > > think about whether they even need it.
> >
> > If Linux fully supported OpenVMS DLM semantics we could start thinking
> > about moving our application onto a Linux box because our alpha server is
> > aging.
> >
> > That's just my user application writer $0.02.
>
> What stops you from trying it with the patch? That kind of feedback would be
> worth way more than $0.02.

We do not have such plans at the moment and I prefer spending my free time on tinkering with the kernel, not rewriting some in-house application. Besides, DLM is not the only thing that does not have a drop-in replacement in Linux.

You just said you did not know if there are any potential users for the full DLM and I said there are some.

--
Dmitry
Re: GFS, what's remaining
On Monday 05 September 2005 12:18, Dmitry Torokhov wrote:
> On Monday 05 September 2005 10:49, Daniel Phillips wrote:
> > On Monday 05 September 2005 10:14, Lars Marowsky-Bree wrote:
> > > On 2005-09-03T01:57:31, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> > > > The only current users of dlms are cluster filesystems. There are
> > > > zero users of the userspace dlm api.
> > >
> > > That is incorrect...
> >
> > Application users Lars, sorry if I did not make that clear. The issue is
> > whether we need to export an all-singing-all-dancing dlm api from kernel
> > to userspace today, or whether we can afford to take the necessary time
> > to get it right while application writers take their time to have a good
> > think about whether they even need it.
>
> If Linux fully supported OpenVMS DLM semantics we could start thinking
> about moving our application onto a Linux box because our alpha server is
> aging.
>
> That's just my user application writer $0.02.

What stops you from trying it with the patch? That kind of feedback would be worth way more than $0.02.

Regards,
Daniel
Re: GFS, what's remaining
On Mon, Sep 05, 2005 at 10:24:03PM +0200, Bernd Eckenfels wrote:
> The whole point of the Oracle cluster filesystem as it was described in old
> papers was about pfiles, control files and software, because you can easily
> use direct block access (with ASM) for tablespaces.

OCFS, the original filesystem, only works for datafiles, logfiles, and other database data. It's currently used in serious anger by several major customers. Oracle's websites must have a list of them somewhere. We're talking many terabytes of datafiles.

> Yes, I don't dispute the usefulness of OCFS for ORA_HOME (besides, I think a
> replicated filesystem makes more sense), I am just not sure if anybody sane
> would use it for tablespaces.

OCFS2, the new filesystem, is fully general purpose. It supports all the usual stuff, is quite fast, and is what we expect folks to use for both ORACLE_HOME and datafiles in the future.

Customers can, of course, use ASM or even raw devices. OCFS2 is as fast as raw devices, and far more manageable, so raw devices are probably not a choice for the future. ASM has its own management advantages, and we certainly expect customers to like it as well. But that doesn't mean people won't use OCFS2 for datafiles, depending on their environment or needs.

--
"The first requisite of a good citizen in this republic of ours is
 that he shall be able and willing to pull his weight."
	- Theodore Roosevelt

Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: [EMAIL PROTECTED]
Phone: (650) 506-8127
Re: [Linux-cluster] Re: GFS, what's remaining
On Sun, Sep 04, 2005 at 09:37:15AM +0100, Alan Cox wrote:
> I am curious why a lock manager uses open to implement its locking
> semantics rather than using the locking API (POSIX locks etc) however.

Because it is simple (how do you fcntl(2) from a shell fd?), has no ranges (what do you do with ranges passed in to fcntl(2) when you don't support them?), and has a well-known fork(2)/exec(2) pattern. fcntl(2) has a known but less intuitive fork(2) pattern.

The real reason, though, is that we never considered fcntl(2). We could never think of a case when a process wanted a lock fd open but not locked. At least, that's my recollection. Mark might have more to comment.

Joel

--
"In the room the women come and go
 Talking of Michaelangelo."

Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: [EMAIL PROTECTED]
Phone: (650) 506-8127
Re: [Linux-cluster] Re: GFS, what's remaining
Alan Cox <[EMAIL PROTECTED]> wrote:
> On Llu, 2005-09-05 at 12:53 -0700, Andrew Morton wrote:
> > > - How are they ref counted
> > > - What are the cleanup semantics
> > > - How do I pass a lock between processes (AF_UNIX sockets wont work now)
> > > - How do I poll on a lock coming free.
> > > - What are the semantics of lock ownership
> > > - What rules apply for inheritance
> > > - How do I access a lock across threads.
> > > - What is the permission model.
> > > - How do I attach audit to it
> > > - How do I write SELinux rules for it
> > > - How do I use mount to make namespaces appear in multiple vservers
> > >
> > > and thats for starters...
> >
> > Return an fd from create_lockspace().
>
> That only answers about four of the questions. The rest only come out if
> create_lockspace behaves like a file system - in other words
> create_lockspace is better known as either mkdir or mount.

But David said that "We export our full dlm API through read/write/poll on a misc device." That miscdevice will simply give us an fd. Hence my suggestion that the miscdevice be done away with in favour of a dedicated syscall which returns an fd. What does a filesystem have to do with this?
Re: [Linux-cluster] Re: GFS, what's remaining
On Llu, 2005-09-05 at 12:53 -0700, Andrew Morton wrote:
> > - How are they ref counted
> > - What are the cleanup semantics
> > - How do I pass a lock between processes (AF_UNIX sockets wont work now)
> > - How do I poll on a lock coming free.
> > - What are the semantics of lock ownership
> > - What rules apply for inheritance
> > - How do I access a lock across threads.
> > - What is the permission model.
> > - How do I attach audit to it
> > - How do I write SELinux rules for it
> > - How do I use mount to make namespaces appear in multiple vservers
> >
> > and thats for starters...
>
> Return an fd from create_lockspace().

That only answers about four of the questions. The rest only come out if create_lockspace behaves like a file system - in other words create_lockspace is better known as either mkdir or mount.

It's certainly viable to make the lock/unlock functions take an fd, it's just not clear why the current lock/unlock functions we have won't do the job. Being able to extend the functionality to leases later on may be very powerful indeed and will fit the existing API.
Re: GFS, what's remaining
On Mon, Sep 05, 2005 at 10:24:03PM +0200, Bernd Eckenfels wrote:
> On Mon, Sep 05, 2005 at 04:16:31PM +0200, Lars Marowsky-Bree wrote:
> > That is the whole point why OCFS exists ;-)
>
> The whole point of the Oracle cluster filesystem as it was described in old
> papers was about pfiles, control files and software, because you can easily
> use direct block access (with ASM) for tablespaces.

The original OCFS was intended for use with pfiles and control files but very definitely *not* software (the ORACLE_HOME). It was not remotely general purpose. It also predated ASM by about a year or so, and the two solutions are complementary. Either one is a good choice for Oracle datafiles, depending upon your needs.

> > No. Beyond the table spaces, there's also ORACLE_HOME; a cluster
> > benefits in several aspects from a general-purpose SAN-backed CFS.
>
> Yes, I don't dispute the usefulness of OCFS for ORA_HOME (besides, I think a
> replicated filesystem makes more sense), I am just not sure if anybody sane
> would use it for tablespaces.

Too many to mention here, but let's just say that some of the largest databases are running Oracle datafiles on top of OCFS1. Very large companies with very important data.

> I guess I have to correct the article in my german IT blog :) (if somebody
> can name productive customers).

Yeah, you should definitely update your blog ;-) If you need named references, we can give you loads of those.

-kurt

Kurt C. Hackel
Oracle
[EMAIL PROTECTED]
Re: GFS, what's remaining
On Mon, Sep 05, 2005 at 04:16:31PM +0200, Lars Marowsky-Bree wrote:
> That is the whole point why OCFS exists ;-)

The whole point of the Oracle cluster filesystem as it was described in old papers was about pfiles, control files and software, because you can easily use direct block access (with ASM) for tablespaces.

> No. Beyond the table spaces, there's also ORACLE_HOME; a cluster
> benefits in several aspects from a general-purpose SAN-backed CFS.

Yes, I don't dispute the usefulness of OCFS for ORA_HOME (besides, I think a replicated filesystem makes more sense), I am just not sure if anybody sane would use it for tablespaces.

I guess I have to correct the article in my german IT blog :) (if somebody can name productive customers).

Gruss
Bernd

--
http://itblog.eckenfels.net/archives/54-Cluster-Filesysteme.html
Re: [Linux-cluster] Re: GFS, what's remaining
Alan Cox <[EMAIL PROTECTED]> wrote:
> On Llu, 2005-09-05 at 02:19 -0700, Andrew Morton wrote:
> > > create_lockspace()
> > > release_lockspace()
> > > lock()
> > > unlock()
> >
> > Neat. I'd be inclined to make them syscalls then. I don't suppose anyone
> > is likely to object if we reserve those slots.
>
> If the locks are not file descriptors then answer the following:
>
> - How are they ref counted
> - What are the cleanup semantics
> - How do I pass a lock between processes (AF_UNIX sockets wont work now)
> - How do I poll on a lock coming free.
> - What are the semantics of lock ownership
> - What rules apply for inheritance
> - How do I access a lock across threads.
> - What is the permission model.
> - How do I attach audit to it
> - How do I write SELinux rules for it
> - How do I use mount to make namespaces appear in multiple vservers
>
> and thats for starters...

Return an fd from create_lockspace().
Re: [Linux-cluster] Re: GFS, what's remaining
On Mon, Sep 05, 2005 at 05:24:33PM +0800, David Teigland wrote:
> On Mon, Sep 05, 2005 at 01:54:08AM -0700, Andrew Morton wrote:
> > David Teigland <[EMAIL PROTECTED]> wrote:
> > >
> > > We export our full dlm API through read/write/poll on a misc device.
> >
> > inotify did that for a while, but we ended up going with a straight syscall
> > interface.
> >
> > How fat is the dlm interface? ie: how many syscalls would it take?
>
> Four functions:
> create_lockspace()
> release_lockspace()
> lock()
> unlock()

FWIW, it looks like we can agree on the core interface. ocfs2_dlm exports essentially the same functions:

	dlm_register_domain()
	dlm_unregister_domain()
	dlmlock()
	dlmunlock()

I also implemented dlm_migrate_lockres() to explicitly remaster a lock on another node, but this isn't used by any callers today (except for debugging purposes). There is also some wiring between the fs and the dlm (eviction callbacks) to deal with some ordering issues between the two layers, but these could go if we get stronger membership.

There are quite a few other functions in the "full" spec [1] that we didn't even attempt, either because we didn't require direct user<->kernel access or we just didn't need the function. As for the rather thick set of parameters expected in dlm calls, we managed to get dlmlock down to *ahem* eight, and the rest are fairly slim.

Looking at the misc device that gfs uses, it seems like there is a pretty much complete interface to the same calls you have in kernel, validated on the write() calls to the misc device. With dlmfs, we were seeking to lock down and simplify user access by using standard ast/bast/unlockast calls, using a file descriptor as an opaque token for a single lock, letting the vfs lifetime on this fd help with abnormal termination, etc.

I think both the misc device and dlmfs are helpful and not necessarily mutually exclusive, and probably both are better approaches than exporting everything via loads of syscalls (which seems to be the VMS/opendlm model).

-kurt

1. http://opendlm.sourceforge.net/cvsmirror/opendlm/docs/dlmbook_final.pdf

Kurt C. Hackel
Oracle
[EMAIL PROTECTED]
Re: [Linux-cluster] Re: GFS, what's remaining
On Sad, 2005-09-03 at 21:46 -0700, Andrew Morton wrote:
> Actually I think it's rather sick. Taking O_NONBLOCK and making it a
> lock-manager trylock because they're kinda-sorta-similar-sounding? Spare
> me. O_NONBLOCK means "open this file in nonblocking mode", not "attempt to
> acquire a clustered filesystem lock". Not even close.

The semantics of O_NONBLOCK on many other devices are "trylock" semantics. OSS audio has those semantics for example, as do regular files in the presence of SYS5 mandatory locks. While the latter is "try lock, do operation and then drop lock", the drivers using O_NDELAY are very definitely providing trylock semantics.

I am curious why a lock manager uses open to implement its locking semantics rather than using the locking API (POSIX locks etc) however.

Alan
Re: [Linux-cluster] Re: GFS, what's remaining
On Llu, 2005-09-05 at 02:19 -0700, Andrew Morton wrote:
> > create_lockspace()
> > release_lockspace()
> > lock()
> > unlock()
>
> Neat. I'd be inclined to make them syscalls then. I don't suppose anyone
> is likely to object if we reserve those slots.

If the locks are not file descriptors then answer the following:

- How are they ref counted
- What are the cleanup semantics
- How do I pass a lock between processes (AF_UNIX sockets wont work now)
- How do I poll on a lock coming free.
- What are the semantics of lock ownership
- What rules apply for inheritance
- How do I access a lock across threads.
- What is the permission model.
- How do I attach audit to it
- How do I write SELinux rules for it
- How do I use mount to make namespaces appear in multiple vservers

and that's for starters...

Every so often someone decides that a deeply un-unix interface with new syscalls is a good idea. Every time history proves them totally bonkers. There are cases for new system calls but this doesn't seem one of them. Look at System 5 shared memory, look at System 5 IPC, and so on. You can't use common interfaces on them, you can't select on them, you can't sanely pass them by fd passing.

All our existing locking uses the following behaviour:

	fd = open(namespace, options)
	fcntl(.. lock ...)
	blah
	flush
	fcntl(.. unlock ...)
	close

Unfortunately some people here seem to have forgotten WHY we do things this way.

1. The semantics of file descriptors are well understood by users and by programs. That makes programming easier and keeps code size down.
2. Everyone knows how close() works, including across fork.
3. FD passing is an obscure art but understood and just works.
4. poll() is a standard understood interface.
5. Ownership of files is a standard model.
6. FD passing across fork/exec is controlled in a standard way.
7. The semantics for threaded applications are defined.
8. Permissions are a standard model.
9. Audit just works with the same tools.
10. SELinux just works with the same tools.
11. I don't need specialist applications to see the system state (the whole point of sysfs, yet someone wants to break it all again).
12. fcntl fd locking is a POSIX standard interface with precisely defined semantics. Our extensions, including leases, are very powerful.
13. And yes - fcntl fd locking supports mandatory locking too. That also is standards-based with precise semantics.

Everyone understands how to use the existing locking operations. So if you use the existing interfaces, with some small extensions if necessary, everyone understands how to use cluster locks. Isn't that neat.
Re: GFS, what's remaining
On Monday 05 September 2005 10:49, Daniel Phillips wrote:
> On Monday 05 September 2005 10:14, Lars Marowsky-Bree wrote:
> > On 2005-09-03T01:57:31, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> > > The only current users of dlms are cluster filesystems. There are zero
> > > users of the userspace dlm api.
> >
> > That is incorrect...
>
> Application users Lars, sorry if I did not make that clear. The issue is
> whether we need to export an all-singing-all-dancing dlm api from kernel to
> userspace today, or whether we can afford to take the necessary time to get
> it right while application writers take their time to have a good think about
> whether they even need it.

If Linux fully supported OpenVMS DLM semantics we could start thinking about moving our application onto a Linux box because our alpha server is aging.

That's just my user application writer $0.02.

--
Dmitry
Re: GFS, what's remaining
On Monday 05 September 2005 10:14, Lars Marowsky-Bree wrote: > On 2005-09-03T01:57:31, Daniel Phillips <[EMAIL PROTECTED]> wrote: > > The only current users of dlms are cluster filesystems. There are zero > > users of the userspace dlm api. > > That is incorrect... Application users Lars, sorry if I did not make that clear. The issue is whether we need to export an all-singing-all-dancing dlm api from kernel to userspace today, or whether we can afford to take the necessary time to get it right while application writers take their time to have a good think about whether they even need it. > ...and you're contradicting yourself here: How so? Above talks about dlm, below talks about cluster membership. > > What does have to be resolved is a common API for node management. It is > > not just cluster filesystems and their lock managers that have to > > interface to node management. Below the filesystem layer, cluster block > > devices and cluster volume management need to be coordinated by the same > > system, and above the filesystem layer, applications also need to be > > hooked into it. This work is, in a word, incomplete. Regards, Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: GFS, what's remaining
On 2005-09-03T01:57:31, Daniel Phillips <[EMAIL PROTECTED]> wrote: > The only current users of dlms are cluster filesystems. There are zero users > of the userspace dlm api. That is incorrect, and you're contradicting yourself here: > What does have to be resolved is a common API for node management. It is not > just cluster filesystems and their lock managers that have to interface to > node management. Below the filesystem layer, cluster block devices and > cluster volume management need to be coordinated by the same system, and > above the filesystem layer, applications also need to be hooked into it. > This work is, in a word, incomplete. The Cluster Volume Management of LVM2 for example _does_ use simple cluster-wide locks, and some OCFS2 scripts, I seem to recall, do too. (EVMS2 in cluster-mode uses a verrry simple locking scheme which is basically operated by the failover software and thus uses a different model.) Sincerely, Lars Marowsky-Brée <[EMAIL PROTECTED]> -- High Availability & Clustering SUSE Labs, Research and Development SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin "Ignorance more frequently begets confidence than does knowledge" - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: GFS, what's remaining
On 2005-09-03T09:27:41, Bernd Eckenfels <[EMAIL PROTECTED]> wrote: > Oh thats interesting, I never thought about putting data files (tablespaces) > in a clustered file system. Does that mean you can run supported RAC on > shared ocfs2 files and anybody is using that? That is the whole point why OCFS exists ;-) > Do you see this go away with ASM? No. Beyond the table spaces, there's also ORACLE_HOME; a cluster benefits in several aspects from a general-purpose SAN-backed CFS. Sincerely, Lars Marowsky-Brée <[EMAIL PROTECTED]> -- High Availability & Clustering SUSE Labs, Research and Development SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin "Ignorance more frequently begets confidence than does knowledge" - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: GFS, what's remaining
On Mon, Sep 05, 2005 at 12:09:23AM -0700, Mark Fasheh wrote: > Btw, I'm curious to know how useful folks find the ext3 mount options > errors=continue and errors=panic. I'm extremely likely to implement the > errors=read-only behavior as default in OCFS2 and I'm wondering whether the > other two are worth looking into. For a single-user system errors=panic is definitely very useful on the system disk, since that's the only way that we can force an fsck, and also abort a server that might be failing and returning erroneous information to its clients. Think of it as i/o fencing when you're not sure that the system is going to be performing correctly. Whether or not this is useful for ocfs2 is a different matter. If it's only for data volumes, and if the only way to fix filesystem inconsistencies on a cluster filesystem is to request all nodes in the cluster to unmount the filesystem and then arrange to run ocfs2's fsck on the filesystem, then forcing every single node in the cluster to panic is probably counterproductive. :-) - Ted
Re: real read-only [was Re: GFS, what's remaining]
On Mon, Sep 05, 2005 at 10:27:35AM +0200, Pavel Machek wrote: > > There's a better reason, too. I do swsusp. Then I'd like to boot with > / mounted read-only (so that I can read my config files, some > binaries, and maybe suspended image), but I absolutely may not write > to disk at this point, because I still want to resume. > You could _hope_ that the filesystem is consistent enough that it is safe to try to read config files, binaries, etc. without running the journal, but there is absolutely no guarantee that this is the case. I'm not sure you want to depend on that for swsusp. One potential solution that would probably meet your needs is a dm hack which reads in the blocks in the journal, and then uses the most recent block in the journal in preference to the version on disk. - Ted - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Linux-cluster] Re: GFS, what's remaining
Hi, On Sun, 2005-09-04 at 21:33, Pavel Machek wrote: > > - read-only mount > > - "specatator" mount (like ro but no journal allocated for the mount, > > no fencing needed for failed node that was mounted as specatator) > > I'd call it "real-read-only", and yes, that's very usefull > mount. Could we get it for ext3, too? I don't want to pollute the ext3 paths with extra checks for the case when there's no journal struct at all. But a dummy journal struct that isn't associated with an on-disk journal and that can never, ever go writable would certainly be pretty easy to do. But mount -o readonly gives you most of what you want already. An always-readonly option would be different in some key ways --- for a start, it would be impossible to perform journal recovery if that's needed, as that still needs journal and superblock write access. That's not necessarily a good thing. And you *still* wouldn't get something that could act as a spectator to a filesystem mounted writable elsewhere on a SAN, because updates on the other node wouldn't invalidate cached data on the readonly node. So is this really a useful combination? About the only combination I can think of that really makes sense in this context is if you have a busted filesystem that somehow can't be recovered --- either the journal is broken or the underlying device is truly readonly --- and you want to mount without recovery in order to attempt to see what you can find. That's asking for data corruption, but that may be better than getting no data at all. But that is something that could be done with a "-o skip-recovery" mount option, which would necessarily imply always-readonly behaviour. --Stephen - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Linux-cluster] Re: GFS, what's remaining
On Mon, Sep 05, 2005 at 02:19:48AM -0700, Andrew Morton wrote: > David Teigland <[EMAIL PROTECTED]> wrote: > > Four functions: > > create_lockspace() > > release_lockspace() > > lock() > > unlock() > > Neat. I'd be inclined to make them syscalls then. I don't suppose anyone > is likely to object if we reserve those slots. Patrick is really the expert in this area and he's off this week, but based on what he's done with the misc device I don't see why there'd be more than two or three parameters for any of these. Dave - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Linux-cluster] Re: GFS, what's remaining
On Monday 05 September 2005 05:19, Andrew Morton wrote: > David Teigland <[EMAIL PROTECTED]> wrote: > > On Mon, Sep 05, 2005 at 01:54:08AM -0700, Andrew Morton wrote: > > > David Teigland <[EMAIL PROTECTED]> wrote: > > > > We export our full dlm API through read/write/poll on a misc device. > > > > > > inotify did that for a while, but we ended up going with a straight > > > syscall interface. > > > > > > How fat is the dlm interface? ie: how many syscalls would it take? > > > > Four functions: > > create_lockspace() > > release_lockspace() > > lock() > > unlock() > > Neat. I'd be inclined to make them syscalls then. I don't suppose anyone > is likely to object if we reserve those slots. Better take a look at the actual parameter lists to those calls before jumping to conclusions... Regards, Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Linux-cluster] Re: GFS, what's remaining
David Teigland <[EMAIL PROTECTED]> wrote: > > On Mon, Sep 05, 2005 at 01:54:08AM -0700, Andrew Morton wrote: > > David Teigland <[EMAIL PROTECTED]> wrote: > > > > > > We export our full dlm API through read/write/poll on a misc device. > > > > > > > inotify did that for a while, but we ended up going with a straight syscall > > interface. > > > > How fat is the dlm interface? ie: how many syscalls would it take? > > Four functions: > create_lockspace() > release_lockspace() > lock() > unlock() Neat. I'd be inclined to make them syscalls then. I don't suppose anyone is likely to object if we reserve those slots. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Linux-cluster] Re: GFS, what's remaining
On Mon, Sep 05, 2005 at 01:54:08AM -0700, Andrew Morton wrote: > David Teigland <[EMAIL PROTECTED]> wrote: > > > > We export our full dlm API through read/write/poll on a misc device. > > > > inotify did that for a while, but we ended up going with a straight syscall > interface. > > How fat is the dlm interface? ie: how many syscalls would it take? Four functions: create_lockspace() release_lockspace() lock() unlock() Dave - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: GFS, what's remaining
On Mon, Sep 05, 2005 at 10:58:08AM +0200, Jörn Engel wrote:

> #define gfs2_assert(sdp, assertion) do { \
> 	if (unlikely(!(assertion))) { \
> 		printk(KERN_ERR "GFS2: fsid=%s\n", (sdp)->sd_fsname); \
> 		BUG(); \
> 	} \
> } while (0)

OK thanks, Dave
Re: GFS, what's remaining
On Mon, 5 September 2005 11:47:39 +0800, David Teigland wrote: > > Joern already suggested moving this out of line and into a function (as it > was before) to avoid repeating string constants. In that case the > function, file and line from BUG aren't useful. We now have this, does it > look ok? Ok wrt. my concerns, but not with Greg's. BUG() still gives you everything that you need, except: o fsid Notice how this list is just one entry long? ;) So how about

#define gfs2_assert(sdp, assertion) do { \
	if (unlikely(!(assertion))) { \
		printk(KERN_ERR "GFS2: fsid=%s\n", (sdp)->sd_fsname); \
		BUG(); \
	} \
} while (0)

Or, to move the constant out of line again

void __gfs2_assert(struct gfs2_sbd *sdp)
{
	printk(KERN_ERR "GFS2: fsid=%s\n", sdp->sd_fsname);
}

#define gfs2_assert(sdp, assertion) do { \
	if (unlikely(!(assertion))) { \
		__gfs2_assert(sdp); \
		BUG(); \
	} \
} while (0)

Jörn -- Admonish your friends privately, but praise them openly. -- Publilius Syrus
Re: [Linux-cluster] Re: GFS, what's remaining
David Teigland <[EMAIL PROTECTED]> wrote: > > We export our full dlm API through read/write/poll on a misc device. > inotify did that for a while, but we ended up going with a straight syscall interface. How fat is the dlm interface? ie: how many syscalls would it take? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: GFS, what's remaining
On Thu, Sep 01, 2005 at 01:35:23PM +0200, Arjan van de Ven wrote: > > +void gfs2_glock_hold(struct gfs2_glock *gl) > > +{ > > + glock_hold(gl); > > +} > > > > eh why? On 9/5/05, David Teigland <[EMAIL PROTECTED]> wrote: > You removed the comment stating exactly why, see below. If that's not an > accepted technique in the kernel, say so and I'll be happy to change it > here and elsewhere. Is there a reason why users of gfs2_glock_hold() cannot use glock_hold() directly? Pekka
Re: GFS, what's remaining
On Sun, Sep 04, 2005 at 10:33:44PM +0200, Pavel Machek wrote: > Hi! > > > - read-only mount > > - "spectator" mount (like ro but no journal allocated for the mount, > > no fencing needed for failed node that was mounted as spectator) > > I'd call it "real-read-only", and yes, that's a very useful > mount. Could we get it for ext3, too? This is a bit of a digression, but it's quite a bit different from what ocfs2 is doing, where it is not necessary to replay the journal in order to assure filesystem consistency. In the ext3 case, the only time when read-only isn't quite read-only is when the filesystem was unmounted uncleanly and the journal needs to be replayed in order for the filesystem to be consistent. Mounting the filesystem read-only without replaying the journal could and very likely would result in the filesystem reporting filesystem consistency problems, and if the filesystem is mounted with the reboot-on-errors option, well... - Ted
Re: GFS, what's remaining
On Thu, Sep 01, 2005 at 01:35:23PM +0200, Arjan van de Ven wrote:

> +static unsigned int handle_roll(atomic_t *a)
> +{
> +	int x = atomic_read(a);
> +	if (x < 0) {
> +		atomic_set(a, 0);
> +		return 0;
> +	}
> +	return (unsigned int)x;
> +}
>
> this is just plain scary.

Not really, it was just resetting atomic statistics counters when they became negative. Unnecessary, though, so removed. Dave
Re: GFS, what's remaining
On Thu, Sep 01, 2005 at 01:35:23PM +0200, Arjan van de Ven wrote:

> +void gfs2_glock_hold(struct gfs2_glock *gl)
> +{
> +	glock_hold(gl);
> +}
>
> eh why?

You removed the comment stating exactly why, see below. If that's not an accepted technique in the kernel, say so and I'll be happy to change it here and elsewhere. Thanks, Dave

static inline void glock_hold(struct gfs2_glock *gl)
{
	gfs2_assert(gl->gl_sbd, atomic_read(&gl->gl_count) > 0);
	atomic_inc(&gl->gl_count);
}

/**
 * gfs2_glock_hold() - As glock_hold(), but suitable for exporting
 * @gl: The glock to hold
 *
 */

void gfs2_glock_hold(struct gfs2_glock *gl)
{
	glock_hold(gl);
}
Re: [Linux-cluster] Re: GFS, what's remaining
On Sat, Sep 03, 2005 at 10:41:40PM -0700, Andrew Morton wrote: > Joel Becker <[EMAIL PROTECTED]> wrote: > > > > > What happens when we want to add some new primitive which has no > > > posix-file analog? > > > > The point of dlmfs is not to express every primitive that the > > DLM has. dlmfs cannot express the CR, CW, and PW levels of the VMS > > locking scheme. Nor should it. The point isn't to use a filesystem > > interface for programs that need all the flexibility and power of the > > VMS DLM. The point is a simple system that programs needing the basic > > operations can use. Even shell scripts. > > Are you saying that the posix-file lookalike interface provides access to > part of the functionality, but there are other APIs which are used to > access the rest of the functionality? If so, what is that interface, and > why cannot that interface offer access to 100% of the functionality, thus > making the posix-file tricks unnecessary? We're using our dlm quite a bit in user space and require the full dlm API. It's difficult to export the full API through a pseudo fs like dlmfs, so we've not found it a very practical approach. That said, it's a nice idea and I'd be happy if someone could map a more complete dlm API onto it. We export our full dlm API through read/write/poll on a misc device. All user space apps use the dlm through a library as you'd expect. The library communicates with the dlm_device kernel module through read/write/poll and the dlm_device module talks with the actual dlm: linux/drivers/dlm/device.c If there's a better way to do this, via a pseudo fs or not, we'd be pleased to try it. Dave - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: GFS, what's remaining
On Fri, Sep 02, 2005 at 10:28:21PM -0700, Greg KH wrote:
> On Fri, Sep 02, 2005 at 05:44:03PM +0800, David Teigland wrote:
> > On Thu, Sep 01, 2005 at 01:35:23PM +0200, Arjan van de Ven wrote:
> > > > > + gfs2_assert(gl->gl_sbd, atomic_read(&gl->gl_count) > 0,);
> > > what is gfs2_assert() about anyway? please just use BUG_ON directly
> > > everywhere
> >
> > When a machine has many gfs file systems mounted at once it can be useful
> > to know which one failed. Does the following look ok?
> >
> > #define gfs2_assert(sdp, assertion) \
> > do { \
> > 	if (unlikely(!(assertion))) { \
> > 		printk(KERN_ERR \
> > 			"GFS2: fsid=%s: fatal: assertion \"%s\" failed\n" \
> > 			"GFS2: fsid=%s: function = %s\n" \
> > 			"GFS2: fsid=%s: file = %s, line = %u\n" \
> > 			"GFS2: fsid=%s: time = %lu\n", \
> > 			sdp->sd_fsname, # assertion, \
> > 			sdp->sd_fsname, __FUNCTION__, \
> > 			sdp->sd_fsname, __FILE__, __LINE__, \
> > 			sdp->sd_fsname, get_seconds()); \
> > 		BUG(); \
>
> You will already get the __FUNCTION__ (and hence the __FILE__ info) directly from the BUG() dump, as well as the time from the syslog message (turn on the printk timestamps if you want a more fine grain timestamp), so the majority of this macro is redundant with the BUG() macro...

Joern already suggested moving this out of line and into a function (as it was before) to avoid repeating string constants. In that case the function, file and line from BUG aren't useful. We now have this, does it look ok?

void gfs2_assert_i(struct gfs2_sbd *sdp, char *assertion, const char *function,
		   char *file, unsigned int line)
{
	panic("GFS2: fsid=%s: fatal: assertion \"%s\" failed\n"
	      "GFS2: fsid=%s: function = %s, file = %s, line = %u\n",
	      sdp->sd_fsname, assertion,
	      sdp->sd_fsname, function, file, line);
}

#define gfs2_assert(sdp, assertion) \
do { \
	if (unlikely(!(assertion))) { \
		gfs2_assert_i((sdp), #assertion, \
			      __FUNCTION__, __FILE__, __LINE__); \
	} \
} while (0)
Re: GFS, what's remaining
On Sun, Sep 04, 2005 at 10:33:44PM +0200, Pavel Machek wrote: > > - read-only mount > > - "specatator" mount (like ro but no journal allocated for the mount, > > no fencing needed for failed node that was mounted as specatator) > > I'd call it "real-read-only", and yes, that's very usefull > mount. Could we get it for ext3, too? In OCFS2 we call readonly+journal+connected-to-cluster "soft readonly". We're a live node, other nodes know we exist, and we can flush pending transactions during the rw->ro transition. In addition, we can allow a ro->rw transition. The no-journal+no-cluster-connection mode we call "hard readonly". This is the mode you get when a device itself is readonly, because you can't do *anything*. Joel -- "Lately I've been talking in my sleep. Can't imagine what I'd have to say. Except my world will be right When love comes back my way." Joel Becker Senior Member of Technical Staff Oracle E-mail: [EMAIL PROTECTED] Phone: (650) 506-8127 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: GFS, what's remaining
Hi! > - read-only mount > - "spectator" mount (like ro but no journal allocated for the mount, > no fencing needed for failed node that was mounted as spectator) I'd call it "real-read-only", and yes, that's a very useful mount. Could we get it for ext3, too? Pavel -- if you have sharp zaurus hardware you don't need... you know my address
Re: [Linux-cluster] Re: GFS, what's remaining
On Sunday 04 September 2005 03:28, Andrew Morton wrote: > If there is already a richer interface into all this code (such as a > syscall one) and it's feasible to migrate the open() tricksies to that API > in the future if it all comes unstuck then OK. That's why I asked (thus > far unsuccessfully): > >Are you saying that the posix-file lookalike interface provides >access to part of the functionality, but there are other APIs which are >used to access the rest of the functionality? If so, what is that >interface, and why cannot that interface offer access to 100% of the >functionality, thus making the posix-file tricks unnecessary? There is no such interface at the moment, nor is one needed in the immediate future. Let's look at the arguments for exporting a dlm to userspace: 1) Since we already have a dlm in kernel, why not just export that and save 100K of userspace library? Answer: because we don't want userspace-only dlm features bulking up the kernel. Answer #2: the extra syscalls and interface baggage serve no useful purpose. 2) But we need to take locks in the same lockspaces as the kernel dlm(s)! Answer: only support tools need to do that. A cut-down locking api is entirely appropriate for this. 3) But the kernel dlm is the only one we have! Answer: easily fixed, a simple matter of coding. But please bear in mind that dlm-style synchronization is probably a bad idea for most cluster applications, particularly ones that already do their synchronization via sockets. In other words, exporting the full dlm api is a red herring. It has nothing to do with getting cluster filesystems up and running. It is really just marketing: it sounds like a great thing for userspace to get a dlm "for free", but it isn't free, it contributes to kernel bloat and it isn't even the most efficient way to do it. If after considering that, we _still_ want to export a dlm api from kernel, then can we please take the necessary time and get it right? 
The full api requires not only syscall-style elements, but asynchronous events as well, similar to aio. I do not think anybody has a good answer to this today, nor do we even need it to begin porting applications to cluster filesystems. Oracle guys: what is the distributed locking API for RAC? Is the RAC team waiting with bated breath to adopt your kernel-based dlm? If not, why not? Regards, Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Linux-cluster] Re: GFS, what's remaining
> takelock domainxxx lock1
> do stuff
> droplock domainxxx lock1
>
> When someone kills the shell, the lock is leaked, because droplock isn't
> called.

Why not open the lock resource (or the lock space) instead of individual locks as a file? It then looks like this:

open lock space file
takelock lockresource lock1
do stuff
droplock lockresource lock1
close lock space file

Then if you are killed, the ->release of the lock space file should take care of cleaning up all the locks.
Re: [Linux-cluster] Re: GFS, what's remaining
On Sun, Sep 04, 2005 at 02:18:36AM -0700, Andrew Morton wrote: > take-and-drop-lock -d domainxxx -l lock1 -e "do stuff" Ahh, but then you have to have lots of scripts somewhere in path, or do massive inline scripts. especially if you want to take another lock in there somewhere. It's doable, but it's nowhere near as easy. :-) Joel -- "I always thought the hardest questions were those I could not answer. Now I know they are the ones I can never ask." - Charlie Watkins Joel Becker Senior Member of Technical Staff Oracle E-mail: [EMAIL PROTECTED] Phone: (650) 506-8127 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Linux-cluster] Re: GFS, what's remaining
Joel Becker <[EMAIL PROTECTED]> wrote:
> I can't see how that works easily. I'm not worried about a
> tarball (eventually Red Hat and SuSE and Debian would have it). I'm
> thinking about this shell:
>
> exec 7<>/dlm/domainxxx/lock1
> do stuff
> exec 7<&-
>
> If someone kills the shell while stuff is doing, the lock is unlocked
> because fd 7 is closed. However, if you have an application to do the
> locking:
>
> takelock domainxxx lock1
> do stuff
> droplock domainxxx lock1
>
> When someone kills the shell, the lock is leaked, because droplock isn't
> called. And SEGV/QUIT/-9 (especially -9, folks love it too much) are
> handled by the first example but not by the second.

take-and-drop-lock -d domainxxx -l lock1 -e "do stuff"
Re: [Linux-cluster] Re: GFS, what's remaining
On Sun, Sep 04, 2005 at 01:18:05AM -0700, Andrew Morton wrote: > > I thought I stated this in my other email. We're not intending > > to extend dlmfs. > > Famous last words ;) Heh, of course :-) > I don't buy the general "fs is nice because we can script it" argument, > really. You can just write a few simple applications which provide access > to the syscalls (or the fs!) and then write scripts around those. I can't see how that works easily. I'm not worried about a tarball (eventually Red Hat and SuSE and Debian would have it). I'm thinking about this shell:

exec 7<>/dlm/domainxxx/lock1
do stuff
exec 7<&-

If someone kills the shell while stuff is doing, the lock is unlocked because fd 7 is closed. However, if you have an application to do the locking:

takelock domainxxx lock1
do stuff
droplock domainxxx lock1

When someone kills the shell, the lock is leaked, because droplock isn't called. And SEGV/QUIT/-9 (especially -9, folks love it too much) are handled by the first example but not by the second. Joel
Re: [Linux-cluster] Re: GFS, what's remaining
Mark Fasheh <[EMAIL PROTECTED]> wrote: > > On Sun, Sep 04, 2005 at 12:23:43AM -0700, Andrew Morton wrote: > > > What would be an acceptable replacement? I admit that O_NONBLOCK -> > > > trylock > > > is a bit unfortunate, but really it just needs a bit to express that - > > > nobody over here cares what it's called. > > > > The whole idea of reinterpreting file operations to mean something utterly > > different just seems inappropriate to me. > Putting aside trylock for a minute, I'm not sure how utterly different the > operations are. You create a lock resource by creating a file named after > it. You get a lock (fd) at read or write level on the resource by calling > open(2) with the appropriate mode (O_RDONLY, O_WRONLY/O_RDWR). > Now that we've got an fd, lock value blocks are naturally represented as > file data which can be read(2) or written(2). > Close(2) drops the lock.
>
> A really trivial usage example from shell:
>
> node1$ echo "hello world" > mylock
> node2$ cat mylock
> hello world
>
> I could always give a more useful one after I get some sleep :)

It isn't extensible though. One couldn't retain this approach while adding (random cfs ignorance exposure) upgrade-read, downgrade-write, query-for-various-runtime-stats, priority modification, whatever. > > You get a lot of goodies when using a filesystem - the ability for > > unrelated processes to look things up, resource release on exit(), etc. If > > those features are valuable in the ocfs2 context then fine. > Right, they certainly are and I think Joel, in another e-mail on this > thread, explained well the advantages of using a filesystem. > > > But I'd have thought that it would be saner and more extensible to add new > > syscalls (perhaps taking fd's) rather than overloading the open() mode in > > this manner. > The idea behind dlmfs was to very simply export a small set of cluster dlm > operations to userspace. Given that goal, I felt that a whole set of system > calls would have been overkill.
That said, I think perhaps I should clarify > that I don't intend dlmfs to become _the_ userspace dlm api, just a simple > and (imho) intuitive one which could be trivially accessed from any software > which just knows how to read and write files. Well, as I say. Making it a filesystem is superficially attractive, but once you've build a super-dooper enterprise-grade infrastructure on top of it all, nobody's going to touch the fs interface by hand and you end up wondering why it's there, adding baggage. Not that I'm questioning the fs interface! It has useful permission management, monitoring and resource releasing characteristics. I'm questioning the open() tricks. I guess from Joel's tiny description, the filesystem's interpretation of mknod and mkdir look sensible enough. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Linux-cluster] Re: GFS, what's remaining
Joel Becker <[EMAIL PROTECTED]> wrote:
>
> On Sun, Sep 04, 2005 at 12:28:28AM -0700, Andrew Morton wrote:
> > If there is already a richer interface into all this code (such as a
> > syscall one) and it's feasible to migrate the open() tricksies to that API
> > in the future if it all comes unstuck then OK.
> >
> > That's why I asked (thus far unsuccessfully):
>
> I personally was under the impression that "syscalls are not to be added".

We add syscalls all the time. Whichever user<->kernel API is considered to be most appropriate, use it.

> I'm also wary of the effort required to hook into process exit.

I'm not questioning the use of a filesystem. I'm questioning this overloading of normal filesystem system calls. For example (and this is just an example! there's also mknod, mkdir, O_RDWR, O_EXCL...) it would be more usual to do

	fd = open("/sys/whatever", ...);
	err = sys_dlm_trylock(fd);

I guess your current implementation prevents /sys/whatever from ever appearing if the trylock failed. Dunno if that's valuable.

> Not to mention all the lifetiming that has to be written again.
> On top of that, we lose our cute ability to shell script it. We
> find this very useful in testing, and think others would in practice.
>
> > Are you saying that the posix-file lookalike interface provides
> > access to part of the functionality, but there are other APIs which are
> > used to access the rest of the functionality? If so, what is that
> > interface, and why cannot that interface offer access to 100% of the
> > functionality, thus making the posix-file tricks unnecessary?
>
> I thought I stated this in my other email. We're not intending to extend
> dlmfs.

Famous last words ;)

> It pretty much covers the simple DLM usage required of a simple interface.
> The OCFS2 DLM does not provide any other functionality.
> If the OCFS2 DLM grew more functionality, or you consider the
> GFS2 DLM that already has it (and a less intuitive interface via sysfs
> IIRC), I would contend that dlmfs still has a place. It's simple to use
> and understand, and it's usable from shell scripts and other simple
> code.

(wonders how to do O_NONBLOCK from a script)

I don't buy the general "fs is nice because we can script it" argument, really. You can just write a few simple applications which provide access to the syscalls (or the fs!) and then write scripts around those.

Yes, you suddenly need to get a little tarball into users' hands and that's a hassle. And I sometimes think we let this hassle guide kernel interfaces (mutters something about /sbin/hotplug), and that's sad.
Re: [Linux-cluster] Re: GFS, what's remaining
On Sun, Sep 04, 2005 at 12:23:43AM -0700, Andrew Morton wrote:
> > What would be an acceptable replacement? I admit that O_NONBLOCK -> trylock
> > is a bit unfortunate, but really it just needs a bit to express that -
> > nobody over here cares what it's called.
>
> The whole idea of reinterpreting file operations to mean something utterly
> different just seems inappropriate to me.

Putting aside trylock for a minute, I'm not sure how utterly different the operations are. You create a lock resource by creating a file named after it. You get a lock (fd) at read or write level on the resource by calling open(2) with the appropriate mode (O_RDONLY, O_WRONLY/O_RDWR).

Now that we've got an fd, lock value blocks are naturally represented as file data which can be read(2) or written(2). Close(2) drops the lock.

A really trivial usage example from shell:

node1$ echo "hello world" > mylock
node2$ cat mylock
hello world

I could always give a more useful one after I get some sleep :)

> You get a lot of goodies when using a filesystem - the ability for
> unrelated processes to look things up, resource release on exit(), etc. If
> those features are valuable in the ocfs2 context then fine.

Right, they certainly are and I think Joel, in another e-mail on this thread, explained well the advantages of using a filesystem.

> But I'd have thought that it would be saner and more extensible to add new
> syscalls (perhaps taking fd's) rather than overloading the open() mode in
> this manner.

The idea behind dlmfs was to very simply export a small set of cluster dlm operations to userspace. Given that goal, I felt that a whole set of system calls would have been overkill. That said, I think perhaps I should clarify that I don't intend dlmfs to become _the_ userspace dlm api, just a simple and (imho) intuitive one which could be trivially accessed from any software which just knows how to read and write files.
--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]
Re: [Linux-cluster] Re: GFS, what's remaining
On Sun, Sep 04, 2005 at 12:28:28AM -0700, Andrew Morton wrote:
> If there is already a richer interface into all this code (such as a
> syscall one) and it's feasible to migrate the open() tricksies to that API
> in the future if it all comes unstuck then OK.
>
> That's why I asked (thus far unsuccessfully):

I personally was under the impression that "syscalls are not to be added". I'm also wary of the effort required to hook into process exit. Not to mention all the lifetiming that has to be written again. On top of that, we lose our cute ability to shell script it. We find this very useful in testing, and think others would in practice.

> Are you saying that the posix-file lookalike interface provides
> access to part of the functionality, but there are other APIs which are
> used to access the rest of the functionality? If so, what is that
> interface, and why cannot that interface offer access to 100% of the
> functionality, thus making the posix-file tricks unnecessary?

I thought I stated this in my other email. We're not intending to extend dlmfs. It pretty much covers the simple DLM usage required of a simple interface. The OCFS2 DLM does not provide any other functionality. If the OCFS2 DLM grew more functionality, or you consider the GFS2 DLM that already has it (and a less intuitive interface via sysfs IIRC), I would contend that dlmfs still has a place. It's simple to use and understand, and it's usable from shell scripts and other simple code.

Joel

--
"The first thing we do, let's kill all the lawyers."
 - Henry VI, IV:ii
Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: [EMAIL PROTECTED]
Phone: (650) 506-8127
Re: [Linux-cluster] Re: GFS, what's remaining
Daniel Phillips <[EMAIL PROTECTED]> wrote:
>
> If the only user is their tools I would say let it go ahead and be cute,
> even sickeningly so. It is not supposed to be a general dlm api, at least
> that is my understanding. It is just supposed to be an interface for their
> tools. Of course it would help to know exactly how those tools use it.

Well I'm not saying "don't do this". I'm saying "eww" and "why?".

If there is already a richer interface into all this code (such as a syscall one) and it's feasible to migrate the open() tricksies to that API in the future if it all comes unstuck then OK.

That's why I asked (thus far unsuccessfully):

Are you saying that the posix-file lookalike interface provides access to part of the functionality, but there are other APIs which are used to access the rest of the functionality? If so, what is that interface, and why cannot that interface offer access to 100% of the functionality, thus making the posix-file tricks unnecessary?
Re: [Linux-cluster] Re: GFS, what's remaining
Mark Fasheh <[EMAIL PROTECTED]> wrote:
>
> On Sat, Sep 03, 2005 at 09:46:53PM -0700, Andrew Morton wrote:
> > Actually I think it's rather sick. Taking O_NONBLOCK and making it a
> > lock-manager trylock because they're kinda-sorta-similar-sounding? Spare
> > me. O_NONBLOCK means "open this file in nonblocking mode", not "attempt to
> > acquire a clustered filesystem lock". Not even close.
>
> What would be an acceptable replacement? I admit that O_NONBLOCK -> trylock
> is a bit unfortunate, but really it just needs a bit to express that -
> nobody over here cares what it's called.

The whole idea of reinterpreting file operations to mean something utterly different just seems inappropriate to me.

You get a lot of goodies when using a filesystem - the ability for unrelated processes to look things up, resource release on exit(), etc. If those features are valuable in the ocfs2 context then fine.

But I'd have thought that it would be saner and more extensible to add new syscalls (perhaps taking fd's) rather than overloading the open() mode in this manner.
Re: [Linux-cluster] Re: GFS, what's remaining
On Sunday 04 September 2005 00:46, Andrew Morton wrote:
> Daniel Phillips <[EMAIL PROTECTED]> wrote:
> > The model you came up with for dlmfs is beyond cute, it's downright
> > clever.
>
> Actually I think it's rather sick. Taking O_NONBLOCK and making it a
> lock-manager trylock because they're kinda-sorta-similar-sounding? Spare
> me. O_NONBLOCK means "open this file in nonblocking mode", not "attempt to
> acquire a clustered filesystem lock". Not even close.

Now, I see the ocfs2 guys are all ready to back down on this one, but I will at least argue weakly in favor. Sick is a nice word for it, but it is actually not that far off. Normally, this fs will acquire a lock whenever the user creates a virtual file, and the create will block until the global lock arrives. With O_NONBLOCK, it will return, erm... ETXTBSY (!) immediately. Is that not what O_NONBLOCK is supposed to accomplish?

> It would be much better to do something which explicitly and directly
> expresses what you're trying to do rather than this strange "lets do this
> because the names sound the same" thing.
>
> What happens when we want to add some new primitive which has no posix-file
> analog?
>
> Way too cute. Oh well, whatever.

The explicit way is syscalls or a set of ioctls, which he already has the makings of. If there is going to be a userspace api, I would hope it looks more like the contents of userdlm.c than the traditional Vaxcluster API, which sucks beyond belief.

Another explicit way is to do it with a whole set of virtual attributes instead of just a single file trying to capture the whole model. That is really unappealing, but I am afraid that is exactly what a whole lot of sysfs/configfs usage is going to end up looking like.

But more to the point: we have no urgent need for a userspace dlm api at the moment. Nothing will break if we just put that issue off for a few months, quite the contrary. If the only user is their tools I would say let it go ahead and be cute, even sickeningly so.
It is not supposed to be a general dlm api, at least that is my understanding. It is just supposed to be an interface for their tools. Of course it would help to know exactly how those tools use it. Too sleepy to find out tonight...

Regards,

Daniel
Re: [Linux-cluster] Re: GFS, what's remaining
On Sat, Sep 03, 2005 at 09:46:53PM -0700, Andrew Morton wrote:
> Actually I think it's rather sick. Taking O_NONBLOCK and making it a
> lock-manager trylock because they're kinda-sorta-similar-sounding? Spare
> me. O_NONBLOCK means "open this file in nonblocking mode", not "attempt to
> acquire a clustered filesystem lock". Not even close.

What would be an acceptable replacement? I admit that O_NONBLOCK -> trylock is a bit unfortunate, but really it just needs a bit to express that - nobody over here cares what it's called.

--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]
Re: [Linux-cluster] Re: GFS, what's remaining
On Sun, Sep 04, 2005 at 01:52:29AM -0400, Daniel Phillips wrote:
> You do have ->release and ->make_item/group.

->release is like kobject release. It's a free callback, not a callback from close.

> If I may hand you a more substantive argument: you don't support user-driven
> creation of files in configfs, only directories. Dlmfs supports user-created
> files. But you know, there isn't actually a good reason not to support
> user-created files in configfs, as dlmfs demonstrates.

It is outside the domain of configfs. Just because it can be done does not mean it should be. configfs isn't a "thing to create files". It's an interface to creating kernel items. The actual filesystem representation isn't the end, it's just the means.

Joel

--
"In the room the women come and go
 Talking of Michaelangelo."
Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: [EMAIL PROTECTED]
Phone: (650) 506-8127
Re: [Linux-cluster] Re: GFS, what's remaining
On Sat, Sep 03, 2005 at 10:41:40PM -0700, Andrew Morton wrote:
> Are you saying that the posix-file lookalike interface provides access to
> part of the functionality, but there are other APIs which are used to
> access the rest of the functionality? If so, what is that interface, and
> why cannot that interface offer access to 100% of the functionality, thus
> making the posix-file tricks unnecessary?

Currently, this is all the interface that the OCFS2 DLM provides. But yes, if you wanted to provide the rest of the VMS functionality (something that GFS2's DLM does), you'd need to use a more concrete interface.

IMHO, it's worthwhile to have a simple interface, one already used by mkfs.ocfs2, mount.ocfs2, fsck.ocfs2, etc. This is an interface that can be, and is, used even by shell scripts (we do this to test the DLM). If you make it a C-library-only interface, you've just restricted the subset of folks that can use it, while adding programming complexity.

I think that a simple fs-based interface can coexist with a more complex one. FILE* doesn't give you the flexibility of read()/write(), but I wouldn't remove it :-)

Joel

--
"In the beginning, the universe was created. This has made a
 lot of people very angry, and is generally considered to have
 been a bad move."
 - Douglas Adams
Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: [EMAIL PROTECTED]
Phone: (650) 506-8127
Re: [Linux-cluster] Re: GFS, what's remaining
On Sunday 04 September 2005 01:00, Joel Becker wrote:
> On Sun, Sep 04, 2005 at 12:51:10AM -0400, Daniel Phillips wrote:
> > Clearly, I ought to have asked why dlmfs can't be done by configfs. It
> > is the same paradigm: drive the kernel logic from user-initiated vfs
> > methods. You already have nearly all the right methods in nearly all the
> > right places.
>
> configfs, like sysfs, does not support ->open() or ->release()
> callbacks.

struct configfs_item_operations {
	void (*release)(struct config_item *);
	ssize_t (*show)(struct config_item *, struct attribute *, char *);
	ssize_t (*store)(struct config_item *, struct attribute *, const char *, size_t);
	int (*allow_link)(struct config_item *src, struct config_item *target);
	int (*drop_link)(struct config_item *src, struct config_item *target);
};

struct configfs_group_operations {
	struct config_item *(*make_item)(struct config_group *group, const char *name);
	struct config_group *(*make_group)(struct config_group *group, const char *name);
	int (*commit_item)(struct config_item *item);
	void (*drop_item)(struct config_group *group, struct config_item *item);
};

You do have ->release and ->make_item/group.

If I may hand you a more substantive argument: you don't support user-driven creation of files in configfs, only directories. Dlmfs supports user-created files. But you know, there isn't actually a good reason not to support user-created files in configfs, as dlmfs demonstrates.

Anyway, goodnight.

Regards,

Daniel
Re: [Linux-cluster] Re: GFS, what's remaining
Joel Becker <[EMAIL PROTECTED]> wrote:
>
> > What happens when we want to add some new primitive which has no posix-file
> > analog?
>
> The point of dlmfs is not to express every primitive that the
> DLM has. dlmfs cannot express the CR, CW, and PW levels of the VMS
> locking scheme. Nor should it. The point isn't to use a filesystem
> interface for programs that need all the flexibility and power of the
> VMS DLM. The point is a simple system that programs needing the basic
> operations can use. Even shell scripts.

Are you saying that the posix-file lookalike interface provides access to part of the functionality, but there are other APIs which are used to access the rest of the functionality? If so, what is that interface, and why cannot that interface offer access to 100% of the functionality, thus making the posix-file tricks unnecessary?
Re: [Linux-cluster] Re: GFS, what's remaining
On Sun, Sep 04, 2005 at 12:51:10AM -0400, Daniel Phillips wrote:
> Clearly, I ought to have asked why dlmfs can't be done by configfs. It is
> the same paradigm: drive the kernel logic from user-initiated vfs methods.
> You already have nearly all the right methods in nearly all the right
> places.

configfs, like sysfs, does not support ->open() or ->release() callbacks. And it shouldn't. The point is to hide the complexity and make it easier to plug into. A client object should not ever have to know or care that it is being controlled by a filesystem. It only knows that it has a tree of items with attributes that can be set or shown.

Joel

--
"In a crisis, don't hide behind anything or anybody. They're
 going to find you anyway."
 - Paul "Bear" Bryant
Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: [EMAIL PROTECTED]
Phone: (650) 506-8127
Re: [Linux-cluster] Re: GFS, what's remaining
On Sat, Sep 03, 2005 at 09:46:53PM -0700, Andrew Morton wrote:
> It would be much better to do something which explicitly and directly
> expresses what you're trying to do rather than this strange "lets do this
> because the names sound the same" thing.

So, you'd like a new flag name? That can be done.

> What happens when we want to add some new primitive which has no posix-file
> analog?

The point of dlmfs is not to express every primitive that the DLM has. dlmfs cannot express the CR, CW, and PW levels of the VMS locking scheme. Nor should it. The point isn't to use a filesystem interface for programs that need all the flexibility and power of the VMS DLM. The point is a simple system that programs needing the basic operations can use. Even shell scripts.

Joel

--
"You must remember this:
 A kiss is just a kiss,
 A sigh is just a sigh.
 The fundamental rules apply
 As time goes by."
Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: [EMAIL PROTECTED]
Phone: (650) 506-8127
Re: [Linux-cluster] Re: GFS, what's remaining
Daniel Phillips <[EMAIL PROTECTED]> wrote:
>
> The model you came up with for dlmfs is beyond cute, it's downright clever.

Actually I think it's rather sick. Taking O_NONBLOCK and making it a lock-manager trylock because they're kinda-sorta-similar-sounding? Spare me. O_NONBLOCK means "open this file in nonblocking mode", not "attempt to acquire a clustered filesystem lock". Not even close.

It would be much better to do something which explicitly and directly expresses what you're trying to do rather than this strange "lets do this because the names sound the same" thing.

What happens when we want to add some new primitive which has no posix-file analog?

Way too cute. Oh well, whatever.
Re: [Linux-cluster] Re: GFS, what's remaining
On Sunday 04 September 2005 00:30, Joel Becker wrote:
> You asked why dlmfs can't go into sysfs, and I responded.

And you got me! In the heat of the moment I overlooked the fact that you and Greg haven't agreed to the merge yet ;-)

Clearly, I ought to have asked why dlmfs can't be done by configfs. It is the same paradigm: drive the kernel logic from user-initiated vfs methods. You already have nearly all the right methods in nearly all the right places.

Regards,

Daniel
Re: [Linux-cluster] Re: GFS, what's remaining
On Sun, Sep 04, 2005 at 12:22:36AM -0400, Daniel Phillips wrote:
> It is 640 lines.

It's 450 without comments and blank lines. Please, don't tell me that comments to help understanding are bloat.

> I said "configfs" in the email to which you are replying. To wit:
> Daniel Phillips said:
> > Mark Fasheh said:
> > > as far as userspace dlm apis go, dlmfs already abstracts away a large
> > > part of the dlm interaction...
> >
> > Dumb question, why can't you use sysfs for this instead of rolling your
> > own?

You asked why dlmfs can't go into sysfs, and I responded.

Joel

--
"I don't want to achieve immortality through my work; I want
 to achieve immortality through not dying."
 - Woody Allen
Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: [EMAIL PROTECTED]
Phone: (650) 506-8127
Re: [Linux-cluster] Re: GFS, what's remaining
On Saturday 03 September 2005 23:06, Joel Becker wrote:
> dlmfs is *tiny*. The VFS interface is less than his claimed 500
> lines of savings.

It is 640 lines.

> The few VFS callbacks do nothing but call DLM
> functions. You'd have to replace this VFS glue with sysfs glue, and
> probably save very few lines of code.
>
> In addition, sysfs cannot support the dlmfs model. In dlmfs,
> mkdir(2) creates a directory representing a DLM domain and mknod(2)
> creates the user representation of a lock. sysfs doesn't support
> mkdir(2) or mknod(2) at all.

I said "configfs" in the email to which you are replying.

> More than mkdir() and mknod(), however, dlmfs uses open(2) to
> acquire locks from userspace. O_RDONLY acquires a shared read lock (PR
> in VMS parlance). O_RDWR gets an exclusive lock (X). O_NONBLOCK is a
> trylock. Here, dlmfs is using the VFS for complete lifetiming. A lock
> is released via close(2). If a process dies, close(2) happens. In
> other words, ->release() handles all the cleanup for normal and abnormal
> termination.
>
> sysfs does not allow hooking into ->open() or ->release(). So
> this model, and the inherent lifetiming that comes with it, cannot be
> used.

Configfs has a per-item release method. Configfs has a group open method. What is it that configfs can't do, or can't be made to do trivially?

> If dlmfs was changed to use a less intuitive model that fits
> sysfs, all the handling of lifetimes and cleanup would have to be added.

The model you came up with for dlmfs is beyond cute, it's downright clever. Why mar that achievement by then failing to capitalize on the framework you already have in configfs?

By the way, do you agree that dlmfs is too inefficient to be an effective way of exporting your dlm api to user space, except for slow-path applications like you have here?
Regards,

Daniel
Re: [Linux-cluster] Re: GFS, what's remaining
On Sat, Sep 03, 2005 at 06:32:41PM -0700, Andrew Morton wrote:
> If there's duplicated code in there then we should seek to either make the
> code multi-purpose or place the common or reusable parts into a library
> somewhere.

Regarding sysfs and configfs, that's a whole 'nother conversation. I've not yet come up with a function involved that is identical, but that's a response here for another email.

Understanding that Daniel is talking about dlmfs: dlmfs is far more similar to devptsfs, tmpfs, and even sockfs and pipefs than it is to sysfs. I don't see him proposing that sockfs and devptsfs be folded into sysfs.

dlmfs is *tiny*. The VFS interface is less than his claimed 500 lines of savings. The few VFS callbacks do nothing but call DLM functions. You'd have to replace this VFS glue with sysfs glue, and probably save very few lines of code.

In addition, sysfs cannot support the dlmfs model. In dlmfs, mkdir(2) creates a directory representing a DLM domain and mknod(2) creates the user representation of a lock. sysfs doesn't support mkdir(2) or mknod(2) at all.

More than mkdir() and mknod(), however, dlmfs uses open(2) to acquire locks from userspace. O_RDONLY acquires a shared read lock (PR in VMS parlance). O_RDWR gets an exclusive lock (X). O_NONBLOCK is a trylock. Here, dlmfs is using the VFS for complete lifetiming. A lock is released via close(2). If a process dies, close(2) happens. In other words, ->release() handles all the cleanup for normal and abnormal termination.

sysfs does not allow hooking into ->open() or ->release(). So this model, and the inherent lifetiming that comes with it, cannot be used. If dlmfs was changed to use a less intuitive model that fits sysfs, all the handling of lifetimes and cleanup would have to be added. This would make it more complex, not less complex. It would give it a larger code size, not a smaller one. In the end, it would be harder to maintain, less intuitive to use, and larger.
Joel

--
"Anything that is too stupid to be spoken is sung."
 - Voltaire
Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: [EMAIL PROTECTED]
Phone: (650) 506-8127
Re: [Linux-cluster] Re: GFS, what's remaining
Joel Becker <[EMAIL PROTECTED]> wrote:
>
> On Sat, Sep 03, 2005 at 06:21:26PM -0400, Daniel Phillips wrote:
> > that fit the configfs-nee-sysfs model? If it does, the payoff will be
> > about 500 lines saved.
>
> I'm still awaiting your merge of ext3 and reiserfs, because you
> can save probably 500 lines having a filesystem that can create reiser
> and ext3 files at the same time.

oy. Daniel is asking a legitimate question.

If there's duplicated code in there then we should seek to either make the code multi-purpose or place the common or reusable parts into a library somewhere.

If neither approach is applicable or practical for *every single function* then fine, please explain why. AFAIR that has not been done.
Re: [Linux-cluster] Re: GFS, what's remaining
On Sat, Sep 03, 2005 at 06:21:26PM -0400, Daniel Phillips wrote:
> that fit the configfs-nee-sysfs model? If it does, the payoff will be about
> 500 lines saved.

I'm still awaiting your merge of ext3 and reiserfs, because you can save probably 500 lines having a filesystem that can create reiser and ext3 files at the same time.

Joel

--
Life's Little Instruction Book #267
"Lie on your back and look at the stars."
Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: [EMAIL PROTECTED]
Phone: (650) 506-8127
Re: [Linux-cluster] Re: GFS, what's remaining
On Saturday 03 September 2005 02:46, Wim Coekaerts wrote:
> On Sat, Sep 03, 2005 at 02:42:36AM -0400, Daniel Phillips wrote:
> > On Friday 02 September 2005 20:16, Mark Fasheh wrote:
> > > As far as userspace dlm apis go, dlmfs already abstracts away a large
> > > part of the dlm interaction...
> >
> > Dumb question, why can't you use sysfs for this instead of rolling your
> > own?
>
> because it's totally different. have a look at what it does.

You create a dlm domain when a directory is created. You create a lock resource when a file of that name is opened. You lock the resource when the file is opened. You access the lvb by reading/writing the file. Why doesn't that fit the configfs-nee-sysfs model? If it does, the payoff will be about 500 lines saved.

This little dlm fs is very slick, but grossly inefficient. Maybe efficiency doesn't matter here since it is just your slow-path userspace tools taking these locks. Please do not even think of proposing this as a way to export a kernel-based dlm for general purpose use!

Your userdlm.c file has some hidden gold in it. You have factored the dlm calls far more attractively than the bad old bazillion-parameter Vaxcluster legacy. You are almost in system call zone there. (But note my earlier comment on dlms in general: until there are dlm-based applications, merging a general-purpose dlm API is pointless and has nothing to do with getting your filesystem merged.)

Regards,

Daniel
Re: GFS, what's remaining
On Saturday 03 September 2005 06:35, David Teigland wrote: > Just a new version, not a big difference. The ondisk format changed a > little making it incompatible with the previous versions. We'd been > holding out on the format change for a long time and thought now would be > a sensible time to finally do it. What exactly was the format change, and for what purpose?
Re: GFS, what's remaining
On Sat, Sep 03, 2005 at 08:14:00AM +0200, Arjan van de Ven wrote: > On Sat, 2005-09-03 at 13:18 +0800, David Teigland wrote: > > On Thu, Sep 01, 2005 at 01:21:04PM -0700, Andrew Morton wrote: > > > Alan Cox <[EMAIL PROTECTED]> wrote: > > > > > - Why GFS is better than OCFS2, or has functionality which OCFS2 > > > > > cannot > > > > > possibly gain (or vice versa) > > > > > > > > > > - Relative merits of the two offerings > > > > > > > > You missed the important one - people actively use it and have been for > > > > some years. Same reason we have NTFS, HPFS, and all the others. On > > > > that alone it makes sense to include. > > > > > > Again, that's not a technical reason. It's _a_ reason, sure. But what > > > are > > > the technical reasons for merging gfs[2], ocfs2, both or neither? > > > > > > If one can be grown to encompass the capabilities of the other then we're > > > left with a bunch of legacy code and wasted effort. > > > > GFS is an established fs, it's not going away, you'd be hard pressed to > > find a more widely used cluster fs on Linux. GFS is about 10 years old > > and has been in use by customers in production environments for about 5 > > years. > > but you submitted GFS2 not GFS. Just a new version, not a big difference. The ondisk format changed a little, making it incompatible with the previous versions. We'd been holding out on the format change for a long time and thought now would be a sensible time to finally do it. This is also about timing things conveniently. Each GFS version coincides with a development cycle and we decided to wait for this version/cycle to move code upstream. So, we have a new version, a format change, and code upstream all together, but it's still the same GFS to us. As with _any_ new version (involving ondisk formats or not) we need to thoroughly test everything to fix the inevitable bugs and regressions that are introduced; there's nothing new or surprising about that.
About the name -- we need to support customers running both versions for a long time. The "2" was added to make that process a little easier and clearer for people, that's all. If the 2 is really distressing we could rip it off, but there seem to be as many file systems ending in digits as not these days... Dave
Re: GFS, what's remaining
In article <[EMAIL PROTECTED]> you wrote: > for ocfs we have tons of production customers running many terabyte > databases on a cfs. why ? because dealing with the raw disk from a number > of nodes sucks. because nfs is pretty broken for a lot of stuff, there > is no consistency across nodes when each machine nfs mounts a server > partition. yes nfs can be used for things but cfs's are very useful for > many things nfs just can't do. want a list ? Oh that's interesting, I never thought about putting data files (tablespaces) in a clustered file system. Does that mean you can run supported RAC on shared ocfs2 files and anybody is using that? Do you see this go away with ASM? Greetings Bernd
Re: GFS, what's remaining
On Fri, Sep 02, 2005 at 11:17:08PM +0200, Andi Kleen wrote: > Andrew Morton <[EMAIL PROTECTED]> writes: > > > > > Again, that's not a technical reason. It's _a_ reason, sure. But what are > > the technical reasons for merging gfs[2], ocfs2, both or neither? cluster filesystems are very common, there are companies that had/have a whole business around it, veritas, polyserve, ex-sistina, thus now redhat, ibm, tons of companies out there sell this, big bucks. as someone said, it's different than nfs because for certain things there is less overhead but there are many other reasons, it makes it a lot easier to create a clustered nfs server so you create a cfs on a set of disks with a number of nodes and export that fs from all those, you can easily do loadbalancing for applications, you have a lot of infrastructure where people have invested in that allows for shared storage... for ocfs we have tons of production customers running many terabyte databases on a cfs. why ? because dealing with the raw disk from a number of nodes sucks. because nfs is pretty broken for a lot of stuff, there is no consistency across nodes when each machine nfs mounts a server partition. yes nfs can be used for things but cfs's are very useful for many things nfs just can't do. want a list ? companies building failover for services like to use things like this, it creates a non single point of failure kind of setup much more easily. and so on and so on, yes there are alternatives out there but fact is that a lot of folks like to use it, have been using it for ages, and want to be using it. from an implementation point of view, as folks here have already said, we've tried our best to implement things as a real linux filesystem, no abstractions to have something generic, it's clean and as tight as can be for a lot of stuff. and compared to other cfs's it's pretty darned nice, however I think it's silly to have competition between ocfs2 and gfs2.
they are different just like the ton of local filesystems are different and people like to use one or the other. david said gfs is popular and has been around, well, I can list you tons of folks that have been using our stuff 24/7 for years (for free) just as well. it's different. that's that. it'd be really nice if the mainline kernel had it/them included. it would be a good start to get more folks involved, and instead of years of talk on maillists that end up in nothing, actually end up with folks participating and contributing.
Re: [Linux-cluster] Re: GFS, what's remaining
On Sat, Sep 03, 2005 at 02:42:36AM -0400, Daniel Phillips wrote: > On Friday 02 September 2005 20:16, Mark Fasheh wrote: > > As far as userspace dlm apis go, dlmfs already abstracts away a large part > > of the dlm interaction... > > Dumb question, why can't you use sysfs for this instead of rolling your own? because it's totally different. have a look at what it does.
Re: GFS, what's remaining
On Friday 02 September 2005 20:16, Mark Fasheh wrote: > As far as userspace dlm apis go, dlmfs already abstracts away a large part > of the dlm interaction... Dumb question, why can't you use sysfs for this instead of rolling your own? Side note: you seem to have deleted all the 2.6.12-rc4 patches. Perhaps you forgot that there are dozens of lkml archives pointing at them? Regards, Daniel
Re: GFS, what's remaining
On Saturday 03 September 2005 02:14, Arjan van de Ven wrote: > On Sat, 2005-09-03 at 13:18 +0800, David Teigland wrote: > > On Thu, Sep 01, 2005 at 01:21:04PM -0700, Andrew Morton wrote: > > > Alan Cox <[EMAIL PROTECTED]> wrote: > > > > > - Why GFS is better than OCFS2, or has functionality which > > > > > OCFS2 cannot possibly gain (or vice versa) > > > > > > > > > > - Relative merits of the two offerings > > > > > > > > You missed the important one - people actively use it and > > > > have been for some years. Same reason we have NTFS, HPFS, > > > > and all the others. On that alone it makes sense to include. > > > > > > Again, that's not a technical reason. It's _a_ reason, sure. > > > But what are the technical reasons for merging gfs[2], ocfs2, > > > both or neither? > > > > > > If one can be grown to encompass the capabilities of the other > > > then we're left with a bunch of legacy code and wasted effort. > > > > GFS is an established fs, it's not going away, you'd be hard > > pressed to find a more widely used cluster fs on Linux. GFS is > > about 10 years old and has been in use by customers in production > > environments for about 5 years. > > but you submitted GFS2 not GFS. I'd rather not step into the middle of this mess, but you clipped out a good portion that explains why he talks about GFS when he submitted GFS2. Let me quote the post you've pulled that partial paragraph from: "The latest development cycle (GFS2) has focused on improving performance, it's not a new file system -- the "2" indicates that it's not ondisk compatible with earlier versions." In other words he didn't submit the original, but the new version of it that is not compatible with the original GFS on-disk format.
While it is clear that GFS2 cannot claim the large installed user base or the proven capacity of the original (it is, after all, a new version that has incompatibilities), it can claim that as its heritage and what it's aiming towards, the same as ext3 can (and does) claim the power and reliability of ext2. In this case I've been following this thread just for the hell of it and I've noticed that there are some people who seem to not want to even think of having GFS2 included in a mainline kernel for personal and not technical reasons. That does not describe most of the people on this list, many of whom have helped debug the code (among other things), but it does describe a few. I'll go back to being quiet now... DRH
Re: GFS, what's remaining
On Sat, 2005-09-03 at 13:18 +0800, David Teigland wrote: > On Thu, Sep 01, 2005 at 01:21:04PM -0700, Andrew Morton wrote: > > Alan Cox <[EMAIL PROTECTED]> wrote: > > > > - Why GFS is better than OCFS2, or has functionality which OCFS2 cannot > > > > possibly gain (or vice versa) > > > > > > > > - Relative merits of the two offerings > > > > > > You missed the important one - people actively use it and have been for > > > some years. Same reason we have NTFS, HPFS, and all the others. On > > > that alone it makes sense to include. > > > > Again, that's not a technical reason. It's _a_ reason, sure. But what are > > the technical reasons for merging gfs[2], ocfs2, both or neither? > > > > If one can be grown to encompass the capabilities of the other then we're > > left with a bunch of legacy code and wasted effort. > > GFS is an established fs, it's not going away, you'd be hard pressed to > find a more widely used cluster fs on Linux. GFS is about 10 years old > and has been in use by customers in production environments for about 5 > years. but you submitted GFS2 not GFS.
Re: GFS, what's remaining
On Friday 02 September 2005 17:17, Andi Kleen wrote: > The only thing that should probably be resolved is a common API > for at least the clustered lock manager. Having multiple > incompatible user space APIs for that would be sad. The only current users of dlms are cluster filesystems. There are zero users of the userspace dlm api. Therefore, the (g)dlm userspace interface actually has nothing to do with the needs of gfs. It should be taken out of the gfs patch and merged later, when or if user space applications emerge that need it. Maybe in the meantime it will be possible to come up with a userspace dlm api that isn't completely repulsive. Also, note that the only reason the two current dlms are in-kernel is because it supposedly cuts down on userspace-kernel communication with the cluster filesystems. Then why should a userspace application bother with an awkward interface to an in-kernel dlm? This is obviously suboptimal. Why not have a userspace dlm for userspace apps, if indeed there are any userspace apps that would need to use dlm-style synchronization instead of more typical socket-based synchronization, or Posix locking, which is already exposed via a standard api? There is actually nothing wrong with having multiple, completely different dlms active at the same time. There is no urgent need to merge them into the one true dlm. It would be a lot better to let them evolve separately and pick the winner a year or two from now. Just think of the dlm as part of the cfs until then. What does have to be resolved is a common API for node management. It is not just cluster filesystems and their lock managers that have to interface to node management. Below the filesystem layer, cluster block devices and cluster volume management need to be coordinated by the same system, and above the filesystem layer, applications also need to be hooked into it. This work is, in a word, incomplete.
Regards, Daniel
Re: GFS, what's remaining
On Fri, Sep 02, 2005 at 05:44:03PM +0800, David Teigland wrote: > On Thu, Sep 01, 2005 at 01:35:23PM +0200, Arjan van de Ven wrote: > > > + gfs2_assert(gl->gl_sbd, atomic_read(&gl->gl_count) > 0,); > > > what is gfs2_assert() about anyway? please just use BUG_ON directly > > everywhere > > When a machine has many gfs file systems mounted at once it can be useful > to know which one failed. Does the following look ok? > > #define gfs2_assert(sdp, assertion) \ > do { \ > if (unlikely(!(assertion))) { \ > printk(KERN_ERR \ > "GFS2: fsid=%s: fatal: assertion \"%s\" failed\n" \ > "GFS2: fsid=%s: function = %s\n"\ > "GFS2: fsid=%s: file = %s, line = %u\n" \ > "GFS2: fsid=%s: time = %lu\n", \ > sdp->sd_fsname, # assertion, \ > sdp->sd_fsname, __FUNCTION__,\ > sdp->sd_fsname, __FILE__, __LINE__, \ > sdp->sd_fsname, get_seconds()); \ > BUG();\ You will already get the __FUNCTION__ (and hence the __FILE__ info) directly from the BUG() dump, as well as the time from the syslog message (turn on the printk timestamps if you want a finer-grained timestamp), so the majority of this macro is redundant with the BUG() macro... thanks, greg k-h
Re: GFS, what's remaining
On Thu, Sep 01, 2005 at 01:21:04PM -0700, Andrew Morton wrote: > Alan Cox <[EMAIL PROTECTED]> wrote: > > > - Why GFS is better than OCFS2, or has functionality which OCFS2 cannot > > > possibly gain (or vice versa) > > > > > > - Relative merits of the two offerings > > > > You missed the important one - people actively use it and have been for > > some years. Same reason we have NTFS, HPFS, and all the others. On > > that alone it makes sense to include. > > Again, that's not a technical reason. It's _a_ reason, sure. But what are > the technical reasons for merging gfs[2], ocfs2, both or neither? > > If one can be grown to encompass the capabilities of the other then we're > left with a bunch of legacy code and wasted effort. GFS is an established fs, it's not going away, you'd be hard pressed to find a more widely used cluster fs on Linux. GFS is about 10 years old and has been in use by customers in production environments for about 5 years. It is a mature, stable file system with many features that have been technically refined over years of experience and customer/user feedback. The latest development cycle (GFS2) has focused on improving performance, it's not a new file system -- the "2" indicates that it's not ondisk compatible with earlier versions. OCFS2 is a new file system. I expect they'll want to optimize for their own unique goals. When OCFS appeared everyone I know accepted it would coexist with GFS, each in their niche like every other fs. That's good, OCFS and GFS help each other technically even though they may eventually compete in some areas (which can also be good.)

Dave

Here's a random summary of technical features:
- cluster infrastructure: a lot of work, perhaps as much as gfs itself, has gone into the infrastructure surrounding and supporting gfs
- cluster infrastructure allows for easy cooperation with CLVM
- interchangeable lock/cluster modules: gfs interacts with the external infrastructure, including the lock manager, through an interchangeable module, allowing the fs to be adapted to different environments
- a "nolock" module can be plugged in to use gfs as a local fs (can be selected at mount time, so any fs can be mounted locally)
- quotas, acls, cluster flocks, direct io, data journaling, ordered/writeback journaling modes -- all supported
- gfs transparently switches to a different locking scheme for direct io, allowing parallel non-allocating writes with no lock contention
- posix locks -- supported, although it's being reworked for better performance right now
- asynchronous locking, lock prefetching + read-ahead
- coherent shared-writeable memory mappings across the cluster
- nfs3 support (multiple nfs servers exporting one gfs is very common)
- extend fs online, add journals online
- full fs quiesce to allow for block level snapshot below gfs
- read-only mount
- "spectator" mount (like ro but no journal allocated for the mount, no fencing needed for a failed node that was mounted as spectator)
- infrastructure in place for live ondisk inode migration, fs shrink
- stuffed dinodes, small files are stored in the disk inode block
- tunable (fuzzy) atime updates
- fast, nondisruptive stat on files during non-allocating direct-io
- fast, nondisruptive statfs (df) even during heavy fs usage
- friendly handling of io errors: shut down fs and withdraw from cluster
- largest GFS cluster deployed was around 200 nodes, most are much smaller
- use many GFS file systems at once on a node and in a cluster
- customers use GFS for: scientific apps, HA, NFS serving, database, others I'm sure
- graphical management tools for gfs, clvm, and the cluster infrastructure exist and are improving quickly
Re: GFS, what's remaining
On Fri, Sep 02, 2005 at 11:17:08PM +0200, Andi Kleen wrote: > The only thing that should probably be resolved is a common API > for at least the clustered lock manager. Having multiple > incompatible user space APIs for that would be sad. As far as userspace dlm apis go, dlmfs already abstracts away a large part of the dlm interaction, so writing a module against another dlm looks like it wouldn't be too bad (startup of a lockspace is probably the most difficult part there). --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED]
Re: GFS, what's remaining
I have to correct an error in perspective, or at least in the wording of it, in the following, because it affects how people see the big picture in trying to decide how the filesystem types in question fit into the world: >Shared storage can be more efficient than network file >systems like NFS because the storage access is often more efficient >than network access The shared storage access _is_ network access. In most cases, it's a fibre channel/FCP network. Nowadays, it's more and more common for it to be a TCP/IP network just like the one folks use for NFS (but carrying ISCSI instead of NFS). It's also been done with a handful of other TCP/IP-based block storage protocols. The reason the storage access is expected to be more efficient than the NFS access is because the block access network protocols are supposed to be more efficient than the file access network protocols. In reality, I'm not sure there really is such a difference in efficiency between the protocols. The demonstrated differences in efficiency, or at least in speed, are due to other things that are different between a given new shared block implementation and a given old shared file implementation. But there's another advantage to shared block over shared file that hasn't been mentioned yet: some people find it easier to manage a pool of blocks than a pool of filesystems. >it is more reliable because it doesn't have a >single point of failure in form of the NFS server. This advantage isn't because it's shared (block) storage, but because it's a distributed filesystem. There are shared storage filesystems (e.g. IBM SANFS, ADIC StorNext) that have a centralized metadata or locking server that makes them unreliable (or unscalable) in the same ways as an NFS server. 
-- Bryan Henderson IBM Almaden Research Center San Jose CA Filesystems
Re: GFS, what's remaining
Andrew Morton <[EMAIL PROTECTED]> writes: > > > > - Why GFS is better than OCFS2, or has functionality which OCFS2 cannot > > > possibly gain (or vice versa) > > > > > > - Relative merits of the two offerings > > > > You missed the important one - people actively use it and have been for > > some years. Same reason we have NTFS, HPFS, and all the others. On > > that alone it makes sense to include. > > Again, that's not a technical reason. It's _a_ reason, sure. But what are > the technical reasons for merging gfs[2], ocfs2, both or neither? There clearly seems to be a need for a shared-storage fs of some sort for HA clusters and virtualized usage (multiple guests sharing a partition). Shared storage can be more efficient than network file systems like NFS because the storage access is often more efficient than network access and it is more reliable because it doesn't have a single point of failure in the form of the NFS server. It's also a logical extension of the "failover on failure" clusters many people run now - instead of only failing over the shared fs at failure and keeping one machine idle, the load can be balanced between multiple machines at any time. One argument to merge both might be that nobody really knows yet which shared-storage file system (GFS or OCFS2) is better. The only way to find out would be to let the user base try out both, and that's most practical when they're merged. Personally I think ocfs2 has nicer and cleaner code than GFS. It seems to be more or less a 64bit ext3 with cluster support, while GFS seems to reinvent a lot more things and has somewhat uglier code. On the other hand GFS' cluster support seems to be more aimed at being a universal cluster service open for other usages too, which might be a good thing. OCFS2's cluster support seems to be more aimed at only serving the file system. But which one works better in practice is really an open question.
The only thing that should probably be resolved is a common API for at least the clustered lock manager. Having multiple incompatible user space APIs for that would be sad. -Andi
Re: GFS, what's remaining
On Fri, 2 September 2005 17:44:03 +0800, David Teigland wrote: > On Thu, Sep 01, 2005 at 01:35:23PM +0200, Arjan van de Ven wrote: > > > + gfs2_assert(gl->gl_sbd, atomic_read(&gl->gl_count) > 0,); > > > what is gfs2_assert() about anyway? please just use BUG_ON directly > > everywhere > > When a machine has many gfs file systems mounted at once it can be useful > to know which one failed. Does the following look ok? > > #define gfs2_assert(sdp, assertion) \ > do { \ > if (unlikely(!(assertion))) { \ > printk(KERN_ERR \ > "GFS2: fsid=%s: fatal: assertion \"%s\" failed\n" \ > "GFS2: fsid=%s: function = %s\n"\ > "GFS2: fsid=%s: file = %s, line = %u\n" \ > "GFS2: fsid=%s: time = %lu\n", \ > sdp->sd_fsname, # assertion, \ > sdp->sd_fsname, __FUNCTION__,\ > sdp->sd_fsname, __FILE__, __LINE__, \ > sdp->sd_fsname, get_seconds()); \ > BUG();\ > } \ > } while (0) That's a lot of string constants. I'm not sure how smart current versions of gcc are, but older ones created a new constant for each invocation of such a macro, iirc. So you might want to move the code out of line. Jörn -- There's nothing better for promoting creativity in a medium than making an audience feel "Hmm I could do better than that!" -- Douglas Adams in a slashdot interview
Re: GFS, what's remaining
On Thu, Sep 01, 2005 at 01:35:23PM +0200, Arjan van de Ven wrote: > + gfs2_assert(gl->gl_sbd, atomic_read(&gl->gl_count) > 0,); > what is gfs2_assert() about anyway? please just use BUG_ON directly > everywhere

When a machine has many gfs file systems mounted at once it can be useful to know which one failed. Does the following look ok?

#define gfs2_assert(sdp, assertion) \
do { \
	if (unlikely(!(assertion))) { \
		printk(KERN_ERR \
		       "GFS2: fsid=%s: fatal: assertion \"%s\" failed\n" \
		       "GFS2: fsid=%s: function = %s\n" \
		       "GFS2: fsid=%s: file = %s, line = %u\n" \
		       "GFS2: fsid=%s: time = %lu\n", \
		       sdp->sd_fsname, # assertion, \
		       sdp->sd_fsname, __FUNCTION__, \
		       sdp->sd_fsname, __FILE__, __LINE__, \
		       sdp->sd_fsname, get_seconds()); \
		BUG(); \
	} \
} while (0)
Re: GFS, what's remaining
On Thu, Sep 01, 2005 at 06:56:03PM +0100, Christoph Hellwig wrote: > Whether the gfs2 code is mergeable is a completely different question, > and it seems at least debatable to submit a filesystem for inclusion I actually asked what needs to be done for merging. We appreciate the feedback and are carefully studying and working on all of it as usual. We'd also appreciate help, of course, if that sounds interesting to anyone. Thanks Dave
Re: GFS, what's remaining
Alan Cox <[EMAIL PROTECTED]> wrote: > > On Iau, 2005-09-01 at 03:59 -0700, Andrew Morton wrote: > > - Why the kernel needs two clustered filesystems > > So delete reiserfs4, FAT, VFAT, ext2, and all the other "junk". Well, we did delete intermezzo. I was looking for technical reasons, please. > > - Why GFS is better than OCFS2, or has functionality which OCFS2 cannot > > possibly gain (or vice versa) > > > > - Relative merits of the two offerings > > You missed the important one - people actively use it and have been for > some years. Same reason we have NTFS, HPFS, and all the others. On > that alone it makes sense to include. Again, that's not a technical reason. It's _a_ reason, sure. But what are the technical reasons for merging gfs[2], ocfs2, both or neither? If one can be grown to encompass the capabilities of the other then we're left with a bunch of legacy code and wasted effort. I'm not saying it's wrong. But I'd like to hear the proponents explain why it's right, please.
RE: [Linux-cluster] Re: GFS, what's remaining
I just started looking at gfs. To understand it you'd need to look at it from the entire cluster solution point of view. This is a good document from David. It's not about GFS in particular but about the architecture of the cluster. http://people.redhat.com/~teigland/sca.pdf Hua > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of > Christoph Hellwig > Sent: Thursday, September 01, 2005 10:56 AM > To: Alan Cox > Cc: Christoph Hellwig; Andrew Morton; > linux-fsdevel@vger.kernel.org; [EMAIL PROTECTED]; > linux-kernel@vger.kernel.org > Subject: [Linux-cluster] Re: GFS, what's remaining > > On Thu, Sep 01, 2005 at 04:28:30PM +0100, Alan Cox wrote: > > > That's GFS. The submission is about a GFS2 that's > on-disk incompatible > > > to GFS. > > > > Just like say reiserfs3 and reiserfs4 or ext and ext2 or > ext2 and ext3 > > then. I think the main point still stands - we have always taken > > multiple file systems on board and we have benefitted > enormously from > > having the competition between them instead of a dictat > from the kernel > > kremlin that 'foofs is the one true way' > > I didn't say anything against a particular fs, just that your previous > arguments were utter nonsense. In fact I think having two > or more cluster > filesystems in the tree is a good thing. Whether the gfs2 > code is mergeable > is a completely different question, and it seems at least debatable to > submit a filesystem for inclusion that's still pretty new. > > While we're at it I can't find anything describing what gfs2 is about, > what is lacking in gfs, what structural changes did you make, etc.. > > p.s. why is gfs2 in fs/gfs in the kernel tree?
> > -- > Linux-cluster mailing list > [EMAIL PROTECTED] > http://www.redhat.com/mailman/listinfo/linux-cluster
Re: GFS, what's remaining
On Thu, Sep 01, 2005 at 04:28:30PM +0100, Alan Cox wrote: > > That's GFS. The submission is about a GFS2 that's on-disk incompatible > > to GFS. > > Just like say reiserfs3 and reiserfs4 or ext and ext2 or ext2 and ext3 > then. I think the main point still stands - we have always taken > multiple file systems on board and we have benefitted enormously from > having the competition between them instead of a dictat from the kernel > kremlin that 'foofs is the one true way' I didn't say anything against a particular fs, just that your previous arguments were utter nonsense. In fact I think having two or more cluster filesystems in the tree is a good thing. Whether the gfs2 code is mergeable is a completely different question, and it seems at least debatable to submit a filesystem for inclusion that's still pretty new. While we're at it I can't find anything describing what gfs2 is about, what is lacking in gfs, what structural changes did you make, etc.. p.s. why is gfs2 in fs/gfs in the kernel tree?
Re: GFS, what's remaining
On Thursday 01 September 2005 06:46, David Teigland wrote:
> I'd like to get a list of specific things remaining for merging.

Where are the benchmarks and stability analysis? How many hours does it
survive cerberos running on all nodes simultaneously? Where are the
testimonials from users? How long has there been a gfs2 filesystem? Note
that Reiser4 is still not in mainline a year after it was first offered,
why do you think gfs2 should be in mainline after one month?

So far, all catches are surface things like bogus spinlocks.
Substantive issues have not even begun to be addressed. Patience
please, this is going to take a while.

Regards,

Daniel
Re: GFS, what's remaining
On Thursday 01 September 2005 10:49, Alan Cox wrote:
> On Iau, 2005-09-01 at 03:59 -0700, Andrew Morton wrote:
> > - Why GFS is better than OCFS2, or has functionality which OCFS2 cannot
> >   possibly gain (or vice versa)
> >
> > - Relative merits of the two offerings
>
> You missed the important one - people actively use it and have been for
> some years. Same reason we have NTFS, HPFS, and all the others. On
> that alone it makes sense to include.

I thought that gfs2 just appeared last month. Or is it really still
just gfs? If there are substantive changes from gfs to gfs2 then
obviously they have had practically zero testing, let alone posted
benchmarks, testimonials, etc. If it is really still just gfs then the
silly-rename should be undone.

Regards,

Daniel
Re: GFS, what's remaining
On 2005-09-01T16:28:30, Alan Cox <[EMAIL PROTECTED]> wrote:

> Competition will decide if OCFS or GFS is better, or indeed if someone
> comes along with another contender that is better still. And competition
> will probably get the answer right.

Competition will come up with the same situation as reiserfs and ext3
and XFS, namely that they'll all be maintained going forward because of,
uhm, political constraints ;-)

But then, as long as they _are_ maintained and play along nicely with
each other (which, btw, is needed already so that at least data can be
migrated...), I don't really see a problem with having two or three.

> The only thing that is important is we don't end up with each cluster fs
> wanting different core VFS interfaces added.

Indeed.

Sincerely,
    Lars Marowsky-Brée <[EMAIL PROTECTED]>

--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin
"Ignorance more frequently begets confidence than does knowledge"
Re: GFS, what's remaining
> That's GFS. The submission is about a GFS2 that's on-disk incompatible
> to GFS.

Just like say reiserfs3 and reiserfs4 or ext and ext2 or ext2 and ext3
then. I think the main point still stands - we have always taken
multiple file systems on board and we have benefitted enormously from
having the competition between them instead of a dictat from the kernel
kremlin that 'foofs is the one true way'

Competition will decide if OCFS or GFS is better, or indeed if someone
comes along with another contender that is better still. And
competition will probably get the answer right.

The only thing that is important is we don't end up with each cluster
fs wanting different core VFS interfaces added.

Alan
Re: GFS, what's remaining
On Iau, 2005-09-01 at 03:59 -0700, Andrew Morton wrote:
> - Why the kernel needs two clustered filesystems

So delete reiserfs4, FAT, VFAT, ext2, and all the other "junk".

> - Why GFS is better than OCFS2, or has functionality which OCFS2 cannot
>   possibly gain (or vice versa)
>
> - Relative merits of the two offerings

You missed the important one - people actively use it and have been for
some years. Same reason we have NTFS, HPFS, and all the others. On that
alone it makes sense to include.

Alan
Re: GFS, what's remaining
On Thu, Sep 01, 2005 at 03:49:18PM +0100, Alan Cox wrote:
> > - Why GFS is better than OCFS2, or has functionality which OCFS2 cannot
> >   possibly gain (or vice versa)
> >
> > - Relative merits of the two offerings
>
> You missed the important one - people actively use it and have been for
> some years. Same reason we have NTFS, HPFS, and all the others. On
> that alone it makes sense to include.

That's GFS. The submission is about a GFS2 that's on-disk incompatible
to GFS.
Re: GFS, what's remaining
On 9/1/05, David Teigland <[EMAIL PROTECTED]> wrote:
> - Adapt the vfs so gfs (and other cfs's) don't need to walk vma lists.
>   [cf. ops_file.c:walk_vm(), gfs works fine as is, but some don't like it.]

It works fine only if you don't care about playing well with other
clustered filesystems.

Pekka
Re: GFS, what's remaining
On Thu, 2005-09-01 at 18:46 +0800, David Teigland wrote:
> Hi, this is the latest set of gfs patches, it includes some minor munging
> since the previous set. Andrew, could this be added to -mm? there's not
> much in the way of pending changes.
>
> http://redhat.com/~teigland/gfs2/20050901/gfs2-full.patch
> http://redhat.com/~teigland/gfs2/20050901/broken-out/

+static inline void glock_put(struct gfs2_glock *gl)
+{
+	if (atomic_read(&gl->gl_count) == 1)
+		gfs2_glock_schedule_for_reclaim(gl);
+	gfs2_assert(gl->gl_sbd, atomic_read(&gl->gl_count) > 0,);
+	atomic_dec(&gl->gl_count);
+}

this code has a race

what is gfs2_assert() about anyway? please just use BUG_ON directly
everywhere

+static inline int queue_empty(struct gfs2_glock *gl, struct list_head *head)
+{
+	int empty;
+	spin_lock(&gl->gl_spin);
+	empty = list_empty(head);
+	spin_unlock(&gl->gl_spin);
+	return empty;
+}

that looks like a racey interface to me... if so.. why bother locking at
all?

+void gfs2_glock_hold(struct gfs2_glock *gl)
+{
+	glock_hold(gl);
+}

eh why?

+struct gfs2_holder *gfs2_holder_get(struct gfs2_glock *gl, unsigned int state,
+				    int flags, int gfp_flags)
+{
+	struct gfs2_holder *gh;
+
+	gh = kmalloc(sizeof(struct gfs2_holder), GFP_KERNEL | gfp_flags);

this looks odd. Either you take flags or you don't.. this looks really
half arsed and thus is really surprising to all callers

static int gi_skeleton(struct gfs2_inode *ip, struct gfs2_ioctl *gi,
+		       gi_filler_t filler)
+{
+	unsigned int size = gfs2_tune_get(ip->i_sbd, gt_lockdump_size);
+	char *buf;
+	unsigned int count = 0;
+	int error;
+
+	if (size > gi->gi_size)
+		size = gi->gi_size;
+
+	buf = kmalloc(size, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	error = filler(ip, gi, buf, size, &count);
+	if (error)
+		goto out;
+
+	if (copy_to_user(gi->gi_data, buf, count + 1))
+		error = -EFAULT;

where does count get a sensible value?
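[Editor's note: the race flagged in glock_put() above is the classic gap between reading a refcount and decrementing it; two threads can both observe gl_count == 1, or a reference can be taken in between. The sketch below is a userspace C11-atomics analogue of the standard fix, not GFS2's actual code; the struct and function names are made up. The point is that the "was this the last reference?" test must be fused with the decrement by acting on the old value returned by an atomic fetch-and-sub.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the glock refcount.  In the
 * glock_put() quoted above, another thread can change gl_count
 * between the atomic_read() and the atomic_dec(), so the reclaim
 * decision is made on stale data. */
struct ref {
	atomic_int count;
};

/* Drop one reference.  Returns true iff this call dropped the last
 * one, i.e. the caller should schedule the object for reclaim.
 * atomic_fetch_sub() returns the value *before* the subtraction,
 * and exactly one thread can observe it as 1. */
static bool ref_put(struct ref *r)
{
	int old = atomic_fetch_sub(&r->count, 1);
	assert(old > 0);	/* over-put is a bug, like the gfs2_assert() */
	return old == 1;
}
```

The same shape is what the kernel's own `atomic_dec_and_test()` provides: the test and the decrement are a single atomic operation, so no second thread can slip in between them.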
+static unsigned int handle_roll(atomic_t *a)
+{
+	int x = atomic_read(a);
+	if (x < 0) {
+		atomic_set(a, 0);
+		return 0;
+	}
+	return (unsigned int)x;
+}

this is just plain scary.

you'll have to post the rest of your patches if you want anyone to look
at them...
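[Editor's note: handle_roll() above is scary for the same read-then-write reason: between the atomic_read() and the unconditional atomic_set(a, 0), another thread's update to the counter is silently wiped out. A minimal userspace sketch of a lossless variant, assuming C11 atomics and a made-up function name; compare-and-swap only clears the value it actually observed, retrying if the counter moved underneath it.]

```c
#include <assert.h>
#include <stdatomic.h>

/* Clamp a counter that may have gone negative back to zero without
 * losing concurrent updates.  Unlike handle_roll(), the store only
 * happens if the counter still holds the value we read. */
static unsigned int roll_clamp(atomic_int *a)
{
	int x = atomic_load(a);
	while (x < 0) {
		/* On failure (spurious or real), x is reloaded with the
		 * current value and the negativity test is redone. */
		if (atomic_compare_exchange_weak(a, &x, 0))
			return 0;
	}
	return (unsigned int)x;
}
```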
Re: GFS, what's remaining
David Teigland <[EMAIL PROTECTED]> wrote:
>
> Hi, this is the latest set of gfs patches, it includes some minor munging
> since the previous set. Andrew, could this be added to -mm?

Dumb question: why?

Maybe I was asleep, but I don't recall seeing much discussion or
exposition of

- Why the kernel needs two clustered filesystems

- Why GFS is better than OCFS2, or has functionality which OCFS2 cannot
  possibly gain (or vice versa)

- Relative merits of the two offerings

etc.

Maybe this has all been thrashed out and agreed to. If so, please
remind me.
Re: GFS, what's remaining
On Thu, 2005-09-01 at 18:46 +0800, David Teigland wrote:
> Hi, this is the latest set of gfs patches, it includes some minor munging
> since the previous set. Andrew, could this be added to -mm? there's not
> much in the way of pending changes.

can you post them here instead so that they can be actually reviewed?
GFS, what's remaining
Hi, this is the latest set of gfs patches, it includes some minor
munging since the previous set. Andrew, could this be added to -mm?
there's not much in the way of pending changes.

http://redhat.com/~teigland/gfs2/20050901/gfs2-full.patch
http://redhat.com/~teigland/gfs2/20050901/broken-out/

I'd like to get a list of specific things remaining for merging. I
believe we've responded to everything from earlier reviews, they were
very helpful and more would be excellent. The list begins with one item
from before that's still pending:

- Adapt the vfs so gfs (and other cfs's) don't need to walk vma lists.
  [cf. ops_file.c:walk_vm(), gfs works fine as is, but some don't like it.]

...

Thanks
Dave