Re: [LSF/MM TOPIC] [ATTEND] persistent memory progress, management of storage & file systems
On Mon, Jan 06, 2014 at 05:32:56PM -0500, faibish, sorin wrote: > Speaking of persistent memory I would like to discuss the PMFS as well as > RDMA aspects of the persistent memory model. Also I would like to discuss KV > stores and object stores on persistent memory. I was involved in the PMFS as > a tester and I found several issues that I would like to discuss with the > community. I assume that maybe others from Intel could join this discussion > except for Andy and Matt which already asked for this topic. Thanks Ooh, and the cluster/remote filesystem stories there (eg, RDMA etc) are probably pretty cool. Joel > > ./Sorin > > -Original Message- > From: linux-fsdevel-ow...@vger.kernel.org > [mailto:linux-fsdevel-ow...@vger.kernel.org] On Behalf Of Ric Wheeler > Sent: Monday, January 06, 2014 5:21 PM > To: linux-scsi@vger.kernel.org; linux-...@vger.kernel.org; > linux...@kvack.org; linux-fsde...@vger.kernel.org; > lsf...@lists.linux-foundation.org > Cc: linux-ker...@vger.kernel.org > Subject: [LSF/MM TOPIC] [ATTEND] persistent memory progress, management of > storage & file systems > > > I would like to attend this year and continue to talk about the work on > enabling the new class of persistent memory devices. Specifically, very > interested in talking about both using a block driver under our existing > stack and also progress at the file system layer (adding xip/mmap tweaks to > existing file systems and looking at new file systems). > > We also have a lot of work left to do on unifying management, it would be > good to resync on that. 
> > Regards, > > Ric > > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majord...@vger.kernel.org More majordomo info at > http://vger.kernel.org/majordomo-info.html > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- The herd instinct among economists makes sheep look like independent thinkers. http://www.jlbec.org/ jl...@evilplan.org -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [LSF/MM TOPIC] really large storage sectors - going beyond 4096 bytes
On Tue, Jan 21, 2014 at 10:04:29PM -0500, Ric Wheeler wrote: > One topic that has been lurking forever at the edges is the current > 4k limitation for file system block sizes. Some devices in > production today and others coming soon have larger sectors and it > would be interesting to see if it is time to poke at this topic > again. > > LSF/MM seems to be pretty much the only event of the year that most > of the key people will be present, so should be a great topic for a > joint session. Oh yes, I want in on this. We handle 4k/16k/64k pages "seamlessly," and we would want to do the same for larger sectors. In theory, our code should handle it with the appropriate defines updated. Joel
Re: [Lsf-pc] [LSF/MM TOPIC] really large storage sectors - going beyond 4096 bytes
On Thu, Jan 23, 2014 at 07:55:50AM -0500, Theodore Ts'o wrote: > On Thu, Jan 23, 2014 at 07:35:58PM +1100, Dave Chinner wrote: > > > > > > I expect it would be relatively simple to get large blocksizes working > > > on powerpc with 64k PAGE_SIZE. So before diving in and doing huge > > > amounts of work, perhaps someone can do a proof-of-concept on powerpc > > > (or ia64) with 64k blocksize. > > > > Reality check: 64k block sizes on 64k page Linux machines has been > > used in production on XFS for at least 10 years. It's exactly the > > same case as 4k block size on 4k page size - one page, one buffer > > head, one filesystem block. > > This is true for ext4 as well. Block size == page size support is > pretty easy; the hard part is when block size > page size, due to > assumptions in the VM layer that requires that FS system needs to do a > lot of extra work to fudge around. So the real problem comes with > trying to support 64k block sizes on a 4k page architecture, and can > we do it in a way where every single file system doesn't have to do > their own specific hacks to work around assumptions made in the VM > layer. Yup, ditto for ocfs2. Joel -- "One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important." - Bertrand Russell http://www.jlbec.org/ jl...@evilplan.org
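To make the three cases in the exchange above concrete, here is a minimal C sketch that classifies a block size against a page size. The enum and function names are illustrative assumptions and are not taken from any kernel or filesystem source; the only facts it encodes are the ones stated in the thread (block == page is the easy "one page, one buffer head, one filesystem block" case, block > page is the hard one).

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative names only; nothing here comes from kernel headers. */
enum block_page_relation {
    BLOCK_SMALLER_THAN_PAGE, /* several filesystem blocks share one page */
    BLOCK_EQUALS_PAGE,       /* one page, one buffer head, one block */
    BLOCK_LARGER_THAN_PAGE   /* one block spans several pages: the hard case */
};

static int is_power_of_two(size_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Classify how a filesystem block size relates to the VM page size. */
enum block_page_relation classify_block_size(size_t block_size, size_t page_size)
{
    assert(is_power_of_two(block_size) && is_power_of_two(page_size));
    if (block_size < page_size)
        return BLOCK_SMALLER_THAN_PAGE;
    if (block_size == page_size)
        return BLOCK_EQUALS_PAGE;
    return BLOCK_LARGER_THAN_PAGE;
}

/* Number of pages that must be handled together to cover one block. */
size_t pages_per_block(size_t block_size, size_t page_size)
{
    assert(is_power_of_two(block_size) && is_power_of_two(page_size));
    return block_size > page_size ? block_size / page_size : 1;
}
```

With these helpers, 64k blocks on a 64k-page machine classify as BLOCK_EQUALS_PAGE, while 64k blocks on a 4k-page machine classify as BLOCK_LARGER_THAN_PAGE with 16 pages per block, which is the configuration the thread identifies as needing VM-level work.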
Re: [Lsf-pc] [LSF/MM TOPIC] really large storage sectors - going beyond 4096 bytes
On Wed, Jan 22, 2014 at 10:47:01AM -0800, James Bottomley wrote: > On Wed, 2014-01-22 at 18:37 +, Chris Mason wrote: > > On Wed, 2014-01-22 at 10:13 -0800, James Bottomley wrote: > > > On Wed, 2014-01-22 at 18:02 +, Chris Mason wrote: > [agreement cut because it's boring for the reader] > > > Realistically, if you look at what the I/O schedulers output on a > > > standard (spinning rust) workload, it's mostly large transfers. > > > Obviously these are misaligned at the ends, but we can fix some of that > > > in the scheduler. Particularly if the FS helps us with layout. My > > > instinct tells me that we can fix 99% of this with layout on the FS + io > > > schedulers ... the remaining 1% goes to the drive as needing to do RMW > > > in the device, but the net impact to our throughput shouldn't be that > > > great. > > > > There are a few workloads where the VM and the FS would team up to make > > this fairly miserable > > > > Small files. Delayed allocation fixes a lot of this, but the VM doesn't > > realize that fileA, fileB, fileC, and fileD all need to be written at > > the same time to avoid RMW. Btrfs and MD have setup plugging callbacks > > to accumulate full stripes as much as possible, but it still hurts. > > > > Metadata. These writes are very latency sensitive and we'll gain a lot > > if the FS is explicitly trying to build full sector IOs. > > OK, so these two cases I buy ... the question is can we do something > about them today without increasing the block size? > > The metadata problem, in particular, might be block independent: we > still have a lot of small chunks to write out at fractured locations. > With a large block size, the FS knows it's been bad and can expect the > rolled up newspaper, but it's not clear what it could do about it. > > The small files issue looks like something we should be tackling today > since writing out adjacent files would actually help us get bigger > transfers. 
ocfs2 can actually take significant advantage here, because we store small file data in-inode. This would grow our in-inode size from ~3K to ~15K or ~63K. We'd actually have to do more work to start putting more than one inode in a block (though that would be a promising avenue too once the coordination is solved generically). Joel -- "One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important." - Bertrand Russell http://www.jlbec.org/ jl...@evilplan.org
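The read-modify-write (RMW) concern in the quoted exchange, transfers that are "misaligned at the ends", can be illustrated with a small C sketch. It splits a write into a partial head, a run of full sectors, and a partial tail for a given device sector size; only the partial pieces would force an RMW. The struct and function names are hypothetical and chosen for this example, and the sketch assumes a power-of-two sector size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: byte counts for each part of one write. */
struct io_split {
    uint64_t head; /* bytes before the first sector boundary (partial) */
    uint64_t full; /* bytes covering whole sectors */
    uint64_t tail; /* bytes after the last sector boundary (partial) */
};

/* Split the byte range [off, off + len) against a power-of-two sector size. */
struct io_split split_write(uint64_t off, uint64_t len, uint64_t sector)
{
    struct io_split s = { 0, 0, 0 };
    assert(sector != 0 && (sector & (sector - 1)) == 0);

    uint64_t end = off + len;
    uint64_t first = (off + sector - 1) & ~(sector - 1); /* round off up */
    uint64_t last = end & ~(sector - 1);                 /* round end down */

    if (first > last) {
        /* The whole write sits inside one sector without covering it. */
        s.head = len;
    } else {
        s.head = first - off;
        s.full = last - first;
        s.tail = end - last;
    }
    return s;
}

/* A write needs RMW somewhere below the FS if any part is partial. */
int needs_rmw(struct io_split s)
{
    return s.head != 0 || s.tail != 0;
}
```

For example, a 64K write at offset 4K against a 16K sector splits into a 12K head, 48K of full sectors, and a 4K tail, so both ends would need RMW unless the filesystem or I/O scheduler merges adjacent data into them. The same 64K write at offset 0 is entirely full sectors.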
Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
On Wed, Feb 06, 2013 at 03:34:49PM -0500, Chuck Lever wrote: > > On Feb 6, 2013, at 3:24 PM, "Darrick J. Wong" wrote: > > > On Wed, Feb 06, 2013 at 01:51:22PM -0600, Ben Myers wrote: > >> Hi, > >> > >> I'm interested in discussing how to pass protection information to and from > >> userspace. Maybe Martin could be enlisted for the discussion. > >> > >> I read that some work has already been done in this area but have not been > >> able > >> to locate it. It looks like the bio-integrity code already makes it > >> possible > >> to generate the t10-dif crc in the filesystem. It would be good to be > >> able to > >> get the guard and application tags back out to backup applications such as > >> xfsdump. Enabling other applications to generate their own tags in > >> userspace > >> is also interesting. > > > > This one's been on my list for a couple of years (and companies) too. A few > > years ago Joel Becker had support for it in his sys_dio proposal (that > > hasn't > > gone anywhere), and more recently I've theorized that we could add a magic > > fcntl/ioctl to make the kernel recognize, say, the first iovec of a O_DIRECT > > *{read,write}v call as the PI buffer, which I think is similar to how DIX > > gets > > PI data to a disk. But it's not like I have any code to show for it. > > > > I /think/ it's fairly straightforward to change the directio submit code to > > find the userspace PI buffer and amend the block integrity code to attach > > our > > own PI buffer. You'd still have to let the block layer set the sector # > > field, > > but afaik that won't affect the crc or the app tag. > > > > I hear that the NFS guys want to propose some sort of protocol for > > transmitting > > PI data (across NFS), but I haven't seen anything concrete yet. > > I'm writing a requirements document for the NFS protocol which I can discuss > at LSF. The use cases for NFS for now would be virtual disk devices > (hypervisors) or direct NFS access to storage from user space. 
> > Like everyone else we are waiting for a magical VFS and user space API to > appear that can pass PI to and from storage. I'm happy to chat about it. Unfortunately, like Darrick says, sys_dio() coding hasn't happened. I do think we're better off with some kind of explicit API than some magic state on the file. I mean, even something like: ssize_t write_with_pi(int fd, const void *buf, size_t count, const void *pi, size_t pi_count); It's not as nice as a non-historical API (eg sys_dio), but it also probably plays nicer with buffered I/O. Joel > > > Well, I hope I'll scrape together the time to hack together a PoC before > > LSF... > > on the other hand, I ran the discussion about PI userland interfaces at > > LPC2011 > > and (shamefully) haven't done anything yet. > > > > > > > > --D > >> > >> Regards, > >> Ben > -- > Chuck Lever > chuck[dot]lever[at]oracle[dot]com -- "I think it would be a good idea." - Mahatma Gandhi, when asked what he thought of Western civilization http://www.jlbec.org/ jl...@evilplan.org
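As a rough illustration of what an application would have to hand to an explicit call such as the proposed write_with_pi(), the C sketch below builds a buffer of T10 DIF protection tuples in userspace. The tuple layout (2-byte guard CRC, 2-byte application tag, 4-byte reference tag, big-endian, one 8-byte tuple per sector) and the CRC polynomial 0x8BB7 follow the T10 DIF format. Everything else is an assumption made for this example: the function names are hypothetical, the sector size is fixed at 512 bytes, and the reference tag is simply counted up from a caller-supplied value, whereas the thread suggests the block layer would still set that field itself. No write_with_pi() syscall exists; this only prepares the `pi` and `pi_count` arguments such a call might take.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE   512u /* assumed sector size for this sketch */
#define PI_TUPLE_SIZE 8u   /* guard (2) + app tag (2) + ref tag (4) */

/* CRC-16/T10-DIF: polynomial 0x8BB7, initial value 0, no reflection. */
uint16_t crc16_t10dif(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x8BB7);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/*
 * Hypothetical helper: fill `pi` with one tuple per 512-byte sector of `buf`.
 * Returns the number of PI bytes written, or -1 if `count` is not a whole
 * number of sectors or the PI buffer is too small.
 */
long build_pi(const uint8_t *buf, size_t count, uint16_t app_tag,
              uint32_t first_ref_tag, uint8_t *pi, size_t pi_count)
{
    if (count % SECTOR_SIZE != 0)
        return -1;
    size_t sectors = count / SECTOR_SIZE;
    if (pi_count < sectors * PI_TUPLE_SIZE)
        return -1;

    for (size_t i = 0; i < sectors; i++) {
        uint16_t guard = crc16_t10dif(buf + i * SECTOR_SIZE, SECTOR_SIZE);
        uint32_t ref = first_ref_tag + (uint32_t)i; /* placeholder value */
        uint8_t *t = pi + i * PI_TUPLE_SIZE;

        t[0] = (uint8_t)(guard >> 8);   /* guard tag, big-endian */
        t[1] = (uint8_t)(guard & 0xff);
        t[2] = (uint8_t)(app_tag >> 8); /* application tag, big-endian */
        t[3] = (uint8_t)(app_tag & 0xff);
        t[4] = (uint8_t)(ref >> 24);    /* reference tag, big-endian */
        t[5] = (uint8_t)(ref >> 16);
        t[6] = (uint8_t)(ref >> 8);
        t[7] = (uint8_t)(ref & 0xff);
    }
    return (long)(sectors * PI_TUPLE_SIZE);
}
```

An application with a 1024-byte data buffer would call build_pi() with a 16-byte PI buffer and then pass both buffers to whatever interface eventually carries PI, whether a write_with_pi()-style call or extra fields on an aio request.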
Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
Dear LSF committee, I'd like to explicitly request attendance for this discussion :-) Joel On Thu, Feb 07, 2013 at 09:27:35AM -0800, Zach Brown wrote: > On Thu, Feb 07, 2013 at 11:19:59AM -0500, Jeff Moyer wrote: > > Boaz Harrosh writes: > > >> > > >> For aio we just need to add additional fields to an existing structure. > > >> > > >> So yeah, I'd be interested in that discussion as well. > > > > Sure, it's easy to start there, but then you eventually end up having to > > add a non-aio interface as well. Let's not take the latter off the > > table. > > I agree that a sync variant shouldn't be ignored, but needing a sync > interface with PI arguments also shouldn't get in the way of adding > support to the aio+dio path. Simply because it's what people use :/. > > > I'm not sure how that's directly related to aio, but ok. If we're going > > to rewrite the aio code, I think Zach's acall would be a good start, at > > least on the API front: > > http://lwn.net/Articles/316806/ > > Yeah, I'm happy to chat about this stuff if people are interested. I > think I'd do things differently today than what was done in that aged > acall prototype. > > - z -- "You can get more with a kind word and a gun than you can with a kind word alone." - Al Capone http://www.jlbec.org/ jl...@evilplan.org
Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
On Thu, Feb 07, 2013 at 02:12:57PM -0500, Martin K. Petersen wrote: > >>>>> "Joel" == Joel Becker writes: > > Joel> I'm happy to chat about it. Unfortunately, like Darrick says, > Joel> sys_dio() coding hasn't happened. I do think we're better off > Joel> with some kind of explicit API than some magic state on the file. > Joel> I mean, even something like: > > Joel> ssize_t write_with_pi(int fd, const void *buf, size_t count, > Joel> const void *pi, size_t pi_count); > > Joel> It's not as nice as a non-historical API (eg sys_dio), but it also > Joel> probably plays nicer with buffered I/O. > > Pretty much everyone I have talked to that are interested in explicitly > attaching PI (as opposed to relying on the kernel doing it) are using > Linux aio. > > I am not opposed to having more read()/write() like interface as > well. But I think it's important to cater to the I/O paradigm used by > the applications interested in this. It's a lot easier to tweak a few > IOCB fields than it is to rewrite how an application does I/O. You know I'm not going to argue with this. I was merely stating that I'm flexible in how we start :-) Joel > > -- > Martin K. Petersen Oracle Linux Engineering -- "Depend on the rabbit's foot if you will, but remember, it didn't help the rabbit." - R. E. Shay http://www.jlbec.org/ jl...@evilplan.org
[LSF/MM TOPIC][ATTEND] protection information batched I/O interfaces.
I'm definitely interested in attending to discuss PI injection from userspace, batched I/O interfaces, and potential O_DIRECT cleanups. Joel
Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
On Thu, Feb 07, 2013 at 04:04:36PM -0500, J. Bruce Fields wrote: > On Thu, Feb 07, 2013 at 09:36:39AM -0800, Joel Becker wrote: > > Dear LSF committee, > > I'd like to explicitly request attendance for this discussion > > :-) > > http://marc.info/?l=linux-fsdevel&m=135894412908342&w=2 > > "Also, the way I compile the list of requests is from thread > heads ... that means don't send your attendee request as a > reply to something else either otherwise it might get missed." Ack. Sent as such. Thanks, Joel > > --b. > > > > > Joel > > > > On Thu, Feb 07, 2013 at 09:27:35AM -0800, Zach Brown wrote: > > > On Thu, Feb 07, 2013 at 11:19:59AM -0500, Jeff Moyer wrote: > > > > Boaz Harrosh writes: > > > > >> > > > > >> For aio we just need to add additional fields to an existing > > > > >> structure. > > > > >> > > > > >> So yeah, I'd be interested in that discussion as well. > > > > > > > > Sure, it's easy to start there, but then you eventually end up having to > > > > add a non-aio interface as well. Let's not take the latter off the > > > > table. > > > > > > I agree that a sync variant shouldn't be ignored, but needing a sync > > > interface with PI arguments also shouldn't get in the way of adding > > > support to the aio+dio path. Simply because it's what people use :/. > > > > > > > I'm not sure how that's directly related to aio, but ok. If we're going > > > > to rewrite the aio code, I think Zach's acall would be a good start, at > > > > least on the API front: > > > > http://lwn.net/Articles/316806/ > > > > > > Yeah, I'm happy to chat about this stuff if people are interested. I > > > think I'd do things differently today than what was done in that aged > > > acall prototype. 
> > > > > > - z > > > > -- > > > > "You can get more with a kind word and a gun than you can with > > a kind word alone." > > - Al Capone > > > > http://www.jlbec.org/ > > jl...@evilplan.org -- "You look in her eyes, the music begins to play. Hopeless romantics, here we go again." http://www.jlbec.org/ jl...@evilplan.org