Re: [LSF/MM TOPIC] [ATTEND] persistent memory progress, management of storage & file systems

2014-01-07 Thread Joel Becker
On Mon, Jan 06, 2014 at 05:32:56PM -0500, faibish, sorin wrote:
> Speaking of persistent memory I would like to discuss the PMFS as well as 
> RDMA aspects of the persistent memory model. Also I would like to discuss KV 
> stores and object stores on persistent memory. I was involved in the PMFS as 
> a tester and I found several issues that I would like to discuss with the 
> community. I assume that others from Intel could join this discussion 
> in addition to Andy and Matt, who already asked for this topic. Thanks

Ooh, and the cluster/remote filesystem stories there (eg, RDMA etc) are
probably pretty cool.

Joel

> 
> ./Sorin
> 
> -----Original Message-----
> From: linux-fsdevel-ow...@vger.kernel.org 
> [mailto:linux-fsdevel-ow...@vger.kernel.org] On Behalf Of Ric Wheeler
> Sent: Monday, January 06, 2014 5:21 PM
> To: linux-scsi@vger.kernel.org; linux-...@vger.kernel.org; 
> linux...@kvack.org; linux-fsde...@vger.kernel.org; 
> lsf...@lists.linux-foundation.org
> Cc: linux-ker...@vger.kernel.org
> Subject: [LSF/MM TOPIC] [ATTEND] persistent memory progress, management of 
> storage & file systems
> 
> 
> I would like to attend this year and continue to talk about the work on 
> enabling the new class of persistent memory devices. Specifically, very 
> interested in talking about both using a block driver under our existing 
> stack and also progress at the file system layer (adding xip/mmap tweaks to 
> existing file systems and looking at new file systems).
> 
> We also have a lot of work left to do on unifying management, it would be 
> good to resync on that.
> 
> Regards,
> 
> Ric
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in 
> the body of a message to majord...@vger.kernel.org More majordomo info at  
> http://vger.kernel.org/majordomo-info.html
> 

-- 

 The herd instinct among economists makes sheep look like
 independent thinkers.

http://www.jlbec.org/
jl...@evilplan.org


Re: [LSF/MM TOPIC] really large storage sectors - going beyond 4096 bytes

2014-01-21 Thread Joel Becker
On Tue, Jan 21, 2014 at 10:04:29PM -0500, Ric Wheeler wrote:
> One topic that has been lurking forever at the edges is the current
> 4k limitation for file system block sizes. Some devices in
> production today and others coming soon have larger sectors and it
> would be interesting to see if it is time to poke at this topic
> again.
> 
> LSF/MM seems to be pretty much the only event of the year that most
> of the key people will be present, so should be a great topic for a
> joint session.

Oh yes, I want in on this.  We handle 4k/16k/64k pages "seamlessly," and
we would want to do the same for larger sectors.  In theory, our code
should handle it with the appropriate defines updated.

Joel



Re: [Lsf-pc] [LSF/MM TOPIC] really large storage sectors - going beyond 4096 bytes

2014-01-23 Thread Joel Becker
On Thu, Jan 23, 2014 at 07:55:50AM -0500, Theodore Ts'o wrote:
> On Thu, Jan 23, 2014 at 07:35:58PM +1100, Dave Chinner wrote:
> > > 
> > > I expect it would be relatively simple to get large blocksizes working
> > > on powerpc with 64k PAGE_SIZE.  So before diving in and doing huge
> > > amounts of work, perhaps someone can do a proof-of-concept on powerpc
> > > (or ia64) with 64k blocksize.
> > 
> > Reality check: 64k block sizes on 64k page Linux machines have been
> > used in production on XFS for at least 10 years. It's exactly the
> > same case as 4k block size on 4k page size - one page, one buffer
> > head, one filesystem block.
> 
> This is true for ext4 as well.  Block size == page size support is
> pretty easy; the hard part is when block size > page size, due to
> assumptions in the VM layer that require the file system to do a
> lot of extra work to fudge around them.  So the real problem comes with
> trying to support 64k block sizes on a 4k page architecture, and whether
> we can do it in a way where every single file system doesn't have to do
> its own specific hacks to work around assumptions made in the VM
> layer.

Yup, ditto for ocfs2.
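
To put numbers on the two cases above, here is a small arithmetic sketch.
The helper names are made up for illustration and are not kernel
functions; sizes are assumed to be powers of two.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration only; not kernel APIs. */

/* Whole filesystem blocks that fit in one page (0 if block > page). */
static size_t blocks_per_page(size_t page_size, size_t block_size)
{
    return block_size <= page_size ? page_size / block_size : 0;
}

/* Pages that a single filesystem block spans (1 if block <= page). */
static size_t pages_per_block(size_t page_size, size_t block_size)
{
    return block_size > page_size ? block_size / page_size : 1;
}
```

With 64k blocks on 64k pages, blocks_per_page() is 1: one page, one
block.  With 64k blocks on 4k pages, pages_per_block() is 16, so every
block has to be assembled from 16 separate pages, which is the hard case
described above.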

Joel

-- 

"One of the symptoms of an approaching nervous breakdown is the
 belief that one's work is terribly important."
 - Bertrand Russell 

http://www.jlbec.org/
jl...@evilplan.org


Re: [Lsf-pc] [LSF/MM TOPIC] really large storage sectors - going beyond 4096 bytes

2014-01-23 Thread Joel Becker
On Wed, Jan 22, 2014 at 10:47:01AM -0800, James Bottomley wrote:
> On Wed, 2014-01-22 at 18:37 +, Chris Mason wrote:
> > On Wed, 2014-01-22 at 10:13 -0800, James Bottomley wrote:
> > > On Wed, 2014-01-22 at 18:02 +, Chris Mason wrote:
> [agreement cut because it's boring for the reader]
> > > Realistically, if you look at what the I/O schedulers output on a
> > > standard (spinning rust) workload, it's mostly large transfers.
> > > Obviously these are misaligned at the ends, but we can fix some of that
> > > in the scheduler.  Particularly if the FS helps us with layout.  My
> > > instinct tells me that we can fix 99% of this with layout on the FS + io
> > > schedulers ... the remaining 1% goes to the drive as needing to do RMW
> > > in the device, but the net impact to our throughput shouldn't be that
> > > great.
> > 
> > There are a few workloads where the VM and the FS would team up to make
> > this fairly miserable
> > 
> > Small files.  Delayed allocation fixes a lot of this, but the VM doesn't
> > realize that fileA, fileB, fileC, and fileD all need to be written at
> > the same time to avoid RMW.  Btrfs and MD have setup plugging callbacks
> > to accumulate full stripes as much as possible, but it still hurts.
> > 
> > Metadata.  These writes are very latency sensitive and we'll gain a lot
> > if the FS is explicitly trying to build full sector IOs.
> 
> OK, so these two cases I buy ... the question is can we do something
> about them today without increasing the block size?
> 
> The metadata problem, in particular, might be block independent: we
> still have a lot of small chunks to write out at fractured locations.
> With a large block size, the FS knows it's been bad and can expect the
> rolled up newspaper, but it's not clear what it could do about it.
> 
> The small files issue looks like something we should be tackling today
> since writing out adjacent files would actually help us get bigger
> transfers.

ocfs2 can actually take significant advantage here, because we store
small file data in-inode.  This would grow our in-inode size from ~3K to
~15K or ~63K.  We'd actually have to do more work to start putting more
than one inode in a block (though that would be a promising avenue too
once the coordination is solved generically).
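
The figures above work out as simple arithmetic if roughly 1K of each
block is taken up by the inode itself.  That 1K is an assumption inferred
from the "~3K in a 4K block" number, not a value read from the ocfs2
on-disk format, and the names below are hypothetical:

```c
#include <stddef.h>

/* Assumed per-inode overhead, inferred from "~3K of data in a 4K block". */
#define ASSUMED_INODE_OVERHEAD 1024u

/* Approximate bytes of file data that fit in-inode for a given block size. */
static size_t inline_capacity(size_t block_size)
{
    return block_size > ASSUMED_INODE_OVERHEAD
               ? block_size - ASSUMED_INODE_OVERHEAD
               : 0;
}
```

Under that assumption a 4K block leaves 3K for inline data, a 16K block
leaves 15K, and a 64K block leaves 63K.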

Joel


-- 

"One of the symptoms of an approaching nervous breakdown is the
 belief that one's work is terribly important."
 - Bertrand Russell 

http://www.jlbec.org/
jl...@evilplan.org


Re: [LSF/MM TOPIC][ATTEND] protection information and userspace

2013-02-07 Thread Joel Becker
On Wed, Feb 06, 2013 at 03:34:49PM -0500, Chuck Lever wrote:
> 
> On Feb 6, 2013, at 3:24 PM, "Darrick J. Wong"  wrote:
> 
> > On Wed, Feb 06, 2013 at 01:51:22PM -0600, Ben Myers wrote:
> >> Hi,
> >> 
> >> I'm interested in discussing how to pass protection information to and from
> >> userspace.  Maybe Martin could be enlisted for the discussion.
> >> 
> >> I read that some work has already been done in this area but have not been
> >> able to locate it.  It looks like the bio-integrity code already makes it
> >> possible to generate the t10-dif crc in the filesystem.  It would be good to
> >> be able to get the guard and application tags back out to backup applications
> >> such as xfsdump.  Enabling other applications to generate their own tags in
> >> userspace is also interesting.
> > 
> > This one's been on my list for a couple of years (and companies) too.  A few
> > years ago Joel Becker had support for it in his sys_dio proposal (that hasn't
> > gone anywhere), and more recently I've theorized that we could add a magic
> > fcntl/ioctl to make the kernel recognize, say, the first iovec of a O_DIRECT
> > *{read,write}v call as the PI buffer, which I think is similar to how DIX
> > gets PI data to a disk.  But it's not like I have any code to show for it.
> > 
> > I /think/ it's fairly straightforward to change the directio submit code to
> > find the userspace PI buffer and amend the block integrity code to attach our
> > own PI buffer.  You'd still have to let the block layer set the sector #
> > field, but afaik that won't affect the crc or the app tag.
> > 
> > I hear that the NFS guys want to propose some sort of protocol for
> > transmitting PI data (across NFS), but I haven't seen anything concrete yet.
> 
> I'm writing a requirements document for the NFS protocol which I can discuss
> at LSF.  The use cases for NFS for now would be virtual disk devices
> (hypervisors) or direct NFS access to storage from user space.
> 
> Like everyone else we are waiting for a magical VFS and user space API to
> appear that can pass PI to and from storage.

I'm happy to chat about it.  Unfortunately, like Darrick says, sys_dio()
coding hasn't happened.  I do think we're better off with some kind of
explicit API than some magic state on the file.  I mean, even something
like:

ssize_t write_with_pi(int fd, const void *buf, size_t count,
  const void *pi, size_t pi_count);

It's not as nice as a non-historical API (eg sys_dio), but it also
probably plays nicer with buffered I/O.
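
As a rough illustration of what a caller might put in the pi buffer, here
is a minimal sketch that builds T10 DIF tuples (2-byte guard CRC, 2-byte
application tag, 4-byte reference tag, all big-endian) for a data buffer.
It assumes 512-byte protection intervals; build_pi() and its parameters
are hypothetical, and write_with_pi() itself remains only the proposal
above, not an existing call.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE   512u  /* assumed protection interval */
#define PI_TUPLE_SIZE 8u    /* guard (2) + app tag (2) + ref tag (4) */

/* CRC-16/T10-DIF: polynomial 0x8BB7, initial value 0, no reflection. */
static uint16_t crc16_t10dif(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/*
 * Hypothetical helper: fill pi with one tuple per sector of buf.
 * Returns the number of PI bytes written, or 0 if count is not a
 * non-zero multiple of the sector size or pi is too small.
 */
static size_t build_pi(const uint8_t *buf, size_t count, uint16_t app_tag,
                       uint32_t first_ref_tag, uint8_t *pi, size_t pi_count)
{
    size_t sectors = count / SECTOR_SIZE;

    if (count == 0 || count % SECTOR_SIZE != 0 ||
        pi_count < sectors * PI_TUPLE_SIZE)
        return 0;

    for (size_t s = 0; s < sectors; s++) {
        uint16_t guard = crc16_t10dif(buf + s * SECTOR_SIZE, SECTOR_SIZE);
        uint32_t ref = first_ref_tag + (uint32_t)s;
        uint8_t *t = pi + s * PI_TUPLE_SIZE;

        t[0] = (uint8_t)(guard >> 8);
        t[1] = (uint8_t)(guard & 0xff);
        t[2] = (uint8_t)(app_tag >> 8);
        t[3] = (uint8_t)(app_tag & 0xff);
        t[4] = (uint8_t)(ref >> 24);
        t[5] = (uint8_t)(ref >> 16);
        t[6] = (uint8_t)(ref >> 8);
        t[7] = (uint8_t)(ref & 0xff);
    }
    return sectors * PI_TUPLE_SIZE;
}
```

The return value would be what gets passed as pi_count.  As Darrick
notes, the block layer would still set the sector number field, so the
reference tag written here should be treated as a placeholder.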

Joel

> 
> > Well, I hope I'll scrape together the time to hack together a PoC before
> > LSF... on the other hand, I ran the discussion about PI userland interfaces
> > at LPC2011 and (shamefully) haven't done anything yet.
> > 
> > 
> > 
> > --D
> >> 
> >> Regards,
> >>Ben
> 
> -- 
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
> 
> 
> 
> 

-- 

"I think it would be a good idea."  
- Mahatma Gandhi, when asked what he thought of Western
  civilization

http://www.jlbec.org/
jl...@evilplan.org


Re: [LSF/MM TOPIC][ATTEND] protection information and userspace

2013-02-07 Thread Joel Becker
Dear LSF committee,
I'd like to explicitly request attendance for this discussion
:-)

Joel

On Thu, Feb 07, 2013 at 09:27:35AM -0800, Zach Brown wrote:
> On Thu, Feb 07, 2013 at 11:19:59AM -0500, Jeff Moyer wrote:
> > Boaz Harrosh  writes:
> > >> 
> > >> For aio we just need to add additional fields to an existing structure.
> > >> 
> > >> So yeah, I'd be interested in that discussion as well.
> > 
> > Sure, it's easy to start there, but then you eventually end up having to
> > add a non-aio interface as well.  Let's not take the latter off the
> > table.
> 
> I agree that a sync variant shouldn't be ignored, but needing a sync
> interface with PI arguments also shouldn't get in the way of adding
> support to the aio+dio path.  Simply because it's what people use :/.
> 
> > I'm not sure how that's directly related to aio, but ok.  If we're going
> > to rewrite the aio code, I think Zach's acall would be a good start, at
> > least on the API front:
> >   http://lwn.net/Articles/316806/
> 
> Yeah, I'm happy to chat about this stuff if people are interested.  I
> think I'd do things differently today than what was done in that aged
> acall prototype.
> 
> - z

-- 

"You can get more with a kind word and a gun than you can with
 a kind word alone."
 - Al Capone

http://www.jlbec.org/
jl...@evilplan.org


Re: [LSF/MM TOPIC][ATTEND] protection information and userspace

2013-02-08 Thread Joel Becker
On Thu, Feb 07, 2013 at 02:12:57PM -0500, Martin K. Petersen wrote:
> >>>>> "Joel" == Joel Becker  writes:
> 
> Joel> I'm happy to chat about it.  Unfortunately, like Darrick says,
> Joel> sys_dio() coding hasn't happened.  I do think we're better off
> Joel> with some kind of explicit API than some magic state on the file.
> Joel> I mean, even something like:
> 
> Joel> ssize_t write_with_pi(int fd, const void *buf, size_t count,
> Joel>   const void *pi, size_t pi_count);
> 
> Joel> It's not as nice as a non-historical API (eg sys_dio), but it also
> Joel> probably plays nicer with buffered I/O.
> 
> Pretty much everyone I have talked to who is interested in explicitly
> attaching PI (as opposed to relying on the kernel doing it) is using
> Linux aio.
> 
> I am not opposed to having a more read()/write()-like interface as
> well. But I think it's important to cater to the I/O paradigm used by
> the applications interested in this. It's a lot easier to tweak a few
> IOCB fields than it is to rewrite how an application does I/O.

You know I'm not going to argue with this.  I was merely stating that
I'm flexible in how we start :-)
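
For comparison with write_with_pi(), the "tweak a few IOCB fields"
approach amounts to carrying two extra values next to the existing buffer
description.  The struct below is a purely hypothetical sketch of that
shape; it is not the kernel's struct iocb, and the field names and the
512-byte/8-byte sizes are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define PI_INTERVAL   512u  /* assumed data bytes covered by one tuple */
#define PI_TUPLE_SIZE 8u    /* assumed bytes of PI per interval */

/* Hypothetical request descriptor; NOT the kernel's struct iocb. */
struct pi_io_request {
    uint64_t buf;        /* user address of the data buffer */
    uint64_t nbytes;     /* length of the data buffer */
    int64_t  offset;     /* file offset of the transfer */
    uint64_t pi_buf;     /* user address of the PI buffer, 0 if none */
    uint64_t pi_nbytes;  /* length of the PI buffer, 0 if none */
};

/* Check that the PI buffer, if present, matches the data length. */
static bool pi_request_is_consistent(const struct pi_io_request *req)
{
    /* No PI supplied: generating it is left to the kernel. */
    if (req->pi_buf == 0 && req->pi_nbytes == 0)
        return true;

    if (req->pi_buf == 0 || req->nbytes == 0 ||
        req->nbytes % PI_INTERVAL != 0)
        return false;

    return req->pi_nbytes == (req->nbytes / PI_INTERVAL) * PI_TUPLE_SIZE;
}
```

An application already submitting aio requests would only need to fill in
the two extra fields, which is the argument for starting there.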

Joel

> 
> -- 
> Martin K. Petersen	Oracle Linux Engineering

-- 

"Depend on the rabbit's foot if you will, but remember, it didn't
 help the rabbit."
- R. E. Shay

http://www.jlbec.org/
jl...@evilplan.org


[LSF/MM TOPIC][ATTEND] protection information batched I/O interfaces.

2013-02-08 Thread Joel Becker
I'm definitely interested in attending to discuss PI injection from
userspace, batched I/O interfaces, and potential O_DIRECT cleanups.

Joel



Re: [LSF/MM TOPIC][ATTEND] protection information and userspace

2013-02-08 Thread Joel Becker
On Thu, Feb 07, 2013 at 04:04:36PM -0500, J. Bruce Fields wrote:
> On Thu, Feb 07, 2013 at 09:36:39AM -0800, Joel Becker wrote:
> > Dear LSF committee,
> > I'd like to explicitly request attendance for this discussion
> > :-)
> 
> http://marc.info/?l=linux-fsdevel&m=135894412908342&w=2
> 
>   "Also, the way I compile the list of requests is from thread
>   heads ...  that means don't send your attendee request as a
>   reply to something else either otherwise it might get missed."

Ack.  Sent as such.

Thanks,
Joel

> 
> --b.
> 
> > 
> > Joel
> > 
> > On Thu, Feb 07, 2013 at 09:27:35AM -0800, Zach Brown wrote:
> > > On Thu, Feb 07, 2013 at 11:19:59AM -0500, Jeff Moyer wrote:
> > > > Boaz Harrosh  writes:
> > > > >> 
> > > > >> For aio we just need to add additional fields to an existing structure.
> > > > >> 
> > > > >> So yeah, I'd be interested in that discussion as well.
> > > > 
> > > > Sure, it's easy to start there, but then you eventually end up having to
> > > > add a non-aio interface as well.  Let's not take the latter off the
> > > > table.
> > > 
> > > I agree that a sync variant shouldn't be ignored, but needing a sync
> > > interface with PI arguments also shouldn't get in the way of adding
> > > support to the aio+dio path.  Simply because it's what people use :/.
> > > 
> > > > I'm not sure how that's directly related to aio, but ok.  If we're going
> > > > to rewrite the aio code, I think Zach's acall would be a good start, at
> > > > least on the API front:
> > > >   http://lwn.net/Articles/316806/
> > > 
> > > Yeah, I'm happy to chat about this stuff if people are interested.  I
> > > think I'd do things differently today than what was done in that aged
> > > acall prototype.
> > > 
> > > - z
> > 
> > -- 
> > 
> > "You can get more with a kind word and a gun than you can with
> >  a kind word alone."
> >  - Al Capone
> > 
> > http://www.jlbec.org/
> > jl...@evilplan.org

-- 

"You look in her eyes, the music begins to play.
 Hopeless romantics, here we go again."

http://www.jlbec.org/
jl...@evilplan.org