Re: [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues()

2023-08-11 Thread Christoph Hellwig
On Thu, Aug 10, 2023 at 08:09:27AM +0800, Ming Lei wrote:
> 1) some archs support 'nr_cpus=1' for kdump kernel, which is fine, since
> num_possible_cpus becomes 1.
> 
> 2) some archs do not support 'nr_cpus=1', and have to rely on
> 'max_cpus=1', so num_possible_cpus isn't changed, and kernel just boots
> with a single online cpu. That causes trouble because blk-mq is
> limited to a single queue.

And we need to fix case 2.  We need to drop the is_kdump support, and
if they want to force fewer cpus they need to make nr_cpus=1 work.




Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze

2023-06-07 Thread Christoph Hellwig
On Thu, May 25, 2023 at 04:14:30PM +0200, Jan Kara wrote:
> Yes, this is exactly how I'd imagine it. Thanks for writing the patch!
> 
> I'd just note that this would need rebasing on top of Luis' patches 1 and
> 2. Also:

I'd not do that for now.  1 needs a lot more work, and 2 seems rather
questionable.

> Now the only remaining issue with the code is that the two different
> holders can be attempting to freeze the filesystem at once and in that case
> one of them has to wait for the other one instead of returning -EBUSY as
> would happen currently. This can happen because we temporarily drop
> s_umount in freeze_super() due to lock ordering issues. I think we could
> do something like:
> 
>   if (!sb_unfrozen(sb)) {
>   up_write(&sb->s_umount);
>   wait_var_event(&sb->s_writers.frozen,
>  sb_unfrozen(sb) || sb_frozen(sb));
>   down_write(&sb->s_umount);
>   goto retry;
>   }
> 
> and then sprinkle wake_up_var(&sb->s_writers.frozen) at appropriate places
> in freeze_super().

Let's do that separately as a follow-on.
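(The follow-on would roughly be -- a sketch only, exact placement
needs auditing:

	/* wherever freeze_super()/thaw_super() change the state: */
	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
	wake_up_var(&sb->s_writers.frozen);
	...
	sb->s_writers.frozen = SB_UNFROZEN;
	wake_up_var(&sb->s_writers.frozen);

paired with the wait_var_event() retry loop quoted above.)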

> 
> BTW, when reading this code, I've spotted the attached cleanup opportunity
> but I'll queue that separately so that is JFYI.
> 
> > +#define FREEZE_HOLDER_USERSPACE (1U << 1)   /* userspace froze fs */
> > +#define FREEZE_HOLDER_KERNEL   (1U << 2)   /* kernel froze fs */
> 
> Why not start from 1U << 0? And bonus points for using BIT() macro :).

BIT() is a nasty thing and actually makes code harder to read. And it
doesn't interact very well with the __bitwise annotation that might
actually be useful here.




Re: [PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze

2023-06-07 Thread Christoph Hellwig
On Mon, May 22, 2023 at 04:42:00PM -0700, Darrick J. Wong wrote:
> How about this as an alternative patch?  Kernel and userspace freeze
> state are stored in s_writers; each type cannot block the other (though
> you still can't have nested kernel or userspace freezes); and the freeze
> is maintained until /both/ freeze types are dropped.
> 
> AFAICT this should work for the two other usecases (quiescing pagefaults
> for fsdax pmem pre-removal; and freezing fses during suspend) besides
> online fsck for xfs.

I think this is fundamentally the right thing.  Can you send this as
a standalone patch in a separate thread to make sure it gets
expedited?

> -static int thaw_super_locked(struct super_block *sb);
> +static int thaw_super_locked(struct super_block *sb, unsigned short who);

Is the unsigned short really worth it?  Even if it's just two values
I'd also go for a __bitwise type to get the typechecking as that tends
to help a lot going down the road.
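Something along these lines, as a sketch only (the typedef name is made
up):

	typedef unsigned int __bitwise freeze_holder_t;

	#define FREEZE_HOLDER_KERNEL	((__force freeze_holder_t)(1U << 0))
	#define FREEZE_HOLDER_USERSPACE	((__force freeze_holder_t)(1U << 1))

	static int thaw_super_locked(struct super_block *sb,
				     freeze_holder_t who);

With that sparse complains about any caller passing a plain integer.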

>  /**
> - * freeze_super - lock the filesystem and force it into a consistent state
> + * __freeze_super - lock the filesystem and force it into a consistent state
>   * @sb: the super to lock
> + * @who: FREEZE_HOLDER_USERSPACE if userspace wants to freeze the fs;
> + * FREEZE_HOLDER_KERNEL if the kernel wants to freeze it
>   *
>   * Syncs the super to make sure the filesystem is consistent and calls the fs's
> - * freeze_fs.  Subsequent calls to this without first thawing the fs will return
> + * freeze_fs.  Subsequent calls to this without first thawing the fs may return
>   * -EBUSY.
>   *
> + * The @who argument distinguishes between the kernel and userspace trying to
> + * freeze the filesystem.  Although there cannot be multiple kernel freezes or
> + * multiple userspace freezes in effect at any given time, the kernel and
> + * userspace can both hold a filesystem frozen.  The filesystem remains frozen
> + * until there are no kernel or userspace freezes in effect.
> + *
>   * During this function, sb->s_writers.frozen goes through these values:
>   *
>   * SB_UNFROZEN: File system is normal, all writes progress as usual.
> @@ -1668,12 +1676,61 @@ static void sb_freeze_unlock(struct super_block *sb, int level)
>   *
>   * sb->s_writers.frozen is protected by sb->s_umount.
>   */

There's really no point in having a kerneldoc for a static function.
Either this moves to the actual exported functions, or it should
become a normal non-kerneldoc comment.  But I'm not even sure this split
makes much sense to start with.  I'd just add the who argument
to freeze_super given that we have only very few callers anyway,
and it is way easier to follow than these wrappers hardcoding the
argument.

> +static int __freeze_super(struct super_block *sb, unsigned short who)
>  {
> + struct sb_writers *sbw = &sb->s_writers;
>   int ret;
>  
>   atomic_inc(&sb->s_active);
>   down_write(&sb->s_umount);
> +
> + if (sbw->frozen == SB_FREEZE_COMPLETE) {
> + switch (who) {

Nit, but maybe split everything inside this branch into a
freeze_frozen_super helper?
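Roughly this shape (a sketch using the names from this patch):

	/* handle a freeze request against an already-frozen superblock */
	static int freeze_frozen_super(struct super_block *sb,
				       unsigned short who)
	{
		struct sb_writers *sbw = &sb->s_writers;

		/* this holder already has the fs frozen */
		if (sbw->freeze_holders & who)
			return -EBUSY;

		/* otherwise share the existing freeze */
		sbw->freeze_holders |= who;
		return 0;
	}

called from __freeze_super() with s_umount held.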

> +static int thaw_super_locked(struct super_block *sb, unsigned short who)
> +{
> + struct sb_writers *sbw = &sb->s_writers;
>   int error;
>  
> + if (sbw->frozen == SB_FREEZE_COMPLETE) {
> + switch (who) {
> + case FREEZE_HOLDER_KERNEL:
> + if (!(sbw->freeze_holders & FREEZE_HOLDER_KERNEL)) {
> + /* Caller doesn't hold a kernel freeze. */
> + up_write(&sb->s_umount);
> + return -EINVAL;
> + }
> + if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
> + /*
> +  * We were sharing the freeze with userspace,
> +  * so drop the userspace freeze but exit
> +  * without unfreezing.
> +  */
> + sbw->freeze_holders &= ~who;
> + up_write(&sb->s_umount);
> + return 0;
> + }
> + break;
> + case FREEZE_HOLDER_USERSPACE:
> + if (!(sbw->freeze_holders & FREEZE_HOLDER_USERSPACE)) {
> + /* Caller doesn't hold a userspace freeze. */
> + up_write(&sb->s_umount);
> + return -EINVAL;
> + }
> + if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
> + /*
> +  * We were sharing the freeze with the kernel,
> +  * so drop the kernel freeze but exit without
> +  * unfreezing.
> +  */
> + sbw->freeze_holders &= ~who;
> + up_write(&sb->s_umount);

Re: [PATCH 2/6] fs: add frozen sb state helpers

2023-06-07 Thread Christoph Hellwig
On Sun, May 07, 2023 at 06:17:13PM -0700, Luis Chamberlain wrote:
> Provide helpers so that we can check a superblock frozen state.
> This will make subsequent changes easier to read. This makes
> no functional changes.

I'll look at the further changes, but having frozen/unfrozen helpers
that sound binary while we have 4 shades of gray seems rather confusing
to me.




Re: [PATCH 1/6] fs: unify locking semantics for fs freeze / thaw

2023-06-07 Thread Christoph Hellwig
On Sun, May 07, 2023 at 06:17:12PM -0700, Luis Chamberlain wrote:
> Right now freeze_super()  and thaw_super() are called with
> different locking contexts. To expand on this is messy, so
> just unify the requirement to require grabbing an active
> reference and keep the superblock locked.
> 
> Suggested-by: Christoph Hellwig 
> Signed-off-by: Luis Chamberlain 

Maybe I'm just getting old, but where did I suggest this?

That being said, holding an active reference over any operation is a
good thing.  As Jan said it can be done way simpler than this, though.

Also, please explain the actual different locking contexts and what
semantics you unify in the commit log.

> - * freeze_super - lock the filesystem and force it into a consistent state
> + * freeze_super - force a filesystem backed by a block device into a consistent state
>   * @sb: the super to lock

This is not actually true.  Nothing in here has anything to do
with block devices, and it is used on at least one non-block file
system.



Re: [PATCH v5 RESEND 3/3] vmcore: Convert read_from_oldmem() to take an iov_iter

2022-04-13 Thread Christoph Hellwig
On Fri, Apr 08, 2022 at 05:06:36PM +0800, Baoquan He wrote:
> From: "Matthew Wilcox (Oracle)" 
> 
> Remove the read_from_oldmem() wrapper introduced earlier and convert
> all the remaining callers to pass an iov_iter.
> 
> Signed-off-by: Matthew Wilcox (Oracle) 
> Signed-off-by: Baoquan He 

Looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v5 RESEND 2/3] vmcore: Convert __read_vmcore to use an iov_iter

2022-04-13 Thread Christoph Hellwig
On Fri, Apr 08, 2022 at 05:06:35PM +0800, Baoquan He wrote:
> From: "Matthew Wilcox (Oracle)" 
> 
> This gets rid of copy_to() and lets us use proc_read_iter() instead
> of proc_read().
> 
> Signed-off-by: Matthew Wilcox (Oracle) 
> Signed-off-by: Baoquan He 

Looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v5 RESEND 1/3] vmcore: Convert copy_oldmem_page() to take an iov_iter

2022-04-13 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v2] mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore

2022-04-06 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH] mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore

2022-04-05 Thread Christoph Hellwig
On Tue, Apr 05, 2022 at 12:40:31PM -0700, Omar Sandoval wrote:
> A simple way to "fix" this would be to make set_iounmap_nonlazy() set
> vmap_lazy_nr to lazy_max_pages() instead of lazy_max_pages() + 1. But, I
> think it'd be better to get rid of this hack of clobbering vmap_lazy_nr.
> Instead, this fix makes __copy_oldmem_page() explicitly drain the vmap
> areas itself.

This fixes the bug and the interface also is better than what we had
before.  But a vmap/iounmap_eager would seem even better.  But hey,
right now it has one caller in always-built-in x86 arch code, so maybe
it isn't worth spending more effort on this.



Re: [PATCH v4 2/3] dma/pool: create dma atomic pool only if dma zone has managed pages

2021-12-23 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 



Re: [PATCH v3 5/5] mm/slub: do not create dma-kmalloc if no managed pages in DMA zone

2021-12-23 Thread Christoph Hellwig
On Wed, Dec 22, 2021 at 12:37:03PM +, Hyeonggon Yoo wrote:
> Oh I misunderstood this. The underlying physical address of
> vmalloc()-allocated memory can be mapped using the DMA API, and it needs
> to be set up as a scatterlist because the allocated memory is not
> physically contiguous. Right?

Yes.

> BTW, looking at the API I think the scsi case can be converted to use
> dma_alloc_pages(). But the driver requires a 512-byte buffer and the API
> supports allocating at least a page.

Overallocating is not generally a problem, but if the allocations are for
a slow path it might make more sense to stick to dma_map_* and bounce
buffer if needed.
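For such a slow path that would look roughly like this (a sketch; dev
and the 512-byte size stand in for the scsi case above):

	void *buf = kmalloc(512, GFP_KERNEL);
	dma_addr_t addr;

	if (!buf)
		return -ENOMEM;
	/* dma_map_single() transparently bounce buffers (e.g. through
	 * swiotlb) if the device can't address the buffer directly */
	addr = dma_map_single(dev, buf, 512, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, addr))
		goto out_free;
	/* ... do the transfer ... */
	dma_unmap_single(dev, addr, 512, DMA_FROM_DEVICE);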

> It's not a big problem as it allocates a single buffer but in other
> cases maybe not. Can't we use a dma pool for non-coherent pages?

No.



Re: [PATCH v3 5/5] mm/slub: do not create dma-kmalloc if no managed pages in DMA zone

2021-12-21 Thread Christoph Hellwig
On Fri, Dec 17, 2021 at 11:38:27AM +, Hyeonggon Yoo wrote:
> My understanding is any buffer requested from kmalloc (without
> GFP_DMA/DMA32) can be used by device drivers because it allocates
> contiguous physical memory. It doesn't mean that a buffer allocated
> with kmalloc is free of addressing limitations.

Yes.

> 
> the addressing limitation comes from the capability of the device, not
> the allocation size. if you allocate memory using alloc_pages() or
> kmalloc(), the device has the same limitation. and vmalloc memory can't
> be used for devices because they have no MMU.

vmalloc can be used as well, it just needs to be set up as a scatterlist
and needs a little love for DMA-challenged platforms with the
invalidate_kernel_vmap_range and flush_kernel_vmap_range helpers.
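Roughly like this, as a sketch (one scatterlist entry per page, error
handling omitted):

	unsigned int i, nr_pages = size >> PAGE_SHIFT;
	void *vbuf = vmalloc(size);
	struct scatterlist *sgl;
	int nents;

	sgl = kmalloc_array(nr_pages, sizeof(*sgl), GFP_KERNEL);
	sg_init_table(sgl, nr_pages);
	for (i = 0; i < nr_pages; i++)
		sg_set_page(&sgl[i], vmalloc_to_page(vbuf + i * PAGE_SIZE),
			    PAGE_SIZE, 0);

	/* the "little love" on DMA-challenged platforms: */
	flush_kernel_vmap_range(vbuf, size);	/* before DMA to the device */
	nents = dma_map_sg(dev, sgl, nr_pages, DMA_TO_DEVICE);
	/* ... and invalidate_kernel_vmap_range(vbuf, size) after DMA
	 * from the device, before reading the vmalloc mapping */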

> But we can map memory outside DMA zone into bounce buffer (which resides
> in DMA zone) using DMA API.

Yes, although in a few specific cases the bounce buffer could also come
from somewhere else.




Re: [PATCH v3 2/3] vmcore: Convert __read_vmcore to use an iov_iter

2021-12-21 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 



Re: [PATCH v3 3/3] vmcore: Convert read_from_oldmem() to take an iov_iter

2021-12-21 Thread Christoph Hellwig
On Mon, Dec 13, 2021 at 02:39:27PM +, Matthew Wilcox (Oracle) wrote:
> Remove the read_from_oldmem() wrapper introduced earlier and convert
> all the remaining callers to pass an iov_iter.

Looks good,

Reviewed-by: Christoph Hellwig 



Re: [PATCH v3 1/3] vmcore: Convert copy_oldmem_page() to take an iov_iter

2021-12-21 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 



Re: [PATCH v3 5/5] mm/slub: do not create dma-kmalloc if no managed pages in DMA zone

2021-12-14 Thread Christoph Hellwig
On Tue, Dec 14, 2021 at 11:07:34AM -0600, john.p.donne...@oracle.com wrote:
> Is CONFIG_ZONE_DMA even needed anymore on x86_64?

Yes.  There are still plenty of addressing challenged devices, mostly
ISA-like but also a few PCI/PCIe ones.



Re: [PATCH v3 5/5] mm/slub: do not create dma-kmalloc if no managed pages in DMA zone

2021-12-14 Thread Christoph Hellwig
On Wed, Dec 15, 2021 at 07:03:35AM +, Hyeonggon Yoo wrote:
> I'm not sure that allocating from ZONE_DMA32 instead of ZONE_DMA
> for kdump kernel is nice way to solve this problem.

What is the problem with zones in kdump kernels?

> Devices that require ZONE_DMA memory are rare but we still support them.

Indeed.

> > 1) Do not call warn_alloc in page allocator if will always fail
> > to allocate ZONE_DMA pages.
> > 
> > 
> > 2) let's check all callers of kmalloc with GFP_DMA
> > if they really need GFP_DMA flag and replace those by DMA API or
> > just remove GFP_DMA from kmalloc()
> > 
> > 3) Drop support for allocating DMA memory from slab allocator
> > (as Christoph Hellwig said) and convert them to use DMA32
> 
>   (as Christoph Hellwig said) and convert them to use *DMA API*
> 
> > and see what happens

This is the right thing to do, but it will take a while.  In fact
I don't think we really need the warning in step 1, a simple grep
already lets us go over them.  I just looked at the uses of GFP_DMA
in drivers/scsi for example, and all but one look bogus.

> > > > Yeah, I have the same guess too for get_capabilities(), not sure
> > > > about other callers. Or, as ChristophL and ChristophH said (sorry,
> > > > not sure if this is the right way to call people when the first name
> > > > is the same, correct me if it's wrong), any buffer requested from
> > > > kmalloc can be used by a device driver. Meaning the device enforces
> > > > getting memory inside its addressing limit for those DMA transfer
> > > > buffers which are usually large, megabytes level, allocated with
> > > > vmalloc() or alloc_pages(), but doesn't care about this kind of small
> > > > piece of buffer memory allocated with kmalloc()? Just a guess, please
> > > > give a counter example if anyone happens to know one, it could be easy.

The way this works is that the dma_map* calls will bounce buffer memory
that does not fall into the addressing limitations.  This is a performance
overhead, but allows drivers to address all memory in a system.  If the
driver controls memory allocation it should use one of the dma_alloc_*
APIs that allocate addressable memory from the start.  The allocator
will dip into ZONE_DMA and ZONE_DMA32 when needed.
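In code the difference is simply (a sketch):

	/* mapping memory the driver didn't allocate: may bounce on
	 * every transfer if buf sits above the device's dma mask */
	addr = dma_map_single(dev, buf, size, DMA_TO_DEVICE);

	/* driver-controlled allocation: addressable from the start,
	 * dipping into ZONE_DMA/ZONE_DMA32 as needed */
	buf = dma_alloc_coherent(dev, size, &addr, GFP_KERNEL);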



Re: [PATCH v3 5/5] mm/slub: do not create dma-kmalloc if no managed pages in DMA zone

2021-12-14 Thread Christoph Hellwig
On Mon, Dec 13, 2021 at 08:27:12PM +0800, Baoquan He wrote:
> Dma-kmalloc will be created as long as CONFIG_ZONE_DMA is enabled.
> However, it will fail if DMA zone has no managed pages. The failure
> can be seen in kdump kernel of x86_64 as below:

Please just switch the sr allocation to use GFP_KERNEL without GFP_DMA.
The block layer will do the proper bounce buffering underneath for the
very unlikely case that we're actually using the single HBA driver that
has ISA DMA addressing limitations.

Same for the ch driver, btw.
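I.e. something like this against the get_capabilities() allocation in
drivers/scsi/sr.c (an untested sketch):

	-	buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);
	+	buffer = kmalloc(512, GFP_KERNEL);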



Re: [PATCH 3/3] vmcore: Convert read_from_oldmem() to take an iov_iter

2021-12-13 Thread Christoph Hellwig
>  
>  ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
>  {
> - return read_from_oldmem(buf, count, ppos, 0,
> + struct kvec kvec = { .iov_base = buf, .iov_len = count };
> + struct iov_iter iter;
> +
> + iov_iter_kvec(&iter, READ, &kvec, 1, count);
> +
> + return read_from_oldmem(&iter, count, ppos,
>   cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT));
>  }

elfcorehdr_read should probably also take an iov_iter while we're at it.

I also don't quite understand why we even need the arch overrides for it,
but that would require some digging into the history of this interface.



Re: [PATCH 2/3] vmcore: Convert __read_vmcore to use an iov_iter

2021-12-13 Thread Christoph Hellwig
On Mon, Dec 13, 2021 at 12:06:35AM +, Matthew Wilcox (Oracle) wrote:
> + /* trim iter to not go beyond EOF */
> + if (iter->count > vmcore_size - *fpos)
> + iter->count = vmcore_size - *fpos;

Nit: iov_iter_truncate()
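I.e. the helper does exactly this clamp:

	iov_iter_truncate(iter, vmcore_size - *fpos);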

Otherwise this looks good from a cursory view.



Re: [PATCH 1/3] vmcore: Convert copy_oldmem_page() to take an iov_iter

2021-12-12 Thread Christoph Hellwig
On Mon, Dec 13, 2021 at 12:06:34AM +, Matthew Wilcox (Oracle) wrote:
> Instead of passing in a 'buf' and 'userbuf' argument, pass in an iov_iter.
> s390 needs more work to pass the iov_iter down further, or refactor,
> but I'd be more comfortable if someone who can test on s390 did that work.
> 
> It's more convenient to convert the whole of read_from_oldmem() to
> take an iov_iter at the same time, so rename it to read_from_oldmem_iter()
> and add a temporary read_from_oldmem() wrapper that creates an iov_iter.

This looks pretty reasonable.  s390 could use some love from people that
know the code, and yes, the kerneldoc comments should go away.



Re: [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages

2021-12-12 Thread Christoph Hellwig
On Mon, Dec 13, 2021 at 03:39:25PM +0800, Baoquan He wrote:
> > > As said at above, ia64 and riscv don't have ZONE_DMA at all, they just
> > > cover low 4G with ZONE_DMA32 alone.
> > 
> > If you do not have devices that are crap and cannot address the full
> > memory then you don't need these special zones.
> 
> I am not a DMA expert, but with my understanding, on x86_64 and arm64 we
> have PCIe devices whose dma mask is 32-bit

Yes, way too many, and new ones keep getting introduced as well.  Also
weirdo masks like 40, 44 or 48 bits.

> which means they can only address
> ZONE_DMA32.

Yes and no.  Offsets between cpu physical and device addresses make this
complicated, even ignoring IOMMUs.



Re: [PATCH RESEND v2 0/5] Avoid requesting page from DMA zone when no managed pages

2021-12-12 Thread Christoph Hellwig
On Tue, Dec 07, 2021 at 09:05:26AM +0100, Christoph Lameter wrote:
> On Tue, 7 Dec 2021, Baoquan He wrote:
> 
> > into ZONE_DMA32 by default. The zone DMA covering low 16M is used to
> > take care of antique ISA devices. In fact, on 64bit systems, ZONE_DMA
> > (which is low 16M) is rarely needed to support almost extinct ISA devices.
> > However, some components treat DMA as a generic concept, e.g
> > kmalloc-dma, slab allocator initializes it for later any DMA related
> > buffer allocation, but not limited to ISA DMA.
> 
> The idea of the slab allocator DMA support is to have memory available
> for devices that can only support a limited range of physical addresses.
> These are only to be enabled for platforms that have such requirements.
> 
> The slab allocators guarantee that all kmalloc allocations are DMA able
> independent of specifying ZONE_DMA/ZONE_DMA32

Yes.  And we never supported slab for ZONE_DMA32 and should work on
getting rid of it for ZONE_DMA as well.  The only thing that guarantees
device addressability is the DMA API.  The DMA API needs ZONE_DMA/DMA32
to back its page allocations, but supporting this in slab is a bad idea
only explained by historic reasons from before we had a DMA API.

> > On arm64, even though both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32
> > are enabled, it makes ZONE_DMA covers the low 4G area, and ZONE_DMA32
> > empty. Unless on specific platforms (e.g. 30-bit on Raspberry Pi 4),
> > then zone DMA covers the 1st 1G area, zone DMA32 covers the rest of
> > the 32-bit addressable memory.
> 
> ZONE_NORMAL should cover all memory. ARM does not need ZONE_DMA32.

arm32 does not, arm64 does.  And the Pi 4 is an arm64 device.

> > I am wondering if we can also change the size of DMA and DMA32 ZONE as
> > dynamically adjusted, just as arm64 is doing? On x86_64, we can make
> > zone DMA covers the 32-bit addressable memory, and empty zone DMA32 by
> > default. Once ISA_DMA_API is enabled, we go back to make zone DMA covers
> > low 16M area, zone DMA32 covers the rest of 32-bit addressable memory.
> > (I am not familiar with ISA_DMA_API, will it require 24-bit addressable
> > memory when enabled?)
> 
> The size of ZONE_DMA is traditionally depending on the platform. On some
> it is 16MB, on some 1G and on some 4GB. ZONE32 is always 4GB and should
> only be used if ZONE_DMA has already been used.

ZONE32 should be (and generally is) used whenever there is a zone covering
the 32-bit CPU physical address limit.

> 
> ZONE_DMA is dynamic in the sense of being different on different
> platforms.

Agreed.



Re: [PATCH RESEND v2 2/5] dma-pool: allow user to disable atomic pool

2021-12-12 Thread Christoph Hellwig
On Tue, Dec 07, 2021 at 11:07:47AM +0800, Baoquan He wrote:
> In the current code, three atomic memory pools are always created,
> atomic_pool_kernel|dma|dma32, even though 'coherent_pool=0' is
> specified in kernel command line. In fact, atomic pool is only
> necessary when CONFIG_DMA_DIRECT_REMAP=y or mem_encrypt_active=y
> which are needed on few ARCHes.

And only these select the atomic pool, so it won't get created otherwise.
What problem are you trying to solve?



Re: [RFC PATCH 01/14] fs/proc/vmcore: Update read_from_oldmem() for user pointer

2021-12-07 Thread Christoph Hellwig
On Mon, Dec 06, 2021 at 03:07:15PM +, Matthew Wilcox wrote:
> > > What do you think to adding a generic copy_pfn_to_iter()?  Not sure
> > > which APIs to use to implement it ... some architectures have weird
> > > requirements about which APIs can be used for what kinds of PFNs.
> > 
> > Hmm.  I thought kmap_local_pfn(_prot) is all we need?
> 
> In the !HIGHMEM case, that calls pfn_to_page(), and I think the
> point of this path is that we don't have a struct page for this pfn.

Indeed.  But to me this suggests that the !highmem stub is broken and
we should probably fix it rather than adding yet another interface.
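With the stub fixed, the generic helper discussed above could be as
simple as this sketch:

	static size_t copy_pfn_to_iter(unsigned long pfn, size_t offset,
				       size_t len, struct iov_iter *iter)
	{
		char *vaddr = kmap_local_pfn(pfn);
		size_t copied = copy_to_iter(vaddr + offset, len, iter);

		kunmap_local(vaddr);
		return copied;
	}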



Re: [RFC PATCH 01/14] fs/proc/vmcore: Update read_from_oldmem() for user pointer

2021-12-06 Thread Christoph Hellwig
On Mon, Dec 06, 2021 at 02:17:24PM +, Matthew Wilcox wrote:
> On Mon, Dec 06, 2021 at 03:04:51PM +0100, Christoph Hellwig wrote:
> > This looks like a huge mess.  What speaks against using an iov_iter
> > here?
> 
> I coincidentally made a start on this last night.  Happy to stop.

Don't stop!

> What do you think to adding a generic copy_pfn_to_iter()?  Not sure
> which APIs to use to implement it ... some architectures have weird
> requirements about which APIs can be used for what kinds of PFNs.

Hmm.  I thought kmap_local_pfn(_prot) is all we need?



Re: [RFC PATCH 01/14] fs/proc/vmcore: Update read_from_oldmem() for user pointer

2021-12-06 Thread Christoph Hellwig
On Fri, Dec 03, 2021 at 04:12:18PM +0530, Amit Daniel Kachhap wrote:
> + return read_from_oldmem_to_kernel(buf, count, ppos,
> +   cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT));

Overly long line.

> +ssize_t read_from_oldmem(char __user *ubuf, char *kbuf, size_t count,
> +  u64 *ppos, bool encrypted)
>  {
>   unsigned long pfn, offset;
>   size_t nr_bytes;
> @@ -156,19 +163,27 @@ ssize_t read_from_oldmem(char *buf, size_t count,
>   /* If pfn is not ram, return zeros for sparse dump files */
>   if (!pfn_is_ram(pfn)) {
>   tmp = 0;
> - if (!userbuf)
> - memset(buf, 0, nr_bytes);
> - else if (clear_user(buf, nr_bytes))
> + if (kbuf)
> + memset(kbuf, 0, nr_bytes);
> + else if (clear_user(ubuf, nr_bytes))
>   tmp = -EFAULT;

This looks like a huge mess.  What speaks against using an iov_iter
here?



Re: [PATCH v3 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-16 Thread Christoph Hellwig
On Wed, Sep 15, 2021 at 07:18:34PM +0200, Christophe Leroy wrote:
> Could you please provide a more explicit explanation of why inlining such
> a helper is considered bad practice and messy?

Because now we get architectures to all subtly differ.  Look at the mess
for ioremap and the ioremap* variants.

The only good reason to allow for inlines is if they are used in a hot
path.  Which cc_platform_has is not, especially not on powerpc.



Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-24 Thread Christoph Hellwig
On Thu, Aug 19, 2021 at 01:33:09PM -0500, Tom Lendacky wrote:
> I did it as inline originally because the presence of the function will be
> decided based on the ARCH_HAS_PROTECTED_GUEST config. For now, that is
> only selected by the AMD memory encryption support, so if I went out of
> line I could put in mem_encrypt.c. But with TDX wanting to also use it, it
> would have to be in an always built file with some #ifdefs or in its own
> file that is conditionally built based on the ARCH_HAS_PROTECTED_GUEST
> setting (they've already tried building with ARCH_HAS_PROTECTED_GUEST=y
> and AMD_MEM_ENCRYPT not set).
> 
> To take it out of line, I'm leaning towards the latter, creating a new
> file that is built based on the ARCH_HAS_PROTECTED_GUEST setting.

Yes.  In general, every time architectures have to provide the prototype
and not just the implementation of something we end up with a giant mess
sooner or later.  In a few cases that is still warranted due to
performance concerns, but I don't think that is the case here.

> 
> > 
> >> +/* 0x800 - 0x8ff reserved for AMD */
> >> +#define PATTR_SME 0x800
> >> +#define PATTR_SEV 0x801
> >> +#define PATTR_SEV_ES  0x802
> > 
> > Why do we need reservations for a purely in-kernel namespace?
> > 
> > And why are you overloading a brand new generic API with weird details
> > of a specific implementation like this?
> 
> There was some talk about this on the mailing list where TDX and SEV may
> need to be differentiated, so we wanted to reserve a range of values per
> technology. I guess I can remove them until they are actually needed.

In that case add a flag for the differing behavior.  And only add them
when actually needed.  And either way there is absolutely no need to
reserve ranges.




Re: [PATCH v2 04/12] powerpc/pseries/svm: Add a powerpc version of prot_guest_has()

2021-08-19 Thread Christoph Hellwig
On Fri, Aug 13, 2021 at 11:59:23AM -0500, Tom Lendacky wrote:
> +static inline bool prot_guest_has(unsigned int attr)

No real need to have this inline.  In fact I'd suggest we have the
prototype in a common header so that everyone must implement it out
of line.



Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-19 Thread Christoph Hellwig
On Fri, Aug 13, 2021 at 11:59:22AM -0500, Tom Lendacky wrote:
> While the name suggests this is intended mainly for guests, it will
> also be used for host memory encryption checks in place of sme_active().

Which suggests that the name is not good to start with.  Maybe protected
hardware, system or platform might be a better choice?

> +static inline bool prot_guest_has(unsigned int attr)
> +{
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> + if (sme_me_mask)
> + return amd_prot_guest_has(attr);
> +#endif
> +
> + return false;
> +}

Shouldn't this be entirely out of line?
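I.e. roughly (a sketch; the file placement is made up):

	/* include/linux/protected_guest.h */
	bool prot_guest_has(unsigned int attr);

	/* arch/x86 file only built when AMD_MEM_ENCRYPT is enabled */
	bool prot_guest_has(unsigned int attr)
	{
		if (sme_me_mask)
			return amd_prot_guest_has(attr);
		return false;
	}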

> +/* 0x800 - 0x8ff reserved for AMD */
> +#define PATTR_SME    0x800
> +#define PATTR_SEV    0x801
> +#define PATTR_SEV_ES 0x802

Why do we need reservations for a purely in-kernel namespace?

And why are you overloading a brand new generic API with weird details
of a specific implementation like this?



Re: [PATCH v2 02/12] mm: Introduce a function to check for virtualization protection features

2021-08-19 Thread Christoph Hellwig
On Fri, Aug 13, 2021 at 11:59:21AM -0500, Tom Lendacky wrote:
> +#define PATTR_MEM_ENCRYPT        0   /* Encrypted memory */
> +#define PATTR_HOST_MEM_ENCRYPT   1   /* Host encrypted memory */
> +#define PATTR_GUEST_MEM_ENCRYPT  2   /* Guest encrypted memory */
> +#define PATTR_GUEST_PROT_STATE   3   /* Guest encrypted state */

Please write an actual detailed explanation of what these mean, that
is, what implications they have on the kernel.




Re: [PATCH RFC 0/2] dma-pool: allow user to disable atomic pool

2021-08-10 Thread Christoph Hellwig
On Tue, Aug 10, 2021 at 03:52:25PM -0500, Tom Lendacky via iommu wrote:
> I think the atomic pool is used by the NVMe driver. My understanding is
> that driver will do a dma_alloc_coherent() from interrupt context, so it
> needs to use GFP_ATOMIC. The pool was created because dma_alloc_coherent()
> would perform a set_memory_decrypted() call, which can sleep. The pool
> eliminates that issue (David can correct me if I got that wrong).

Not just the NVMe driver.  We have plenty of drivers doing that, just
do a quick grep for dma_alloc_*, dma_pool_alloc, dma_pool_zalloc with
GFP_ATOMIC (and that won't even find multi-line strings).



Re: [PATCH 02/11] x86/sev: Add an x86 version of prot_guest_has()

2021-07-28 Thread Christoph Hellwig
On Tue, Jul 27, 2021 at 05:26:05PM -0500, Tom Lendacky via iommu wrote:
> Introduce an x86 version of the prot_guest_has() function. This will be
> used in the more generic x86 code to replace vendor specific calls like
> sev_active(), etc.
> 
> While the name suggests this is intended mainly for guests, it will
> also be used for host memory encryption checks in place of sme_active().
> 
> The amd_prot_guest_has() function does not use EXPORT_SYMBOL_GPL for the
> same reasons previously stated when changing sme_active(), sev_active and

None of that applies here as none of the callers get pulled into
random macros.  The only case of that is sme_me_mask through
sme_mask, but that's not something this series replaces as far as I can
tell.



Re: [PATCH 01/11] mm: Introduce a function to check for virtualization protection features

2021-07-28 Thread Christoph Hellwig
On Tue, Jul 27, 2021 at 05:26:04PM -0500, Tom Lendacky via iommu wrote:
> In prep for other protected virtualization technologies, introduce a
> generic helper function, prot_guest_has(), that can be used to check
> for specific protection attributes, like memory encryption. This is
> intended to eliminate having to add multiple technology-specific checks
> to the code (e.g. if (sev_active() || tdx_active())).

So common checks obviously make sense, but I really hate the stupid
multiplexer.  Having one well-documented helper per feature is much
easier to follow.

> +#define PATTR_MEM_ENCRYPT        0   /* Encrypted memory */
> +#define PATTR_HOST_MEM_ENCRYPT   1   /* Host encrypted memory */
> +#define PATTR_GUEST_MEM_ENCRYPT  2   /* Guest encrypted memory */
> +#define PATTR_GUEST_PROT_STATE   3   /* Guest encrypted state */

The kerneldoc comments on these individual helpers will give you plenty
of space to properly document what they indicate and what a (potential)
caller should do based on them.  Something the above comments completely
fail to do.
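E.g. along these lines (a skeleton only; the exact wording is what
matters):

	/**
	 * PATTR_HOST_MEM_ENCRYPT - host memory encryption is active
	 *
	 * Spell out here what a caller must do differently when this
	 * attribute is set, e.g. for DMA, ioremap and swiotlb setup.
	 */
	#define PATTR_HOST_MEM_ENCRYPT	1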



Re: [PATCH v4 1/4] kexec: avoid compat_alloc_user_space

2021-07-20 Thread Christoph Hellwig
This can be simplified a little more by killing off
copy_user_segment_list entirely, using memdup_user and dropping the
not really required _locked wrapper.  The locking move might make
most sense as a separate prep patch.

diff --git a/kernel/kexec.c b/kernel/kexec.c
index c82c6c06f051..3140fe7af801 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -19,26 +19,9 @@
 
 #include "kexec_internal.h"
 
-static int copy_user_segment_list(struct kimage *image,
- unsigned long nr_segments,
- struct kexec_segment __user *segments)
-{
-   int ret;
-   size_t segment_bytes;
-
-   /* Read in the segments */
-   image->nr_segments = nr_segments;
-   segment_bytes = nr_segments * sizeof(*segments);
-   ret = copy_from_user(image->segment, segments, segment_bytes);
-   if (ret)
-   ret = -EFAULT;
-
-   return ret;
-}
-
 static int kimage_alloc_init(struct kimage **rimage, unsigned long entry,
 unsigned long nr_segments,
-struct kexec_segment __user *segments,
+struct kexec_segment *segments,
 unsigned long flags)
 {
int ret;
@@ -58,10 +41,8 @@ static int kimage_alloc_init(struct kimage **rimage, unsigned long entry,
return -ENOMEM;
 
image->start = entry;
-
-   ret = copy_user_segment_list(image, nr_segments, segments);
-   if (ret)
-   goto out_free_image;
+   image->nr_segments = nr_segments;
+   memcpy(image->segment, segments, nr_segments * sizeof(*segments));
 
if (kexec_on_panic) {
/* Enable special crash kernel control page alloc policy. */
@@ -104,11 +85,22 @@ static int kimage_alloc_init(struct kimage **rimage, unsigned long entry,
 }
 
 static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
-   struct kexec_segment __user *segments, unsigned long flags)
+   struct kexec_segment *segments, unsigned long flags)
 {
struct kimage **dest_image, *image;
unsigned long i;
-   int ret;
+   int ret = 0;
+
+   /*
+* Because we write directly to the reserved memory region when loading
+* crash kernels we need a mutex here to prevent multiple crash kernels
+* from attempting to load simultaneously, and to prevent a crash kernel
+* from loading over the top of a in use crash kernel.
+*
+* KISS: always take the mutex.
+*/
+   if (!mutex_trylock(&kexec_mutex))
+   return -EBUSY;
 
if (flags & KEXEC_ON_CRASH) {
   dest_image = &kexec_crash_image;
@@ -121,7 +113,7 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
if (nr_segments == 0) {
/* Uninstall image */
kimage_free(xchg(dest_image, NULL));
-   return 0;
+   goto out_unlock;
}
if (flags & KEXEC_ON_CRASH) {
/*
@@ -134,7 +126,7 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
 
   ret = kimage_alloc_init(&image, entry, nr_segments, segments, flags);
if (ret)
-   return ret;
+   goto out_unlock;
 
if (flags & KEXEC_PRESERVE_CONTEXT)
image->preserve_context = 1;
@@ -171,6 +163,8 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
arch_kexec_protect_crashkres();
 
kimage_free(image);
+out_unlock:
+   mutex_unlock(&kexec_mutex);
return ret;
 }
 
@@ -236,7 +230,8 @@ static inline int kexec_load_check(unsigned long nr_segments,
 SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
struct kexec_segment __user *, segments, unsigned long, flags)
 {
-   int result;
+   struct kexec_segment *ksegments;
+   unsigned long result;
 
result = kexec_load_check(nr_segments, flags);
if (result)
@@ -247,21 +242,11 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
((flags & KEXEC_ARCH_MASK) != KEXEC_ARCH_DEFAULT))
return -EINVAL;
 
-   /* Because we write directly to the reserved memory
-* region when loading crash kernels we need a mutex here to
-* prevent multiple crash  kernels from attempting to load
-* simultaneously, and to prevent a crash kernel from loading
-* over the top of a in use crash kernel.
-*
-* KISS: always take the mutex.
-*/
-   if (!mutex_trylock(&kexec_mutex))
-   return -EBUSY;
-
-   result = do_kexec_load(entry, nr_segments, segments, flags);
-
-   mutex_unlock(&kexec_mutex);
-
+   ksegments = memdup_user(segments, nr_segments * sizeof(ksegments[0]));
+   if (IS_ERR(ksegments))
+   return PTR_ERR(ksegments);
+   result = 

Re: [PATCH RFC 0/2] dma-pool: allow user to disable atomic pool

2021-06-24 Thread Christoph Hellwig
On Thu, Jun 24, 2021 at 11:47:31AM +0100, Robin Murphy wrote:
> Hmm, I think the Kconfig reshuffle has actually left a slight wrinkle here.
> For DMA_DIRECT_REMAP=y we can assume an atomic pool is always needed, since
> that was the original behaviour anyway. However the implications of
> AMD_MEM_ENCRYPT=y are different - even if support is enabled, it still
> should only be relevant if mem_encrypt_active(), so it probably does make
> sense to have an additional runtime gate on that.

Yeah, a check for that would probably be useful.



Re: [PATCH RFC 0/2] dma-pool: allow user to disable atomic pool

2021-06-24 Thread Christoph Hellwig
So reduce the amount allocated.  But the pool is needed for proper
operation on systems with memory encryption.  And please add the right
maintainer or at least mailing list for the code you're touching next
time.



Re: [PATCH v3 4/4] compat: remove some compat entry points

2021-05-18 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 



Re: [PATCH v3 3/4] mm: simplify compat numa syscalls

2021-05-18 Thread Christoph Hellwig
Except for the various annoying overly long lines this looks fine to me:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v3 2/4] mm: simplify compat_sys_move_pages

2021-05-18 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 



Re: [PATCH v3 1/4] kexec: simplify compat_sys_kexec_load

2021-05-18 Thread Christoph Hellwig
On Mon, May 17, 2021 at 10:57:24PM -0500, Eric W. Biederman wrote:
> We open ourselves up to bugs whenever we lie to the type system.
> 
> Skimming through the code it looks like it should be possible
> to not need the in_compat_syscall and the casts to the wrong
> type by changing the order of the code a little bit.

What kind of bug do you expect?  We must only copy from user addresses
once anyway.  I've never seen bugs due to the use of in_compat_syscall,
but plenty due to cruft code trying to avoid it.



Re: [PATCH v3 1/4] kexec: simplify compat_sys_kexec_load

2021-05-18 Thread Christoph Hellwig
> + if (in_compat_syscall())
> +     return copy_user_compat_segment_list(image, nr_segments, segments);

Annoying overly long line here.

Otherwise:

Reviewed-by: Christoph Hellwig 




Re: [RFC v2 6/6] fs: add automatic kernel fs freeze / thaw and remove kthread freezing

2021-04-20 Thread Christoph Hellwig
> This also removes all the superfluous freezer calls on all filesystems
> as they are no longer needed as the VFS now performs filesystem
> freezing/thaw if the filesystem has support for it. The filesystem
> therefore is in charge of properly dealing with quiescing of the
> filesystem through its callbacks.

Can you split that out from the main logic change?  Maybe even into one
patch per file system?

> +#ifdef CONFIG_PM_SLEEP
> +static bool super_should_freeze(struct super_block *sb)
> +{
> + if (!sb->s_root)
> + return false;
> + if (!(sb->s_flags & MS_BORN))
> + return false;

This is already done in the iterate_supers_excl and
iterate_supers_reverse_excl helpers that this helper is always called
through.

> + /*
> +  * We don't freeze virtual filesystems, we skip those filesystems with
> +  * no backing device.
> +  */
> + if (sb->s_bdi == &noop_backing_dev_info)
> + return false;

Why?

> + /* No need to freeze read-only filesystems */
> + if (sb_rdonly(sb))
> + return false;

freeze_super/thaw_super already takes care of read-only file systems,
and IMHO in a better way.

> + int error = 0;
> +
> + spin_lock(&sb_lock);
> + if (!super_should_freeze(sb))
> + goto out;
> +
> + pr_info("%s (%s): freezing\n", sb->s_type->name, sb->s_id);
> +
> + spin_unlock(&sb_lock);

I don't see how super_should_freeze needs sb_lock.  But if it does
the lock should be taken in the function.

> + atomic_inc(&sb->s_active);

Doesn't this need an atomic_inc_not_zero if we're racing with a delayed
unmount?

> + error = freeze_locked_super(sb, false);
> + if (error)
> + atomic_dec(&sb->s_active);

And this really needs something like deactivate_locked_super.

> + spin_lock(&sb_lock);
> + if (error && error != -EBUSY)
> + pr_notice("%s (%s): Unable to freeze, error=%d",
> +   sb->s_type->name, sb->s_id, error);
> +
> +out:
> + spin_unlock(&sb_lock);

Huh, what is the point of sb_lock here?

> +int fs_suspend_freeze(void)
> +{
> + return iterate_supers_reverse_excl(fs_suspend_freeze_sb, NULL);
> +}

I'd just fold this helper into its only caller.

> + error = __thaw_super_locked(sb, false);
> + if (!error)
> + atomic_dec(&sb->s_active);
> +
> + spin_lock(&sb_lock);
> + if (error && error != -EBUSY)
> + pr_notice("%s (%s): Unable to unfreeze, error=%d",
> +   sb->s_type->name, sb->s_id, error);
> +
> +out:
> + spin_unlock(&sb_lock);
> + return error;
> +}
> +
> +int fs_resume_unfreeze(void)
> +{
> + return iterate_supers_excl(fs_suspend_thaw_sb, NULL);
> +}

Same comments as on the freeze side.



Re: [RFC v2 4/6] fs: distinguish between user initiated freeze and kernel initiated freeze

2021-04-20 Thread Christoph Hellwig
Wouldn't it be simpler to just add a new flag to signal a kernel-initiated
freeze, or even better the exact reason (suspend), instead of
overloading the state machine?
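E.g. (a sketch, the field name is invented):

	struct sb_writers {
		...
		int	frozen;			/* existing state machine */
		bool	frozen_by_kernel;	/* new: set by the suspend path */
	};

so suspend sets the flag when freezing and resume only thaws the
superblocks that have it set, without adding new SB_FREEZE_* states.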



Re: [RFC v2 1/6] fs: provide unlocked helper for freeze_super()

2021-04-20 Thread Christoph Hellwig
On Sat, Apr 17, 2021 at 12:10:21AM +, Luis Chamberlain wrote:
> freeze_super() holds a write lock, however we wish to also enable
> callers which already hold the write lock. To do this provide a helper
> and make freeze_super() use it. This way, all that freeze_super() does
> now is lock handling and active count management.

Can we take a step back and think about this a bit more?

freeze_super() has three callers:

 1) freeze_bdev
 2) ioctl_fsfreeze
 3) freeze_store (in gfs2)

The first gets its reference from get_active_super, and is the only
caller of get_active_super.  So IMHO we should just not drop the lock
in get_active_super and directly call the unlocked version.

The other two really should just call grab_super to get an active
reference and s_umount.

In other words: I don't think we need both variants, just move the
locking and s_active acquisition out of freeze_super.  Same for the
thaw side.
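I.e. roughly this shape for the ioctl (a sketch with hypothetical
semantics where freeze_super() expects the caller to hold the active
reference and s_umount):

	static int ioctl_fsfreeze(struct file *filp)
	{
		struct super_block *sb = file_inode(filp)->i_sb;
		int ret;

		/* what grab_super() does: active reference + s_umount */
		if (!atomic_inc_not_zero(&sb->s_active))
			return -EBUSY;
		down_write(&sb->s_umount);
		ret = freeze_super(sb);
		up_write(&sb->s_umount);
		deactivate_super(sb);
		return ret;
	}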



Re: [PATCH v2 00/12] Add build ID to stacktraces

2021-03-24 Thread Christoph Hellwig
On Tue, Mar 23, 2021 at 07:04:31PM -0700, Stephen Boyd wrote:
>  x5 :  x4 : 0001
>  x3 : 0008 x2 : ff93fef25a70
>  x1 : ff93fef15788 x0 : ffe3622352e0
>  Call trace:
>   lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
>   direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]

Yikes.  No, please do not make the backtraces a complete mess for
something that serves absolutely no need.



Re: [PATCH v2 4/4] compat: remove some compat entry points

2020-12-08 Thread Christoph Hellwig
On Tue, Nov 03, 2020 at 10:45:54AM +0100, Arnd Bergmann wrote:
> I had it there originally, I guess I should have left it there ;-)
> 
> When I changed it, I was considering to do the same for additional
> syscalls that have very small differences now (timer_create,
> rt_sigqueueinfo, rt_sigpending, recvmsg, sendmsg) and use
> in_compat_syscall() there, but then I decided against that.
> 
> In the end, I did like the split, as I found the smaller three
> patches that contain the real change easier to review, and
> it turns the larger patch at the end into a more obvious
> transformation.

Oh well, let's just keep the split as-is then.



Re: [PATCH v2 2/4] mm: simplify compat_sys_move_pages

2020-11-03 Thread Christoph Hellwig
>  #ifdef CONFIG_COMPAT
>  COMPAT_SYSCALL_DEFINE6(move_pages, pid_t, pid, compat_ulong_t, nr_pages,
> +compat_uptr_t __user *, pages,
>  const int __user *, nodes,
>  int __user *, status,
>  int, flags)
>  {
> + return kernel_move_pages(pid, nr_pages,
> +  (const void __user *__user *)pages,
> +  nodes, status, flags);
>  }

Same as for kexec - all the compat syscall tables can just point to
the native version now.



Re: [PATCH v2 4/4] compat: remove some compat entry points

2020-11-03 Thread Christoph Hellwig
On Mon, Nov 02, 2020 at 01:31:51PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> These are all handled correctly when calling the native
> system call entry point, so remove the special cases.

Ok, this is where you do it.  I think this belongs into the main
patches.



Re: [PATCH v2 1/4] kexec: simplify compat_sys_kexec_load

2020-11-03 Thread Christoph Hellwig
> + for (i=0; i < nr_segments; i++) {

Missing spaces around the "=".

> +SYSCALL_DEFINE4(kexec_load, unsigned long, entry, unsigned long, nr_segments,
> + struct kexec_segment __user *, segments, unsigned long, flags)
> +{
> + return kernel_kexec_load(entry, nr_segments, segments, flags);
> +}
> +
>  #ifdef CONFIG_COMPAT
>  COMPAT_SYSCALL_DEFINE4(kexec_load, compat_ulong_t, entry,
>  compat_ulong_t, nr_segments,
>  struct compat_kexec_segment __user *, segments,
>  compat_ulong_t, flags)
>  {
> + return kernel_kexec_load(entry, nr_segments,
> +  (struct kexec_segment __user *)segments,
> +  flags);
>  }

I don't think we need sys_compat_kexec_load at all now, all the syscall
tables can simply switch to sys_kexec_load for the compat case as well
now.



Re: [PATCH RFC PKS/PMEM 24/58] fs/freevxfs: Utilize new kmap_thread()

2020-10-15 Thread Christoph Hellwig
> - kaddr = kmap(pp);
> + kaddr = kmap_thread(pp);
>   memcpy(kaddr, vip->vii_immed.vi_immed + offset, PAGE_SIZE);
> - kunmap(pp);
> + kunmap_thread(pp);

You only Cced me on this particular patch, which means I have absolutely
no idea what kmap_thread and kunmap_thread actually do, and thus can't
provide an informed review.

That being said I think your life would be a lot easier if you add
helpers for the above code sequence and its counterpart that copies
to a potential highmem page first, as that hides the implementation
details from most users.
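I.e. something like this sketch, using the kmap_thread() API from this
series (the helper name is made up):

	static inline void memcpy_to_page_thread(struct page *page,
						 size_t offset,
						 const char *from, size_t len)
	{
		char *to = kmap_thread(page);

		memcpy(to + offset, from, len);
		kunmap_thread(page);
	}

plus a memcpy_from_page_thread() counterpart for the read side.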



Re: [PATCH 4/4] mm: remove compat numa syscalls

2020-09-18 Thread Christoph Hellwig
On Fri, Sep 18, 2020 at 03:24:39PM +0200, Arnd Bergmann wrote:
> The compat implementations for mbind, get_mempolicy, set_mempolicy
> and migrate_pages are just there to handle the subtly different
> layout of bitmaps on 32-bit hosts.
> 
> The compat implementation however lacks some of the checks that
> are present in the native one, in particular for checking that
> the extra bits are all zero when user space has a larger mask
> size than the kernel. Worse, those extra bits do not get cleared
> when copying in or out of the kernel, which can lead to incorrect
> data as well.
> 
> Unify the implementation to handle the compat bitmap layout directly
> in the get_nodes() and copy_nodes_to_user() helpers.  Splitting out
> the get_bitmap() helper from get_nodes() also helps readability of the
> native case.
> 
> On x86, two additional problems are addressed by this: compat tasks can
> pass a bitmap at the end of a mapping, causing a fault when reading
> across the page boundary for a 64-bit word. x32 tasks might also run
> into problems with get_mempolicy corrupting data when an odd number of
> 32-bit words gets passed.
> 
> On parisc the migrate_pages() system call apparently had the wrong
> calling convention, as big-endian architectures expect the words
> inside of a bitmap to be swapped. This is not a problem though
> since parisc has no NUMA support.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  arch/arm64/include/asm/unistd32.h |   8 +-
>  arch/mips/kernel/syscalls/syscall_n32.tbl |   8 +-
>  arch/mips/kernel/syscalls/syscall_o32.tbl |   8 +-
>  arch/parisc/kernel/syscalls/syscall.tbl   |   6 +-
>  arch/powerpc/kernel/syscalls/syscall.tbl  |   8 +-
>  arch/s390/kernel/syscalls/syscall.tbl |   8 +-
>  arch/sparc/kernel/syscalls/syscall.tbl|   8 +-
>  arch/x86/entry/syscalls/syscall_32.tbl|   2 +-
>  include/linux/compat.h|  15 --
>  include/uapi/asm-generic/unistd.h |   8 +-
>  kernel/kexec.c|   6 +-
>  kernel/sys_ni.c   |   4 -
>  mm/mempolicy.c| 193 +-
>  13 files changed, 79 insertions(+), 203 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/unistd32.h 
> b/arch/arm64/include/asm/unistd32.h
> index af793775ba98..31479f7120a0 100644
> --- a/arch/arm64/include/asm/unistd32.h
> +++ b/arch/arm64/include/asm/unistd32.h
> @@ -649,11 +649,11 @@ __SYSCALL(__NR_inotify_add_watch, sys_inotify_add_watch)
>  #define __NR_inotify_rm_watch 318
>  __SYSCALL(__NR_inotify_rm_watch, sys_inotify_rm_watch)
>  #define __NR_mbind 319
> -__SYSCALL(__NR_mbind, compat_sys_mbind)
> +__SYSCALL(__NR_mbind, sys_mbind)
>  #define __NR_get_mempolicy 320
> -__SYSCALL(__NR_get_mempolicy, compat_sys_get_mempolicy)
> +__SYSCALL(__NR_get_mempolicy, sys_get_mempolicy)
>  #define __NR_set_mempolicy 321
> -__SYSCALL(__NR_set_mempolicy, compat_sys_set_mempolicy)
> +__SYSCALL(__NR_set_mempolicy, sys_set_mempolicy)
>  #define __NR_openat 322
>  __SYSCALL(__NR_openat, compat_sys_openat)
>  #define __NR_mkdirat 323
> @@ -811,7 +811,7 @@ __SYSCALL(__NR_rseq, sys_rseq)
>  #define __NR_io_pgetevents 399
>  __SYSCALL(__NR_io_pgetevents, compat_sys_io_pgetevents)
>  #define __NR_migrate_pages 400
> -__SYSCALL(__NR_migrate_pages, compat_sys_migrate_pages)
> +__SYSCALL(__NR_migrate_pages, sys_migrate_pages)
>  #define __NR_kexec_file_load 401
>  __SYSCALL(__NR_kexec_file_load, sys_kexec_file_load)
>  /* 402 is unused */
> diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl 
> b/arch/mips/kernel/syscalls/syscall_n32.tbl
> index 7fa1ca45e44c..15fda882d07e 100644
> --- a/arch/mips/kernel/syscalls/syscall_n32.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
> @@ -239,9 +239,9 @@
>  228  n32 clock_nanosleep sys_clock_nanosleep_time32
>  229  n32 tgkill  sys_tgkill
>  230  n32 utimes  sys_utimes_time32
> -231  n32 mbind   compat_sys_mbind
> -232  n32 get_mempolicy   compat_sys_get_mempolicy
> -233  n32 set_mempolicy   compat_sys_set_mempolicy
> +231  n32 mbind   sys_mbind
> +232  n32 get_mempolicy   sys_get_mempolicy
> +233  n32 set_mempolicy   sys_set_mempolicy
>  234  n32 mq_open compat_sys_mq_open
>  235  n32 mq_unlink   sys_mq_unlink
>  236  n32 mq_timedsendsys_mq_timedsend_time32
> @@ -258,7 +258,7 @@
>  247  n32 inotify_initsys_inotify_init
>  248  n32 inotify_add_watch   sys_inotify_add_watch
>  249  n32 inotify_rm_watchsys_inotify_rm_watch
> -250  n32 migrate_pages   compat_sys_migrate_pages
> +250  n32 migrate_pages   sys_migrate_pages
>  251  n32 openat  sys_openat
>  252  n32 

Re: [PATCH 3/4] mm: remove compat_sys_move_pages

2020-09-18 Thread Christoph Hellwig
On Fri, Sep 18, 2020 at 03:24:38PM +0200, Arnd Bergmann wrote:
> The compat move_pages() implementation uses compat_alloc_user_space()
> for converting the pointer array. Moving the compat handling into
> the function itself is a bit simpler and lets us avoid the
> compat_alloc_user_space() call.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  arch/arm64/include/asm/unistd32.h |  2 +-
>  arch/mips/kernel/syscalls/syscall_n32.tbl |  2 +-
>  arch/mips/kernel/syscalls/syscall_o32.tbl |  2 +-
>  arch/parisc/kernel/syscalls/syscall.tbl   |  2 +-
>  arch/powerpc/kernel/syscalls/syscall.tbl  |  2 +-
>  arch/s390/kernel/syscalls/syscall.tbl |  2 +-
>  arch/sparc/kernel/syscalls/syscall.tbl|  2 +-
>  arch/x86/entry/syscalls/syscall_32.tbl|  2 +-
>  arch/x86/entry/syscalls/syscall_64.tbl|  2 +-
>  include/linux/compat.h|  5 ---
>  include/uapi/asm-generic/unistd.h |  2 +-
>  kernel/sys_ni.c   |  1 -
>  mm/migrate.c  | 45 +++
>  13 files changed, 32 insertions(+), 39 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/unistd32.h 
> b/arch/arm64/include/asm/unistd32.h
> index b6517df74037..af793775ba98 100644
> --- a/arch/arm64/include/asm/unistd32.h
> +++ b/arch/arm64/include/asm/unistd32.h
> @@ -699,7 +699,7 @@ __SYSCALL(__NR_tee, sys_tee)
>  #define __NR_vmsplice 343
>  __SYSCALL(__NR_vmsplice, compat_sys_vmsplice)
>  #define __NR_move_pages 344
> -__SYSCALL(__NR_move_pages, compat_sys_move_pages)
> +__SYSCALL(__NR_move_pages, sys_move_pages)
>  #define __NR_getcpu 345
>  __SYSCALL(__NR_getcpu, sys_getcpu)
>  #define __NR_epoll_pwait 346
> diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl 
> b/arch/mips/kernel/syscalls/syscall_n32.tbl
> index ad157aab4c09..7fa1ca45e44c 100644
> --- a/arch/mips/kernel/syscalls/syscall_n32.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
> @@ -279,7 +279,7 @@
>  268  n32 sync_file_range sys_sync_file_range
>  269  n32 tee sys_tee
>  270  n32 vmsplicecompat_sys_vmsplice
> -271  n32 move_pages  compat_sys_move_pages
> +271  n32 move_pages  sys_move_pages
>  272  n32 set_robust_list compat_sys_set_robust_list
>  273  n32 get_robust_list compat_sys_get_robust_list
>  274  n32 kexec_load  sys_kexec_load
> diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl 
> b/arch/mips/kernel/syscalls/syscall_o32.tbl
> index 57baf6c8008f..194c7fbeedf7 100644
> --- a/arch/mips/kernel/syscalls/syscall_o32.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
> @@ -319,7 +319,7 @@
>  305  o32 sync_file_range sys_sync_file_range 
> sys32_sync_file_range
>  306  o32 tee sys_tee
>  307  o32 vmsplicesys_vmsplice
> compat_sys_vmsplice
> -308  o32 move_pages  sys_move_pages  
> compat_sys_move_pages
> +308  o32 move_pages  sys_move_pages
>  309  o32 set_robust_list sys_set_robust_list 
> compat_sys_set_robust_list
>  310  o32 get_robust_list sys_get_robust_list 
> compat_sys_get_robust_list
>  311  o32 kexec_load  sys_kexec_load
> diff --git a/arch/parisc/kernel/syscalls/syscall.tbl 
> b/arch/parisc/kernel/syscalls/syscall.tbl
> index 778bf166d7bd..5c17edaffe70 100644
> --- a/arch/parisc/kernel/syscalls/syscall.tbl
> +++ b/arch/parisc/kernel/syscalls/syscall.tbl
> @@ -331,7 +331,7 @@
>  292  64  sync_file_range sys_sync_file_range
>  293  common  tee sys_tee
>  294  common  vmsplicesys_vmsplice
> compat_sys_vmsplice
> -295  common  move_pages  sys_move_pages  
> compat_sys_move_pages
> +295  common  move_pages  sys_move_pages
>  296  common  getcpu  sys_getcpu
>  297  common  epoll_pwait sys_epoll_pwait 
> compat_sys_epoll_pwait
>  298  common  statfs64sys_statfs64
> compat_sys_statfs64
> diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
> b/arch/powerpc/kernel/syscalls/syscall.tbl
> index f128ba8b9a71..04fb42d7b377 100644
> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
> @@ -389,7 +389,7 @@
>  298  common  faccessat   sys_faccessat
>  299  common  get_robust_list sys_get_robust_list 
> compat_sys_get_robust_list
>  300  common  set_robust_list sys_set_robust_list 
> compat_sys_set_robust_list
> -301  common  move_pages  sys_move_pages  
> compat_sys_move_pages
> +301  common  
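
The conversion Arnd describes reduces to a small pattern: one native
entry point that reads the user-supplied pointer array element by
element, widening 32-bit pointers when the caller is a compat task,
instead of rebuilding the whole array via compat_alloc_user_space().
A minimal sketch of the idea — helper name and shape are illustrative,
the real mm/migrate.c change converts the array in chunks:

#include <linux/compat.h>
#include <linux/uaccess.h>

/*
 * Sketch only: fetch the i-th entry of move_pages()'s pointer
 * array, widening it when invoked from a 32-bit task.
 */
static int get_page_ptr(const void __user *pages, unsigned long i,
			const void __user **p)
{
	if (in_compat_syscall()) {
		compat_uptr_t cp;

		if (get_user(cp, (const compat_uptr_t __user *)pages + i))
			return -EFAULT;
		*p = compat_ptr(cp);
		return 0;
	}
	return get_user(*p, (const void __user *const __user *)pages + i);
}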

Re: [PATCH 2/4] kexec: remove compat_sys_kexec_load syscall

2020-09-18 Thread Christoph Hellwig
On Fri, Sep 18, 2020 at 03:24:37PM +0200, Arnd Bergmann wrote:
> The compat version of sys_kexec_load() uses compat_alloc_user_space to
> convert the user-provided arguments into the native format.
> 
> Move the conversion into the regular implementation with
> an in_compat_syscall() check to simplify it and avoid the
> compat_alloc_user_space() call.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  arch/arm64/include/asm/unistd32.h |  2 +-
>  arch/mips/kernel/syscalls/syscall_n32.tbl |  2 +-
>  arch/mips/kernel/syscalls/syscall_o32.tbl |  2 +-
>  arch/parisc/kernel/syscalls/syscall.tbl   |  2 +-
>  arch/powerpc/kernel/syscalls/syscall.tbl  |  2 +-
>  arch/s390/kernel/syscalls/syscall.tbl |  2 +-
>  arch/sparc/kernel/syscalls/syscall.tbl|  2 +-
>  arch/x86/entry/syscalls/syscall_32.tbl|  2 +-
>  arch/x86/entry/syscalls/syscall_64.tbl|  2 +-
>  include/linux/compat.h|  6 --
>  include/uapi/asm-generic/unistd.h |  2 +-
>  kernel/kexec.c| 75 ++-
>  12 files changed, 29 insertions(+), 72 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/unistd32.h 
> b/arch/arm64/include/asm/unistd32.h
> index 734860ac7cf9..b6517df74037 100644
> --- a/arch/arm64/include/asm/unistd32.h
> +++ b/arch/arm64/include/asm/unistd32.h
> @@ -705,7 +705,7 @@ __SYSCALL(__NR_getcpu, sys_getcpu)
>  #define __NR_epoll_pwait 346
>  __SYSCALL(__NR_epoll_pwait, compat_sys_epoll_pwait)
>  #define __NR_kexec_load 347
> -__SYSCALL(__NR_kexec_load, compat_sys_kexec_load)
> +__SYSCALL(__NR_kexec_load, sys_kexec_load)
>  #define __NR_utimensat 348
>  __SYSCALL(__NR_utimensat, sys_utimensat_time32)
>  #define __NR_signalfd 349
> diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl 
> b/arch/mips/kernel/syscalls/syscall_n32.tbl
> index f9df9edb67a4..ad157aab4c09 100644
> --- a/arch/mips/kernel/syscalls/syscall_n32.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
> @@ -282,7 +282,7 @@
>  271  n32 move_pages  compat_sys_move_pages
>  272  n32 set_robust_list compat_sys_set_robust_list
>  273  n32 get_robust_list compat_sys_get_robust_list
> -274  n32 kexec_load  compat_sys_kexec_load
> +274  n32 kexec_load  sys_kexec_load
>  275  n32 getcpu  sys_getcpu
>  276  n32 epoll_pwait compat_sys_epoll_pwait
>  277  n32 ioprio_set  sys_ioprio_set
> diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl 
> b/arch/mips/kernel/syscalls/syscall_o32.tbl
> index 195b43cf27c8..57baf6c8008f 100644
> --- a/arch/mips/kernel/syscalls/syscall_o32.tbl
> +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
> @@ -322,7 +322,7 @@
>  308  o32 move_pages  sys_move_pages  
> compat_sys_move_pages
>  309  o32 set_robust_list sys_set_robust_list 
> compat_sys_set_robust_list
>  310  o32 get_robust_list sys_get_robust_list 
> compat_sys_get_robust_list
> -311  o32 kexec_load  sys_kexec_load  
> compat_sys_kexec_load
> +311  o32 kexec_load  sys_kexec_load
>  312  o32 getcpu  sys_getcpu
>  313  o32 epoll_pwait sys_epoll_pwait 
> compat_sys_epoll_pwait
>  314  o32 ioprio_set  sys_ioprio_set
> diff --git a/arch/parisc/kernel/syscalls/syscall.tbl 
> b/arch/parisc/kernel/syscalls/syscall.tbl
> index def64d221cd4..778bf166d7bd 100644
> --- a/arch/parisc/kernel/syscalls/syscall.tbl
> +++ b/arch/parisc/kernel/syscalls/syscall.tbl
> @@ -336,7 +336,7 @@
>  297  common  epoll_pwait sys_epoll_pwait 
> compat_sys_epoll_pwait
>  298  common  statfs64sys_statfs64
> compat_sys_statfs64
>  299  common  fstatfs64   sys_fstatfs64   
> compat_sys_fstatfs64
> -300  common  kexec_load  sys_kexec_load  
> compat_sys_kexec_load
> +300  common  kexec_load  sys_kexec_load
>  301  32  utimensat   sys_utimensat_time32
>  301  64  utimensat   sys_utimensat
>  302  common  signalfdsys_signalfd
> compat_sys_signalfd
> diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
> b/arch/powerpc/kernel/syscalls/syscall.tbl
> index c2d737ff2e7b..f128ba8b9a71 100644
> --- a/arch/powerpc/kernel/syscalls/syscall.tbl
> +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
> @@ -350,7 +350,7 @@
>  265  64  mq_timedreceive sys_mq_timedreceive
>  266  nospu   mq_notify   sys_mq_notify   
> compat_sys_mq_notify
>  267  nospu   mq_getsetattr   sys_mq_getsetattr   
> compat_sys_mq_getsetattr
> -268  nospu   kexec_load   
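
The shape of the change, condensed: sys_kexec_load() keeps a single
entry point and copies in either the native or the compat segment
layout depending on in_compat_syscall().  A sketch assuming a copy
helper like the one below — the real kernel/kexec.c version also
bounds-checks nr_segments and validates flags before copying:

#include <linux/compat.h>
#include <linux/kexec.h>
#include <linux/uaccess.h>

/*
 * Sketch: one native implementation handles both user ABIs.
 */
static int copy_user_segment_list(struct kexec_segment *segments,
				  unsigned long nr_segments,
				  const void __user *usegments)
{
	unsigned long i;

	if (!in_compat_syscall())
		return copy_from_user(segments, usegments,
				nr_segments * sizeof(*segments)) ?
				-EFAULT : 0;

	for (i = 0; i < nr_segments; i++) {
		struct compat_kexec_segment cs;

		if (copy_from_user(&cs, (const char __user *)usegments +
				i * sizeof(cs), sizeof(cs)))
			return -EFAULT;
		segments[i].buf   = compat_ptr(cs.buf);
		segments[i].bufsz = cs.bufsz;
		segments[i].mem   = cs.mem;
		segments[i].memsz = cs.memsz;
	}
	return 0;
}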

Re: [PATCH 1/4] x86: add __X32_COND_SYSCALL() macro

2020-09-18 Thread Christoph Hellwig
On Fri, Sep 18, 2020 at 03:24:36PM +0200, Arnd Bergmann wrote:
> sys_move_pages() is an optional syscall, and once we remove
> the compat version of it in favor of the native one with an
> in_compat_syscall() check, the x32 syscall table refers to
> a __x32_sys_move_pages symbol that may not exist when the
> syscall is disabled.
> 
> Change the COND_SYSCALL() definition on x86 to also include
> the redirection for x32.
> 
> Signed-off-by: Arnd Bergmann 

Adding the x86 maintainers and Brian Gerst.  Brian proposed another
solution to the mess of x32 reusing most of the compat syscall
handlers here:

   https://lkml.org/lkml/2020/6/16/664

hpa didn't particularly like it, but with your and my pending series
we'll soon use more native than compat syscalls for x32, so something
will need to change...

> ---
>  arch/x86/include/asm/syscall_wrapper.h | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/x86/include/asm/syscall_wrapper.h 
> b/arch/x86/include/asm/syscall_wrapper.h
> index a84333adeef2..5eacd35a7f97 100644
> --- a/arch/x86/include/asm/syscall_wrapper.h
> +++ b/arch/x86/include/asm/syscall_wrapper.h
> @@ -171,12 +171,16 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs 
> *regs);
>   __SYS_STUBx(x32, compat_sys##name,  \
>   SC_X86_64_REGS_TO_ARGS(x, __VA_ARGS__))
>  
> +#define __X32_COND_SYSCALL(name) \
> + __COND_SYSCALL(x32, sys_##name)
> +
>  #define __X32_COMPAT_COND_SYSCALL(name)  
> \
>   __COND_SYSCALL(x32, compat_sys_##name)
>  
>  #define __X32_COMPAT_SYS_NI(name)\
>   __SYS_NI(x32, compat_sys_##name)
>  #else /* CONFIG_X86_X32 */
> +#define __X32_COND_SYSCALL(name)
>  #define __X32_COMPAT_SYS_STUB0(name)
>  #define __X32_COMPAT_SYS_STUBx(x, name, ...)
>  #define __X32_COMPAT_COND_SYSCALL(name)
> @@ -253,6 +257,7 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs 
> *regs);
>   static long __do_sys_##sname(const struct pt_regs *__unused)
>  
>  #define COND_SYSCALL(name)   \
> + __X32_COND_SYSCALL(name)\
>   __X64_COND_SYSCALL(name)\
>   __IA32_COND_SYSCALL(name)
>  
> -- 
> 2.27.0
> 
---end quoted text---
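
For context on what the macro does: COND_SYSCALL() emits a weak
per-ABI stub returning -ENOSYS so the syscall table always has a
symbol to link against even when the implementation is configured
out — which is why each ABI prefix needs its own branch.  Simplified
from arch/x86/include/asm/syscall_wrapper.h:

/*
 * A configured-in syscall overrides the weak definition; a
 * configured-out one falls back to sys_ni_syscall().
 */
#define __COND_SYSCALL(abi, name)					\
	__weak long __##abi##_##name(const struct pt_regs *__unused)	\
	{								\
		return sys_ni_syscall();				\
	}

/*
 * Before this patch, switching an x32 table entry from
 * compat_sys_move_pages to sys_move_pages left __x32_sys_move_pages
 * without a stub when the syscall is disabled; __X32_COND_SYSCALL()
 * fills that gap alongside the x64, ia32 and x32-compat variants.
 */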

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v3 1/1] fs: move kernel_read_file* to its own include file

2020-06-24 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 0/3] fs: reduce export usage of kernel_read*() calls

2020-05-18 Thread Christoph Hellwig
On Fri, May 15, 2020 at 09:29:33PM +, Luis Chamberlain wrote:
> On Wed, May 13, 2020 at 11:17:36AM -0700, Christoph Hellwig wrote:
> > Can you also move kernel_read_* out of fs.h?  That header gets pulled
> > in just about everywhere and doesn't really need functions not related
> > to the general fs interface.
> 
> Sure, where should I dump these?

Maybe a new linux/kernel_read_file.h?  Bonus points for a small
top-of-file comment explaining the point of the interface, which I
still don't get :)
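
A sketch of what such a header could look like — the enum entries and
prototypes below are trimmed approximations of the then-current fs.h
declarations, not the exact ones that were eventually moved:

/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_KERNEL_READ_FILE_H
#define _LINUX_KERNEL_READ_FILE_H

/*
 * Read a whole file into a kernel buffer in one shot, with the
 * security_kernel_read_file() LSM hook applied.  Callers pass an
 * id describing *why* the file is being read (firmware, module,
 * kexec image, policy, ...) so security modules can veto by
 * purpose, not just by content.
 */

#include <linux/types.h>

struct file;

enum kernel_read_file_id {
	READING_UNKNOWN,
	READING_FIRMWARE,
	READING_MODULE,
	READING_KEXEC_IMAGE,
	READING_KEXEC_INITRAMFS,
	READING_POLICY,
	READING_X509_CERTIFICATE,
	READING_MAX_ID,
};

int kernel_read_file(struct file *file, void **buf, loff_t *size,
		     loff_t max_size, enum kernel_read_file_id id);
int kernel_read_file_from_path(const char *path, void **buf,
			       loff_t *size, loff_t max_size,
			       enum kernel_read_file_id id);
int kernel_read_file_from_fd(int fd, void **buf, loff_t *size,
			     loff_t max_size,
			     enum kernel_read_file_id id);

#endif /* _LINUX_KERNEL_READ_FILE_H */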

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 0/3] fs: reduce export usage of kernel_read*() calls

2020-05-13 Thread Christoph Hellwig
Can you also move kernel_read_* out of fs.h?  That header gets pulled
in just about everywhere and doesn't really need functions not related
to the general fs interface.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v6 25/34] swiotlb: Add warnings for use of bounce buffers with SME

2017-06-08 Thread Christoph Hellwig
On Wed, Jun 07, 2017 at 02:17:32PM -0500, Tom Lendacky wrote:
> Add warnings to let the user know when bounce buffers are being used for
> DMA when SME is active.  Since the bounce buffers are not in encrypted
> memory, these notifications are to allow the user to determine some
> appropriate action - if necessary.

And what would the action be?  Do we need a boot or other option to
disallow this fallback for people who care deeply?
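
For reference, the warning under discussion is of roughly this shape —
a one-time notice when a device is forced through unencrypted SWIOTLB
bounce buffers while SME is active (a sketch of the idea, not Tom's
actual hunk):

#include <linux/device.h>
#include <linux/mem_encrypt.h>

/*
 * Sketch: warn once per device when SME forces its DMA through
 * unencrypted bounce buffers.
 */
static void sme_bounce_warn(struct device *dev)
{
	if (sme_active())
		dev_warn_once(dev,
			"SME active: DMA is using unencrypted SWIOTLB bounce buffers\n");
}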

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] Revert "mm: rename _count, field of the struct page, to _refcount"

2016-06-16 Thread Christoph Hellwig
On Thu, Jun 16, 2016 at 11:22:46AM +0200, Vitaly Kuznetsov wrote:
> _count -> _refcount rename in commit 0139aa7b7fa12 ("mm: rename _count,
> field of the struct page, to _refcount") broke kdump. makedumpfile(8) does
> stuff like READ_MEMBER_OFFSET("page._count", page._count) and fails. While
> it is definitely possible to fix this particular tool, I'm not sure about
> other tools which might be doing the same.
> 
> I suggest we remember the "we don't break userspace" rule and revert for
> 4.7 while it's not too late.

Err, sorry - this is not "userspace".  It's crazy crap digging into
kernel internal structures.

The rename was absolutely useful, so fix up your stinking pile in kdump.
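
The tool-side fix amounts to probing for the new field name and
falling back to the old one, rather than pinning the kernel to one
spelling.  Hypothetical, makedumpfile-flavoured logic —
read_member_offset() is an invented stand-in for the tool's
machinery, not its actual API:

/*
 * Illustrative only: resolve struct page's refcount offset under
 * either name so the tool survives the rename.
 */
static long page_refcount_offset(void)
{
	long off;

	off = read_member_offset("page._refcount");	/* v4.6+ */
	if (off < 0)
		off = read_member_offset("page._count");	/* older kernels */
	return off;					/* < 0: not found */
}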


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 31/40] trace syscalls: Convert various generic compat syscalls

2010-06-23 Thread Christoph Hellwig
On Wed, Jun 23, 2010 at 02:35:38PM +0200, Andi Kleen wrote:
> > I haven't heard any complaints about existing syscall wrappers.
>
> At least for me they always interrupt my grepping.
>
> > What kind of annotations could solve that?
>
> If you put the annotation in a separate macro and leave the original
> prototype alone. Then C parsers could still parse it.

I personally hate the way SYSCALL_DEFINE works with a passion, mostly
for the grep reason, but also because it looks horribly ugly.

But there is no reason not to be consistent here.  We already use
the wrappers for all native system calls, so leaving the compat
calls out doesn't make any sense.  And I'd cheer for anyone who
comes up with a better scheme for the native and compat wrappers.
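
The grep complaint is concrete: with the wrappers there is no literal
sys_read( definition left to find.  Heavily simplified, a
SYSCALL_DEFINE3() use and its rough 2010-era expansion:

/* What one would like to grep for: */
long sys_read(unsigned int fd, char __user *buf, size_t count);

/* What fs/read_write.c actually contains: */
SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)

/* ...which expands, roughly, to: */
asmlinkage long sys_read(unsigned int fd, char __user *buf, size_t count)
	__attribute__((alias("SyS_read")));
static inline long SYSC_read(unsigned int fd, char __user *buf,
			     size_t count);
asmlinkage long SyS_read(long fd, long buf, long count)
{
	/* narrow the register-sized arguments back to their types */
	return SYSC_read((unsigned int)fd, (char __user *)buf,
			 (size_t)count);
}
static inline long SYSC_read(unsigned int fd, char __user *buf,
			     size_t count)
{
	/* the real body lives here -- invisible to 'grep sys_read(' */
}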


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec