date:20221109

Re: [6.1.0-rc4-next-20221108] Boot failure on powerpc

2022-11-09 Thread Jason A. Donenfeld

Should be fixed already in today's next.

Re: [6.1.0-rc3-next-20221104] Boot failure - kernel BUG at mm/memblock.c:519

2022-11-09 Thread Yajun Deng

Hey Mike,

Can you help me test the attached file? 
Please use this new patch instead of the one in memblock tree.

November 8, 2022 3:55 PM, "Mike Rapoport"  wrote:

> Hi Yajun,
> 
> On Tue, Nov 08, 2022 at 02:27:53AM +, Yajun Deng wrote:
> 
>> Hi Sachin,
>> I don't have a powerpc architecture machine. I don't know why this happened.
>> 
>> Hi Mike,
>> Do you have any suggestions?
> 
> You can try reproducing the bug in qemu or work with Sachin to debug the
> issue.
> 
>> I tested in tools/testing/memblock, and it was successful.
> 
> Memblock tests still provide limited coverage and they don't deal with all
> possible cases.
> 
> For now I'm dropping this patch from the memblock tree until the issue is
> fixed.
> 
>> November 6, 2022 8:07 PM, "Sachin Sant"  wrote:
>> 
>> While booting recent linux-next on an IBM Power10 Server LPAR, the
>> following crash is observed:
>> 
>> [ 0.00] numa: Partition configured for 32 NUMA nodes.
>> [ 0.00] [ cut here ]
>> [ 0.00] kernel BUG at mm/memblock.c:519!
>> [ 0.00] Oops: Exception in kernel mode, sig: 5 [#1]
>> [ 0.00] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
>> [ 0.00] Modules linked in:
>> [ 0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3-next-20221104 
>> #1
>> [ 0.00] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 
>> of:IBM,FW1030.00
>> (NH1030_026) hv:phyp pSeries
>> [ 0.00] NIP: c04ba240 LR: c04bb240 CTR: c04ba210
>> [ 0.00] REGS: c2a8b7b0 TRAP: 0700 Not tainted 
>> (6.1.0-rc3-next-20221104)
>> [ 0.00] MSR: 80021033  CR: 24042424 XER: 
>> 0001
>> [ 0.00] CFAR: c04ba290 IRQMASK: 1
>> [ 0.00] GPR00: c04bb240 c2a8ba50 c136ee00 
>> c010f3ac00a8
>> [ 0.00] GPR04:  c010f3ac0090 0010f3ac 
>> 0d00
>> [ 0.00] GPR08: 0001 0007 0001 
>> 0081
>> [ 0.00] GPR12: c04ba210 c2e1  
>> 000d
>> [ 0.00] GPR16: 0f6be620 0f6be8e8 0f6be788 
>> 0f6bed58
>> [ 0.00] GPR20: 0f6f6d58 c29a8de8 0010f3ad8800 
>> 0080
>> [ 0.00] GPR24: 0010f3ad7b00  0100 
>> 0d00
>> [ 0.00] GPR28: 0010f3ad7b00 c29a8de8 c29a8e00 
>> 0006
>> [ 0.00] NIP [c04ba240] memblock_merge_regions.isra.12+0x40/0x130
>> [ 0.00] LR [c04bb240] memblock_add_range+0x190/0x300
>> [ 0.00] Call Trace:
>> [ 0.00] [c2a8ba50] [0100] 0x100 (unreliable)
>> [ 0.00] [c2a8ba90] [c04bb240] 
>> memblock_add_range+0x190/0x300
>> [ 0.00] [c2a8bb10] [c04bb5e0] memblock_reserve+0x70/0xd0
>> [ 0.00] [c2a8bba0] [c2045234] 
>> memblock_alloc_range_nid+0x11c/0x1e8
>> [ 0.00] [c2a8bc60] [c20453a4] 
>> memblock_alloc_internal+0xa4/0x110
>> [ 0.00] [c2a8bcb0] [c20456cc] 
>> memblock_alloc_try_nid+0x94/0xcc
>> [ 0.00] [c2a8bd40] [c200b570] alloc_paca_data+0x7c/0xcc
>> [ 0.00] [c2a8bdb0] [c200b770] allocate_paca+0x8c/0x28c
>> [ 0.00] [c2a8be50] [c200a26c] setup_arch+0x1c4/0x4d8
>> [ 0.00] [c2a8bed0] [c2004378] start_kernel+0xb4/0xa84
>> [ 0.00] [c2a8bf90] [c000da90] start_here_common+0x1c/0x20
>> [ 0.00] Instruction dump:
>> [ 0.00] 7c0802a6 fba1ffe8 fbc1fff0 fbe1fff8 7c7d1b78 7c9e2378 3be0 
>> f8010010
>> [ 0.00] f821ffc1 e923 3969 480c <0b0a> 7d3f4b78 393f0001 
>> 7fbf5840
>> [ 0.00] ---[ end trace  ]---
>> [ 0.00]
>> [ 0.00] Kernel panic - not syncing: Fatal exception
>> [ 0.00] Rebooting in 180 seconds..
>> 
>> This problem was introduced with next-20221101. Git bisect points to
>> following patch
>> 
>> commit 3f82c9c4ac377082e1230f5299e0ccce07b15e12
>> Date: Tue Oct 25 15:09:43 2022 +0800
>> memblock: don't run loop in memblock_add_range() twice
>> 
>> Reverting this patch helps boot the kernel to login prompt.
>> 
>> Have attached .config
>> 
>> - Sachin
> 
> --
> Sincerely yours,
> Mike.


0001-memblock-don-t-run-loop-in-memblock_add_range-twice-.patch
Description: Binary data

Re: [PATCH 2/2] tools/perf: Fix printing field separator in CSV metrics output

2022-11-09 Thread Athira Rajeev




> On 09-Nov-2022, at 2:27 AM, Arnaldo Carvalho de Melo  wrote:
> 
> Em Wed, Nov 02, 2022 at 02:07:06PM +0530, Athira Rajeev escreveu:
>> 
>> 
>>> On 18-Oct-2022, at 2:26 PM, Athira Rajeev  
>>> wrote:
>>> 
>>> In perf stat with CSV output option, number of fields
>>> in metrics output is not matching with number of fields
>>> in other event output lines.
>>> 
>>> Sample output below after applying patch to fix
>>> printing os->prefix.
>>> 
>>> # ./perf stat -x, --per-socket -a -C 1 ls
>>> S0,1,1.89,msec,cpu-clock,1887692,100.00,1.013,CPUs utilized
>>> S0,1,2,,context-switches,1885842,100.00,1.060,K/sec
>>> S0,1,0,,cpu-migrations,1885374,100.00,0.000,/sec
>>> S0,1,2,,page-faults,1884880,100.00,1.060,K/sec
>>> S0,1,189544,,cycles,1263158,67.00,0.100,GHz
>>> S0,1,64602,,stalled-cycles-frontend,1876146,100.00,34.08,frontend cycles idle
>>> S0,1,128241,,stalled-cycles-backend,1875512,100.00,67.66,backend cycles idle
>>> S0,1,95578,,instructions,1874676,100.00,0.50,insn per cycle
>>> ===>S0,1,,,,,,,,1.34,stalled cycles per insn
>>> 
>>> The above command line uses field separator as ","
>>> via "-x," option and per-socket option displays
>>> socket value as first field. But here the last line
>>> for "stalled cycles per insn" has more separators.
>>> Each csv output line is expected to have 8 field
>>> separators (for the 9 fields), whereas the last line
>>> has 10 "," in the result. Patch fixes this issue.
>>> 
>>> The counter stats are displayed by function
>>> "perf_stat__print_shadow_stats" in code
>>> "util/stat-shadow.c". While printing the stats info
>>> for "stalled cycles per insn", function "new_line_csv"
>>> is used as new_line callback.
>>> 
>>> The fields printed in each line contains:
>>> "Socket_id,aggr nr,Avg,unit,event_name,run,enable_percent,ratio,unit"
>>> 
>>> The metric output prints Socket_id, aggr nr, ratio
>>> and unit. It has to skip through remaining five fields
>>> ie, Avg,unit,event_name,run,enable_percent. The csv
>>> line callback uses "os->nfields" to know the number of
>>> fields to skip to match with other lines.
>>> Currently it is set as:
>>> os.nfields = 3 + aggr_fields[config->aggr_mode] + (counter->cgrp ? 1 : 0);
>>> 
>>> But in case of aggregation modes, csv_sep already
>>> gets printed along with each field (function "aggr_printout"
>>> in util/stat-display.c), so aggr_fields can be
>>> removed from nfields. The fixed number of fields to
>>> skip has to be "4". This is to skip the fields:
>>> "avg, unit, event name, run, enable_percent"
>>> Example from the line for cpu-clock:
>>> "1.89,msec,cpu-clock,1887692,100.00"
>>> 
>>> These five fields need 4 csv separators between them. Patch removes
>>> aggr_fields and uses 4 as the fixed number of os->nfields to skip.
>>> 
>>> After the patch:
>>> 
>>> # ./perf stat -x, --per-socket -a -C 1 ls
>>> S0,1,1.92,msec,cpu-clock,1917648,100.00,1.010,CPUs utilized
>>> S0,1,54,,context-switches,1916762,100.00,28.176,K/sec
>>> ---
>>> S0,1,528693,,instructions,1908854,100.00,0.36,insn per cycle
>>> S0,1,,,,,,1.81,stalled cycles per insn
>>> 
>>> Fixes: 92a61f6412d3 ("perf stat: Implement CSV metrics output")
>>> Reported-by: Disha Goel 
>>> Signed-off-by: Athira Rajeev 
>> 
>> Hi All,
>> 
>> Looking for review comments for this change.
> 
> This clashed with a patch from Namhyung that I just applied:
> 
> http://lore.kernel.org/lkml/20221107213314.3239159-2-namhy...@kernel.org
> 
> Can you please check? I just applied the other patch in this series.
> 
> Thanks,
> 
> - Arnaldo
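
The separator arithmetic in the quoted commit message can be checked with a small model. This is only a sketch in Python, not perf code: `metric_only_line` is a hypothetical helper, and it assumes that every aggregation column prints its own trailing separator and that the metric value and its unit are each preceded by one separator. Under those assumptions, skipping 4 separators gives a metric-only line the same 8 separators as a regular event line.

```python
SEP = ","  # field separator selected with "-x,"


def count_separators(line, sep=SEP):
    """Number of field separators in one CSV output line."""
    return line.count(sep)


def metric_only_line(prefix_fields, nfields, value, unit, sep=SEP):
    """Model of a line that carries only a metric (hypothetical helper).

    prefix_fields: aggregation columns (socket id, aggr nr); each one is
                   assumed to be printed with its own trailing separator.
    nfields:       number of extra separators emitted to skip the
                   avg/unit/event_name/run/enable_percent columns.
    """
    prefix = "".join(field + sep for field in prefix_fields)
    return prefix + sep * nfields + sep + value + sep + unit


# A regular event line taken from the "after the patch" sample.
event_line = "S0,1,528693,,instructions,1908854,100.00,0.36,insn per cycle"

# The metric-only line as the model produces it with nfields = 4.
metric_line = metric_only_line(["S0", "1"], 4, "1.81", "stalled cycles per insn")

print(count_separators(event_line), count_separators(metric_line))
```

Each extra unit added to `nfields` adds exactly one separator, which is why an overcounted `nfields` shows up as surplus commas on the metric line.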

Hi Arnaldo,

Thanks for checking the patch series.
Please find the updated patch below which is created on top of perf/urgent.

From dde8f830ad318c9111c3fea5415fd8170b4c51bd Mon Sep 17 00:00:00 2001
From: Athira Rajeev 
Date: Tue, 18 Oct 2022 14:26:05 +0530
Subject: [PATCH] tools/perf: Fix printing field separator in CSV metrics
 output

In perf stat with CSV output option, number of fields
in metrics output is not matching with number of fields
in other event output lines.

Sample output below after applying patch to fix
printing os->prefix.

# ./perf stat -x, --per-socket -a -C 1 ls
S0,1,1.89,msec,cpu-clock,1887692,100.00,1.013,CPUs utilized
S0,1,2,,context-switches,1885842,100.00,1.060,K/sec
S0,1,0,,cpu-migrations,1885374,100.00,0.000,/sec
S0,1,2,,page-faults,1884880,100.00,1.060,K/sec
S0,1,189544,,cycles,1263158,67.00,0.100,GHz
S0,1,64602,,stalled-cycles-frontend,1876146,100.00,34.08,frontend cycles idle
S0,1,128241,,stalled-cycles-backend,1875512,100.00,67.66,backend cycles idle
S0,1,95578,,instructions,1874676,100.00,0.50,insn per cycle
===>S0,1,,,,,,,,1.34,stalled cycles per insn

The above command line uses field separator as ","
via "-x," option and per-socket option displays
socket value as first field. But here the last line
for "stalled cycles per insn" has more separators.
Each csv output line is expected to have 8

Re: [6.1.0-rc3-next-20221104] Boot failure - kernel BUG at mm/memblock.c:519

2022-11-09 Thread Yajun Deng

November 9, 2022 6:03 PM, "Yajun Deng"  wrote:

> Hey Mike,
> 
Sorry, this email should be sent to Sachin but not Mike. 
Please forgive my confusion. So:

Hey Sachin,
Can you help me test the attached file? 
Please use this new patch instead of the one in memblock tree.

> Can you help me test the attached file? 
> Please use this new patch instead of the one in memblock tree.
> 
> November 8, 2022 3:55 PM, "Mike Rapoport"  wrote:
> 
>> Hi Yajun,
>> 
>> On Tue, Nov 08, 2022 at 02:27:53AM +, Yajun Deng wrote:
>> 
>>> Hi Sachin,
>>> I don't have a powerpc architecture machine. I don't know why this
>>> happened.
>>> 
>>> Hi Mike,
>>> Do you have any suggestions?
>> 
>> You can try reproducing the bug in qemu or work with Sachin to debug the
>> issue.
>> 
>>> I tested in tools/testing/memblock, and it was successful.
>> 
>> Memblock tests still provide limited coverage and they don't deal with all
>> possible cases.
>> 
>> For now I'm dropping this patch from the memblock tree until the issue is
>> fixed.
>> 
>>> November 6, 2022 8:07 PM, "Sachin Sant"  wrote:
>>> 
>>> While booting recent linux-next on an IBM Power10 Server LPAR, the
>>> following crash is observed:
>>> 
>>> [ 0.00] numa: Partition configured for 32 NUMA nodes.
>>> [ 0.00] [ cut here ]
>>> [ 0.00] kernel BUG at mm/memblock.c:519!
>>> [ 0.00] Oops: Exception in kernel mode, sig: 5 [#1]
>>> [ 0.00] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
>>> [ 0.00] Modules linked in:
>>> [ 0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3-next-20221104 
>>> #1
>>> [ 0.00] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 
>>> of:IBM,FW1030.00
>>> (NH1030_026) hv:phyp pSeries
>>> [ 0.00] NIP: c04ba240 LR: c04bb240 CTR: c04ba210
>>> [ 0.00] REGS: c2a8b7b0 TRAP: 0700 Not tainted 
>>> (6.1.0-rc3-next-20221104)
>>> [ 0.00] MSR: 80021033  CR: 24042424 XER: 
>>> 0001
>>> [ 0.00] CFAR: c04ba290 IRQMASK: 1
>>> [ 0.00] GPR00: c04bb240 c2a8ba50 c136ee00 
>>> c010f3ac00a8
>>> [ 0.00] GPR04:  c010f3ac0090 0010f3ac 
>>> 0d00
>>> [ 0.00] GPR08: 0001 0007 0001 
>>> 0081
>>> [ 0.00] GPR12: c04ba210 c2e1  
>>> 000d
>>> [ 0.00] GPR16: 0f6be620 0f6be8e8 0f6be788 
>>> 0f6bed58
>>> [ 0.00] GPR20: 0f6f6d58 c29a8de8 0010f3ad8800 
>>> 0080
>>> [ 0.00] GPR24: 0010f3ad7b00  0100 
>>> 0d00
>>> [ 0.00] GPR28: 0010f3ad7b00 c29a8de8 c29a8e00 
>>> 0006
>>> [ 0.00] NIP [c04ba240] memblock_merge_regions.isra.12+0x40/0x130
>>> [ 0.00] LR [c04bb240] memblock_add_range+0x190/0x300
>>> [ 0.00] Call Trace:
>>> [ 0.00] [c2a8ba50] [0100] 0x100 (unreliable)
>>> [ 0.00] [c2a8ba90] [c04bb240] 
>>> memblock_add_range+0x190/0x300
>>> [ 0.00] [c2a8bb10] [c04bb5e0] memblock_reserve+0x70/0xd0
>>> [ 0.00] [c2a8bba0] [c2045234] 
>>> memblock_alloc_range_nid+0x11c/0x1e8
>>> [ 0.00] [c2a8bc60] [c20453a4] 
>>> memblock_alloc_internal+0xa4/0x110
>>> [ 0.00] [c2a8bcb0] [c20456cc] 
>>> memblock_alloc_try_nid+0x94/0xcc
>>> [ 0.00] [c2a8bd40] [c200b570] alloc_paca_data+0x7c/0xcc
>>> [ 0.00] [c2a8bdb0] [c200b770] allocate_paca+0x8c/0x28c
>>> [ 0.00] [c2a8be50] [c200a26c] setup_arch+0x1c4/0x4d8
>>> [ 0.00] [c2a8bed0] [c2004378] start_kernel+0xb4/0xa84
>>> [ 0.00] [c2a8bf90] [c000da90] 
>>> start_here_common+0x1c/0x20
>>> [ 0.00] Instruction dump:
>>> [ 0.00] 7c0802a6 fba1ffe8 fbc1fff0 fbe1fff8 7c7d1b78 7c9e2378 3be0 
>>> f8010010
>>> [ 0.00] f821ffc1 e923 3969 480c <0b0a> 7d3f4b78 
>>> 393f0001 7fbf5840
>>> [ 0.00] ---[ end trace  ]---
>>> [ 0.00]
>>> [ 0.00] Kernel panic - not syncing: Fatal exception
>>> [ 0.00] Rebooting in 180 seconds..
>>> 
>>> This problem was introduced with next-20221101. Git bisect points to
>>> following patch
>>> 
>>> commit 3f82c9c4ac377082e1230f5299e0ccce07b15e12
>>> Date: Tue Oct 25 15:09:43 2022 +0800
>>> memblock: don't run loop in memblock_add_range() twice
>>> 
>>> Reverting this patch helps boot the kernel to login prompt.
>>> 
>>> Have attached .config
>>> 
>>> - Sachin
>> 
>> --
>> Sincerely yours,
>> Mike.


0001-memblock-don-t-run-loop-in-memblock_add_range-twice-.patch
Description: Binary data

Re: [6.1.0-rc3-next-20221104] Boot failure - kernel BUG at mm/memblock.c:519

2022-11-09 Thread Sachin Sant




> On 09-Nov-2022, at 3:55 PM, Yajun Deng  wrote:
> 
> November 9, 2022 6:03 PM, "Yajun Deng"  wrote:
> 
>> Hey Mike,
>> 
> Sorry, this email should be sent to Sachin but not Mike. 
> Please forgive my confusion. So:
> 
> Hey Sachin,
> Can you help me test the attached file? 
> Please use this new patch instead of the one in memblock tree.

Thanks for the fix. With the updated patch kernel boots correctly.

Tested-by: Sachin Sant <sach...@linux.ibm.com>

- Sachin
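
For readers following the crash in memblock_merge_regions() reported in this thread, the merge step can be modelled roughly as below. This is a simplified Python sketch, not the kernel code: regions are plain `(base, size, nid)` tuples, region flags are left out, and the raised `ValueError` stands in for the kernel BUG on the assumption that the BUG guards against neighbouring regions that overlap.

```python
def merge_regions(regions):
    """Merge adjacent compatible regions in a list sorted by base address.

    regions: list of (base, size, nid) tuples. Two neighbours are merged
    when the first ends exactly where the second begins and both belong
    to the same node. Overlapping neighbours are treated as a fatal
    inconsistency (modelled here with ValueError).
    """
    i = 0
    while i < len(regions) - 1:
        base, size, nid = regions[i]
        next_base, next_size, next_nid = regions[i + 1]

        if base + size != next_base or nid != next_nid:
            # Not mergeable: the regions must at least not overlap.
            if base + size > next_base:
                raise ValueError("overlapping regions: array is inconsistent")
            i += 1
            continue

        # Mergeable: grow the current region and drop the next one, then
        # re-check the same index against its new neighbour.
        regions[i] = (base, size + next_size, nid)
        del regions[i + 1]
    return regions


print(merge_regions([(0, 16, 0), (16, 16, 0), (32, 16, 1)]))
```

In this model an insertion step that leaves overlapping or unsorted entries behind fails in the merge pass rather than in the code that produced the bad entries, which matches the shape of the reported backtrace.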

Re: [6.1.0-rc4-next-20221108] Boot failure on powerpc

2022-11-09 Thread Sachin Sant




> On 09-Nov-2022, at 3:25 PM, Jason A. Donenfeld  wrote:
> 
> Should be fixed already in today's next.

Yup, thanks. next-20221109 boots successfully.

- Sachin

Re: [6.1.0-rc3-next-20221104] Boot failure - kernel BUG at mm/memblock.c:519

2022-11-09 Thread Yajun Deng

November 9, 2022 6:55 PM, "Sachin Sant"  wrote:

>> On 09-Nov-2022, at 3:55 PM, Yajun Deng  wrote:
>> 
>> November 9, 2022 6:03 PM, "Yajun Deng"  wrote:
>> 
>>> Hey Mike,
>> 
>> Sorry, this email should be sent to Sachin but not Mike.
>> Please forgive my confusion. So:
>> 
>> Hey Sachin,
>> Can you help me test the attached file?
>> Please use this new patch instead of the one in memblock tree.
> 
> Thanks for the fix. With the updated patch kernel boots correctly.
> 

Thanks for your test results.

Hi Mike,
Do you have any other suggestions for this patch? If not, I'll send a v3 patch.

> Tested-by: Sachin Sant <sach...@linux.ibm.com>
> 
> - Sachin

Re: [6.1.0-rc3-next-20221104] Boot failure - kernel BUG at mm/memblock.c:519

2022-11-09 Thread Mike Rapoport

Hi Yajun,

On Wed, Nov 09, 2022 at 11:32:27AM +, Yajun Deng wrote:
> November 9, 2022 6:55 PM, "Sachin Sant"  wrote:
> 
> >> On 09-Nov-2022, at 3:55 PM, Yajun Deng  wrote:
> >> 
> >> November 9, 2022 6:03 PM, "Yajun Deng"  wrote:
> >> 
> >>> Hey Mike,
> >> 
> >> Sorry, this email should be sent to Sachin but not Mike.
> >> Please forgive my confusion. So:
> >> 
> >> Hey Sachin,
> >> Can you help me test the attached file?
> >> Please use this new patch instead of the one in memblock tree.
> > 
> > Thanks for the fix. With the updated patch kernel boots correctly.
> > 
> 
> Thanks for your test results.
> 
> Hi Mike,
> Do you have any other suggestions for this patch? If not, I'll send a v3 
> patch.

Unfortunately I don't think the new version has much value as it does not
really eliminate the second loop in case memory allocation is required.
I'd say the improvement is not worth the churn.
 
> > Tested-by: Sachin Sant <sach...@linux.ibm.com>
> > 
> > - Sachin

-- 
Sincerely yours,
Mike.
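
The "second loop" mentioned above refers to a count-then-insert pattern: a first pass works out how many new regions a range needs, the array is grown if required, and a second pass performs the insertions. The Python sketch below illustrates that pattern under simplifying assumptions: `RegionArray` and `walk` are hypothetical names, regions are `(base, size)` tuples, doubling `capacity` stands in for reallocating the array, and the merge pass that would follow is left out.

```python
def walk(regions, base, end, insert):
    """One pass over sorted regions; returns how many new pieces are needed.

    With insert=False this only counts. With insert=True it also adds the
    pieces of [base, end) that are not yet covered by an existing region.
    """
    nr_new = 0
    idx = 0
    while idx < len(regions):
        rbase, rsize = regions[idx]
        rend = rbase + rsize
        if rbase >= end:
            break
        if rend <= base:
            idx += 1
            continue
        if rbase > base:
            # Gap below this region: it needs a new entry.
            nr_new += 1
            if insert:
                regions.insert(idx, (base, rbase - base))
                idx += 1  # skip over the entry just inserted
        base = min(rend, end)  # the overlapping part is already covered
        idx += 1
    if base < end:
        # Remaining tail above the last overlapping region.
        nr_new += 1
        if insert:
            regions.insert(idx, (base, end - base))
    return nr_new


class RegionArray:
    def __init__(self, capacity=4):
        self.regions = []
        self.capacity = capacity

    def add_range(self, base, size):
        end = base + size
        needed = walk(self.regions, base, end, insert=False)  # pass 1: count
        while len(self.regions) + needed > self.capacity:
            self.capacity *= 2  # models growing the array before inserting
        walk(self.regions, base, end, insert=True)  # pass 2: insert


arr = RegionArray(capacity=2)
arr.add_range(0, 10)
arr.add_range(20, 10)
arr.add_range(5, 20)  # needs one new piece, (10, 10), and a larger array
print(arr.regions, arr.capacity)
```

The counting pass exists so that the array is only grown before any entry is written. An optimization that skips it in the common case still has to fall back to both passes whenever growth is needed, which is the case singled out in the objection above.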

Re: [6.1.0-rc3-next-20221104] Boot failure - kernel BUG at mm/memblock.c:519

2022-11-09 Thread Yajun Deng

November 9, 2022 7:42 PM, "Mike Rapoport"  wrote:

> Hi Yajun,
> 
> On Wed, Nov 09, 2022 at 11:32:27AM +, Yajun Deng wrote:
> 
>> November 9, 2022 6:55 PM, "Sachin Sant"  wrote:
>> 
>> On 09-Nov-2022, at 3:55 PM, Yajun Deng  wrote:
>> 
>> November 9, 2022 6:03 PM, "Yajun Deng"  wrote:
>> 
>> Hey Mike,
>> 
>> Sorry, this email should be sent to Sachin but not Mike.
>> Please forgive my confusion. So:
>> 
>> Hey Sachin,
>> Can you help me test the attached file?
>> Please use this new patch instead of the one in memblock tree.
>> 
>> Thanks for the fix. With the updated patch kernel boots correctly.
>> 
>> Thanks for your test results.
>> 
>> Hi Mike,
>> Do you have any other suggestions for this patch? If not, I'll send a v3 
>> patch.
> 
> Unfortunately I don't think the new version has much value as it does not
> really eliminate the second loop in case memory allocation is required.
> I'd say the improvement is not worth the churn.
> 
OK, I got it.

>> Tested-by: Sachin Sant <sach...@linux.ibm.com>
>> 
>> - Sachin
> 
> --
> Sincerely yours,
> Mike.

Re: [PATCH 2/4] fs: define a firmware security filesystem named fwsecurityfs

2022-11-09 Thread Greg Kroah-Hartman

On Sun, Nov 06, 2022 at 04:07:42PM -0500, Nayna Jain wrote:
> securityfs is meant for Linux security subsystems to expose policies/logs
> or any other information. However, there are various firmware security
> features which expose their variables for user management via the kernel.
> There is currently no single place to expose these variables. Different
> platforms use sysfs/platform specific filesystem(efivarfs)/securityfs
> interface as they find it appropriate. Thus, there is a gap in kernel
> interfaces to expose variables for security features.
> 
> Define a firmware security filesystem (fwsecurityfs) to be used by
> security features enabled by the firmware. These variables are platform
> specific. This filesystem provides platforms a way to implement their
>  own underlying semantics by defining own inode and file operations.
> 
> Similar to securityfs, the firmware security filesystem is recommended
> to be exposed on a well known mount point /sys/firmware/security.
> Platforms can define their own directory or file structure under this path.
> 
> Example:
> 
> # mount -t fwsecurityfs fwsecurityfs /sys/firmware/security

Why not just use securityfs in /sys/security/firmware/ instead?  Then
you don't have to create a new filesystem and convince userspace to
mount it in a specific location?

thanks,

greg k-h

Re: [PATCH v1 2/2] stackprotector: actually use get_random_canary()

2022-11-09 Thread Catalin Marinas

On Sun, Oct 23, 2022 at 10:32:08PM +0200, Jason A. Donenfeld wrote:
> The RNG always mixes in the Linux version extremely early in boot. It
> also always includes a cycle counter, not only during early boot, but
> each and every time it is invoked prior to being fully initialized.
> Together, this means that the use of additional xors inside of the
> various stackprotector.h files is superfluous and over-complicated.
> Instead, we can get exactly the same thing, but better, by just calling
> `get_random_canary()`.
> 
> Signed-off-by: Jason A. Donenfeld 
> ---
>  arch/arm/include/asm/stackprotector.h |  9 +
>  arch/arm64/include/asm/stackprotector.h   |  9 +

For arm64:

Acked-by: Catalin Marinas
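
As a rough illustration of what a canary helper of this kind produces, the Python sketch below draws a random 64-bit value and clears its lowest byte. The mask is an assumption, not something shown in this thread: it reflects the common convention on 64-bit little-endian systems of keeping a zero byte in the canary so that string operations stop before reaching it. `get_random_canary_model` is a hypothetical name.

```python
import secrets

# Assumed mask for a 64-bit little-endian layout: lowest byte forced to zero.
CANARY_MASK_64_LE = 0xFFFFFFFFFFFFFF00


def get_random_canary_model():
    """Return a random 64-bit value with the low byte cleared (model only)."""
    return secrets.randbits(64) & CANARY_MASK_64_LE


canary = get_random_canary_model()
print(hex(canary))
```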

Re: [PATCH 2/4] fs: define a firmware security filesystem named fwsecurityfs

2022-11-09 Thread Nayna

On 11/9/22 08:46, Greg Kroah-Hartman wrote:

> On Sun, Nov 06, 2022 at 04:07:42PM -0500, Nayna Jain wrote:
>> securityfs is meant for Linux security subsystems to expose policies/logs
>> or any other information. However, there are various firmware security
>> features which expose their variables for user management via the kernel.
>> There is currently no single place to expose these variables. Different
>> platforms use sysfs/platform specific filesystem(efivarfs)/securityfs
>> interface as they find it appropriate. Thus, there is a gap in kernel
>> interfaces to expose variables for security features.
>>
>> Define a firmware security filesystem (fwsecurityfs) to be used by
>> security features enabled by the firmware. These variables are platform
>> specific. This filesystem provides platforms a way to implement their
>> own underlying semantics by defining own inode and file operations.
>>
>> Similar to securityfs, the firmware security filesystem is recommended
>> to be exposed on a well known mount point /sys/firmware/security.
>> Platforms can define their own directory or file structure under this path.
>>
>> Example:
>>
>> # mount -t fwsecurityfs fwsecurityfs /sys/firmware/security
>
> Why not just use securityfs in /sys/security/firmware/ instead? Then
> you don't have to create a new filesystem and convince userspace to
> mount it in a specific location?

From man 5 sysfs page:

/sys/firmware: This subdirectory contains interfaces for viewing and
manipulating firmware-specific objects and attributes.

/sys/kernel: This subdirectory contains various files and subdirectories
that provide information about the running kernel.

The security variables which are being exposed via fwsecurityfs are
managed by firmware, stored in firmware managed space and also often
consumed by firmware for enabling various security features.

From git commit b67dbf9d4c1987c370fd18fdc4cf9d8aaea604c2, the purpose
of securityfs(/sys/kernel/security) is to provide a common place for all
kernel LSMs. The idea of
fwsecurityfs(/sys/firmware/security) is to similarly provide a common
place for all firmware security objects.

/sys/firmware already exists. The patch now defines a new /security
directory in it for firmware security features. Using
/sys/kernel/security would mean scattering firmware objects in multiple
places and confusing the purpose of /sys/kernel and /sys/firmware.

Even though fwsecurityfs code is based on securityfs, since the two
filesystems expose different types of objects and have different
requirements, there are distinctions:

1. fwsecurityfs lets users create files in userspace, securityfs only
allows kernel subsystems to create files.

2. firmware and kernel objects may have different requirements, for
example with respect to namespacing. As per my understanding,
namespacing is applied to kernel resources and not firmware resources.
That's why it makes sense to add support for namespacing in securityfs,
but we concluded that fwsecurityfs currently doesn't need it. A
similar example is the TPM space, which is exposed from hardware.
For containers, the TPM would be a virtual/software TPM. Similarly,
firmware space for containers would have to be a
virtualized/software version of it.

3. firmware objects are persistent and read at boot time by interaction
with firmware, unlike kernel objects which are not persistent.

For a more detailed explanation refer to the LSS-NA 2022 "PowerVM
Platform Keystore - Securing Linux Credentials Locally" talk and
slides[1]. The link to previously posted RFC version is [2].

[1]
https://static.sched.com/hosted_files/lssna2022/25/NaynaJain_PowerVM_PlatformKeyStore_SecuringLinuxCredentialsLocally.pdf

[2] https://lore.kernel.org/linuxppc-dev/yrqqphi4+jhz1...@kroah.com/

Thanks & Regards,

- Nayna

> thanks,
>
> greg k-h
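
The disagreement above is largely about where userspace should expect to find each filesystem. A minimal Python sketch of how a tool might look up a mount point by filesystem type follows. It parses text in the /proc/mounts layout (device, mount point, type, options, dump, pass). The sample table is made up for illustration, and the fwsecurityfs entry is hypothetical: it only shows what the proposed mount command would be expected to produce.

```python
def find_mounts(mount_table, fstype):
    """Return the mount points of every entry with the given filesystem type."""
    points = []
    for line in mount_table.splitlines():
        fields = line.split()
        # Field layout: device, mount point, fstype, options, dump, pass
        if len(fields) >= 3 and fields[2] == fstype:
            points.append(fields[1])
    return points


# Illustrative table; the fwsecurityfs line is hypothetical.
SAMPLE_MOUNTS = """\
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
fwsecurityfs /sys/firmware/security fwsecurityfs rw,relatime 0 0
"""

print(find_mounts(SAMPLE_MOUNTS, "securityfs"))
print(find_mounts(SAMPLE_MOUNTS, "fwsecurityfs"))
```

On a live system the same function could be fed the contents of /proc/mounts. A tool written this way does not depend on a fixed path, which touches on the concern about convincing userspace to mount a new filesystem in a specific location.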

Re: [PATCH bpf-next v2 0/5] execmem_alloc for BPF programs

2022-11-09 Thread Christophe Leroy

+ linuxppc-dev list as we start mentioning powerpc.

Le 09/11/2022 à 18:43, Song Liu a écrit :
> On Wed, Nov 9, 2022 at 3:18 AM Mike Rapoport  wrote:
>>
> [...]
> 

>>>> The proposed execmem_alloc() looks to me very much tailored for x86 to be
>>>> used as a replacement for module_alloc(). Some architectures have
>>>> module_alloc() that is quite different from the default or x86 version, so
>>>> I'd expect at least some explanation how modules etc can use execmem_ APIs
>>>> without breaking !x86 architectures.
>>>
>>> I think this is fair, but I think we should ask ourselves - how
>>> much should we do in one step?
>>
>> I think that at least we need evidence that execmem_alloc() etc can be
>> actually used by modules/ftrace/kprobes. Luis said that RFC v2 didn't work
>> for him at all, so having a core MM API for code allocation that only works
>> with BPF on x86 seems not right to me.
> 
> While using execmem_alloc() et al. in module support is difficult, folks are
> making progress with it. For example, the prototype would have been more
> difficult before CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
> (introduced by Christophe).

By the way, the motivation for CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC 
was completely different: on powerpc book3s/32, the no-exec 
flag is per segment of 256 Mbytes, so in order to provide 
STRICT_MODULES_RWX it was necessary to put data outside of the segment 
that holds module text in order to be able to flag RW data as no-exec.

But I'm happy if it can also serve other purposes.
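
The segment constraint described above can be put into numbers. The Python sketch below assumes a 32-bit address space split into 256 Mbyte segments, so the segment index is simply the top four address bits. The addresses in the usage lines are made up for illustration and do not correspond to any real kernel layout.

```python
SEGMENT_SHIFT = 28  # 256 Mbytes = 2**28 bytes, giving 16 segments in 32 bits


def segment_of(addr):
    """Index of the 256 Mbyte segment that contains a 32-bit address."""
    return (addr & 0xFFFFFFFF) >> SEGMENT_SHIFT


def data_can_be_noexec(text_addr, data_addr):
    """No-exec applies to a whole segment, so data must not share one with text."""
    return segment_of(text_addr) != segment_of(data_addr)


# Hypothetical addresses: module text in one segment, data in another.
print(data_can_be_noexec(0xB0100000, 0xC8000000))
# Hypothetical addresses: text and data share a segment.
print(data_can_be_noexec(0xB0100000, 0xB8000000))
```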

> 
> We also have other users that we can onboard soon: BPF trampoline on
> x86_64, BPF jit and trampoline on arm64, and maybe also on powerpc and
> s390.
> 
>>
>>> For non-text_poke() architectures, the way you can make it work is have
>>> the API look like:
>>> execmem_alloc()  <- Does the allocation, but not necessarily usable yet
>>> execmem_write()  <- Loads the mapping, doesn't work after finish()
>>> execmem_finish() <- Makes the mapping live (loaded, executable, ready)
>>>
>>> So for text_poke():
>>> execmem_alloc()  <- reserves the mapping
>>> execmem_write()  <- text_pokes() to the mapping
>>> execmem_finish() <- does nothing
>>>
>>> And non-text_poke():
>>> execmem_alloc()  <- Allocates a regular RW vmalloc allocation
>>> execmem_write()  <- Writes normally to it
>>> execmem_finish() <- does set_memory_ro()/set_memory_x() on it
>>>
>>> Non-text_poke() only gets the benefits of centralized logic, but the
>>> interface works for both. This is pretty much what the perm_alloc() RFC
>>> did to make it work with other arch's and modules. But to fit with the
>>> existing modules code (which is actually spread all over) and also
>>> handle RO sections, it also needed some additional bells and whistles.
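
The three-step interface quoted above can be modelled as a small state machine. This is a toy Python sketch of the proposal as described in the thread, not kernel code: permissions are sets of letters, the class `ExecMem` is hypothetical, and rejecting writes after finish() for both variants is a modelling choice taken from the "doesn't work after finish()" note.

```python
class ExecMem:
    """Toy model of an allocation going through alloc, write and finish."""

    def __init__(self, size, text_poke):
        self.buf = bytearray(size)
        self.text_poke = text_poke
        # text_poke variant: the mapping is reserved read-only and executable
        # and is patched indirectly. Otherwise it starts as plain RW memory.
        self.perms = {"r", "x"} if text_poke else {"r", "w"}
        self.finished = False

    def write(self, offset, data):
        if self.finished:
            raise RuntimeError("write after finish")
        if offset < 0 or offset + len(data) > len(self.buf):
            raise ValueError("write outside the allocation")
        # Models text_poke() or a normal store, depending on the variant.
        self.buf[offset:offset + len(data)] = data

    def finish(self):
        if not self.text_poke:
            # Models set_memory_ro() followed by set_memory_x().
            self.perms = {"r", "x"}
        # text_poke variant: nothing to change, the mapping is already live.
        self.finished = True


def execmem_alloc(size, text_poke):
    return ExecMem(size, text_poke)


for variant in (True, False):
    mem = execmem_alloc(16, text_poke=variant)
    mem.write(0, b"\x90\x90")
    mem.finish()
    print(variant, sorted(mem.perms), bytes(mem.buf[:2]))
```

Both variants end in the same state (read-only, executable, contents loaded), and in this model neither is ever writable and executable at the same time, which is what lets one caller-facing interface cover both kinds of architecture.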
>>
>> I'm less concerned about non-text_poke() part, but rather about
>> restrictions where code and data can live on different architectures and
>> whether these restrictions won't lead to inability to use the centralized
>> logic on, say, arm64 and powerpc.

Until recently, powerpc CPUs didn't implement PC-relative data access. 
Only very recent powerpc CPUs (power10 only?) have the capability to do 
PC-relative accesses, but the kernel doesn't use it yet. So there's no 
constraint about distance between text and data. What matters is the 
distance between core kernel text and module text to avoid trampolines.

>>
>> For instance, if we use execmem_alloc() for modules, it means that data
>> sections should be allocated separately with plain vmalloc(). Will this
>> work universally? Or this will require special care with additional
>> complexity in the modules code?
>>
>>> So the question I'm trying to ask is, how much should we target for the
>>> next step? I first thought that this functionality was so intertwined,
>>> it would be too hard to do iteratively. So if we want to try
>>> iteratively, I'm ok if it doesn't solve everything.
>>
>> With execmem_alloc() as the first step I'm failing to see the large
>> picture. If we want to use it for modules, how will we allocate RO data?
>> with similar rodata_alloc() that uses yet another tree in vmalloc?
>> How the caching of large pages in vmalloc can be made useful for use cases
>> like secretmem and PKS?
> 
> If RO data causes problems with direct map fragmentation, we can use
> similar logic. I think we will need another tree in vmalloc for this case.
> Since the logic will be mostly identical, I personally don't think adding
> another tree is a big overhead.

On powerpc, kernel core RAM is not mapped by pages but is mapped by 
blocks. There are only two blocks: One ROX block which contains both 
text and rodata, and one RW block that contains everything else. Maybe 
the same can be done for modules. What matters is to be sure you never 
have WX memory. Having ROX rodata is not an issue.
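
The rule stated above (never writable and executable at once, while read-only executable data is acceptable) is easy to express as a check over a mapping layout. The Python sketch below is illustrative only: the block names and the permission strings are made up to mirror the two-block description and are not read from any real system.

```python
def violates_wx(perms):
    """True if a mapping is both writable and executable."""
    return "w" in perms and "x" in perms


def check_layout(layout):
    """Return the names of all blocks that break the no-WX rule."""
    return [name for name, perms in layout.items() if violates_wx(perms)]


# Two-block layout as described: one ROX block and one RW block.
two_block_layout = {
    "text + rodata": "rx",
    "everything else": "rw",
}

print(check_layout(two_block_layout))
print(check_layout({"bad block": "rwx"}))
```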

Christophe

Re: [PATCH net-next v2 00/11] net: pcs: Add support for devices probed in the "usual" manner

2022-11-09 Thread Vladimir Oltean

On Thu, Nov 03, 2022 at 05:06:39PM -0400, Sean Anderson wrote:
> Several (later) patches in this series cannot be applied until a stable
> release has occurred containing the dts updates.

New kernels must remain compatible with old device trees.

Re: [PATCH net-next v2 00/11] net: pcs: Add support for devices probed in the "usual" manner

2022-11-09 Thread Vladimir Oltean

On Thu, Nov 03, 2022 at 05:06:39PM -0400, Sean Anderson wrote:
> For a long time, PCSs have been tightly coupled with their MACs. For
> this reason, the MAC creates the "phy" or mdio device, and then passes
> it to the PCS to initialize. This has a few disadvantages:
> 
> - Each MAC must re-implement the same steps to look up/create a PCS
> - The PCS cannot use functions tied to device lifetime, such as devm_*.
> - Generally, the PCS does not have easy access to its device tree node

Is there a clear need to solve these disadvantages? There comes extra
runtime complexity with the PCS-as-device scheme (plus the extra
complexity needed to address the DT backwards compatibility problems
it causes; not addressed here).
